esmero / archipelago-deployment

Archipelago Commons Docker Deployment Repository

Large File uploads via Drupal and NGINX, multipart and improvements for 1.0.0-RC1 #70

Open DiegoPino opened 4 years ago

DiegoPino commented 4 years ago

Everybody loves uploading 1 GB+ files via a webform

Ok, not everybody but I just found myself doing it and it was not fun.

Changes we need to make this happen (a mix of documentation and testing new things):


- Webforms need their elements updated. This may be needed for Webarchives, Datasets and Video

We may want multipart upload too, with resuming capabilities. There are a number of JS libraries that can handle this, but I also like the NGINX-native approach:
- https://www.nginx.com/resources/wiki/modules/upload_progress/ which exposes a special JSON endpoint that a webform element could poll via jQuery at regular intervals to display real-time info on uploads
- http://www.grid.net.ru/nginx/upload.en.html which allows multipart/resumable uploads (chunks)
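To make the chunked/resumable idea concrete, here is a minimal sketch of the client-side bookkeeping only: splitting a file into byte ranges, formatting each as a `Content-Range` value (the standard `bytes start-end/total` form), and working out what is left to send after an interruption. The function names are hypothetical, and which headers the server-side module actually expects is an assumption to check against its documentation.

```python
from typing import Iterator, Tuple


def chunk_ranges(total_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte offsets that cover total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def content_range(start: int, end: int, total: int) -> str:
    """Format one chunk as a Content-Range header value."""
    return f"bytes {start}-{end}/{total}"


def remaining_ranges(
    total_size: int, chunk_size: int, acknowledged_bytes: int
) -> Iterator[Tuple[int, int]]:
    """Yield the ranges still to send once the server has confirmed
    the first acknowledged_bytes bytes (hypothetical resume step)."""
    for start, end in chunk_ranges(total_size, chunk_size):
        if end < acknowledged_bytes:
            continue  # this chunk was already fully received
        yield max(start, acknowledged_bytes), end


if __name__ == "__main__":
    one_gb = 1024 ** 3
    for start, end in chunk_ranges(one_gb, 256 * 1024 ** 2):
        print(content_range(start, end, one_gb))
```

A resuming client would ask the server how many bytes it already holds, then send only what `remaining_ranges` yields.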

@giancarlobi I know you use Apache, but this is needed, so maybe you have backend-independent suggestions we may want to add?

Also:

All this implies changes to our Docker deployment. Right now we are using the generic NGINX Docker container, but for SSL I moved to staticfloat/nginx-certbot, which includes not only the base system but also automatic, non-interactive request and renewal of SSL certs (free ones, fully valid). That means just a few more seconds of startup time, but no copying .well-known hashes around and sending prayers. I feel that if we are going this route, we may need to extend it into our own Docker container for NGINX.

dmer commented 4 years ago

Not sure I think it's fun, but I know that sometimes folks will need to use webform for larger files so I'm glad you're thinking about it. Very pleased re: the new nginx container as I have found fighting w/ certbot to be even less fun than uploading large files via webform. Exciting to see that 1.0 branch on archipelago-deployment!

DiegoPino commented 4 years ago

Once I share the new docker-compose file here, @dmer, all will be revealed. Stay tuned. The NGINX container is actually simple, just SUPER strict but simple. I'm still getting JS errors today when uploading 1 GB of files, but it's no longer PHP nor the web server (which is OK). I hope that with the new extra modules we can finally have reliable multipart uploads. My real hopes are really put on the voucher thing.

Also 👀 https://github.com/esmero/archipelago-docker-images/pull/20 Natural language processing is already up on Docker Hub!

DiegoPino commented 4 years ago

Update: Gigabyte files uploaded via Webform worked out. Just running and 👨‍🌾 and 🤦 making sure the Docker daemon was not running out of space. Gosh. NGINX is buffering, but it still uses the container's very own /tmp, which, funnily enough, was sitting on a tiny, slim boot drive. Will work now on the full progress and multipart solution.
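The out-of-space surprise can be caught early with a pre-flight check on whichever directory the upload gets buffered in. A minimal sketch, assuming the check runs inside the same container that does the buffering; the function name and the `headroom` factor are hypothetical and not part of any Archipelago code:

```python
import shutil
import tempfile
from typing import Optional


def has_room_for_upload(
    upload_bytes: int,
    buffer_dir: Optional[str] = None,
    headroom: float = 2.0,
) -> bool:
    """Return True if buffer_dir has at least upload_bytes * headroom free.

    buffer_dir defaults to the system temp directory (usually /tmp).
    headroom is an assumed safety factor, since a buffered upload may
    briefly exist twice (temp copy plus final destination).
    """
    path = buffer_dir or tempfile.gettempdir()
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= upload_bytes * headroom


if __name__ == "__main__":
    one_gb = 1024 ** 3
    print("Room for a 1 GB upload:", has_room_for_upload(one_gb))
```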

DiegoPino commented 3 years ago

Some updates. This is not really hard.

Finally: we should also allow every upload to come with a list/field/option for providing a known checksum + algorithm + encoding base (base64?) so we can compare once the upload is done. Why? Many times (I have been there) you actually have a bad file on disk before starting the whole thing. Also, with chunked/multipart uploads you NEED to know what you are uploading so you can validate once all is done. Right?
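A minimal sketch of that comparison, assuming the uploader supplies the checksum as a hex or base64 string together with an algorithm name that Python's `hashlib` knows about. The function names are hypothetical; the file is hashed in blocks so gigabyte uploads do not have to fit in memory.

```python
import base64
import hashlib
import hmac


def file_digest(path: str, algorithm: str = "sha256", block_size: int = 1 << 20) -> bytes:
    """Hash the file at path in fixed-size blocks and return the raw digest."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            hasher.update(block)
    return hasher.digest()


def decode_checksum(value: str, encoding: str) -> bytes:
    """Turn the user-supplied checksum string into raw bytes."""
    if encoding == "hex":
        return bytes.fromhex(value.strip())
    if encoding == "base64":
        return base64.b64decode(value.strip(), validate=True)
    raise ValueError(f"unsupported checksum encoding: {encoding}")


def upload_matches(
    path: str, expected: str, algorithm: str = "sha256", encoding: str = "hex"
) -> bool:
    """Return True if the stored file matches the checksum sent with the upload."""
    actual = file_digest(path, algorithm)
    return hmac.compare_digest(actual, decode_checksum(expected, encoding))
```

With chunked uploads the same check would run once, after all chunks have been reassembled into the final file.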

Adios