biigle / user-storage

:m: BIIGLE module to offer file upload and storage for users
GNU General Public License v3.0

Job batching and limit for requests with many files #19

Open mzur opened 9 months ago

mzur commented 9 months ago

We should implement a limit on the number of files per storage request. On biigle.de, we have a user who created several requests with tens of thousands or even more than 100k images. Maybe we could implement a 10k limit. If users want to upload more, they would have to split the files across several storage requests.

One issue that too many files can cause is excessively long run times for the queue jobs. Also, users could theoretically spam the service with millions of small files (as long as the total size stays within their quota, which is easy to achieve).
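A minimal sketch of the proposed limit check, written in Python for illustration only (the module itself is not written in Python). The names `MAX_FILES_PER_REQUEST`, `TooManyFilesError` and `validate_file_count` are hypothetical and do not exist in the module. The 10k value is only the number floated above.

```python
# Hypothetical limit taken from the discussion; not an existing config value.
MAX_FILES_PER_REQUEST = 10_000


class TooManyFilesError(ValueError):
    """Raised when a storage request contains more files than allowed."""


def validate_file_count(file_paths, limit=MAX_FILES_PER_REQUEST):
    """Return the number of files, or raise if the request exceeds the limit."""
    count = len(file_paths)
    if count > limit:
        raise TooManyFilesError(
            f"Storage request has {count} files, but at most {limit} are "
            "allowed. Please split the upload into several requests."
        )
    return count
```

Such a check would run when the request is submitted, so an oversized request is rejected before any queue job is created.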

dlangenk commented 9 months ago

Wouldn't it make more sense to chunk the storage request into multiple queue jobs instead of limiting it on the user side? In the end, both solutions would result in the same outcome (if the user submits multiple requests in your case), but keeping related data in one storage request seems more manageable.

mzur commented 9 months ago

That's also a good idea! However, I think we need some kind of limit in any case. Maybe a higher one, then (100k, 500k)?

The ApproveStorageRequest job can be split up into several smaller jobs. The copying can be done as batched jobs (each copying 10k files). Once all batches have finished, the user is notified and then the pending directory is deleted.
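The order of steps described above could be sketched roughly as follows. This is illustrative Python, not the module's actual code. The function names `chunk` and `plan_approval` and the step labels are hypothetical. The sketch only builds the plan of jobs and does not copy or delete anything.

```python
from itertools import islice


def chunk(files, size=10_000):
    """Yield successive lists of at most `size` files."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(files)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def plan_approval(files, batch_size=10_000):
    """Build the ordered list of steps for approving one storage request.

    The copy steps could run as a batch of queue jobs. The last two steps
    must only run after every copy step has finished.
    """
    steps = [("copy", batch) for batch in chunk(files, batch_size)]
    steps.append(("notify_user", None))
    steps.append(("delete_pending_directory", None))
    return steps
```

For a request with 25,000 files, this yields three copy steps (10k, 10k and 5k files) followed by the notification and the cleanup. In a queue system with batch support, the two final steps would presumably be attached as a completion callback of the batch.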