BobbyWibowo / lolisafe

Blazing fast file uploader and awesome bunker written in node! 🚀
MIT License

[FEATURE REQUEST] Duplicate Checking #189

Closed ghost closed 4 years ago

ghost commented 4 years ago

Is your feature request related to a problem? Please describe.

Duplicate files are often uploaded, and they take up unnecessary space.

Describe the solution you'd like

Check for duplicates, ideally before upload.

Additional context

This might be a completely wrong approach, but here is how I would implement something like this: hash the file in the browser before the upload starts and check that hash against the server's records. If it matches, return the server's already-stored file; if not, continue with the upload.

A possible limitation with this approach is how large a file browsers can hash. AFAIK Chrome couldn't hash files over 512 MiB a couple of years ago, although this may have changed.

Ideally, all processing would be done in the browser, both to save bandwidth and to save CPU time on the server.

This is something I have very poorly implemented on my own instance (fully server-sided) but I plan to experiment with doing it in the browser. so may be able to make a or once I have more time

BobbyWibowo commented 4 years ago

Isn't that this? https://github.com/BobbyWibowo/lolisafe/blob/58cbcdd1fe0853b3379ef06993a87dee91629855/controllers/uploadController.js#L614

Granted, I don't think that's effective at all if there's at least a 1-byte difference.
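The linked check boils down to an exact-hash lookup scoped to the uploading user. A rough sketch of that idea (the `existingUploads` array stands in for a database query, and the field names are illustrative, not lolisafe's actual schema):

```javascript
// On upload, compare the new file's hash against the user's
// previously stored uploads and reuse the existing record on
// a match. Exact matching only: a single differing byte
// produces a completely different hash, so near-duplicates
// are never caught.
function findDuplicate (existingUploads, userId, hash) {
  return existingUploads.find(u => u.userid === userId && u.hash === hash) || null;
}
```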

BobbyWibowo commented 4 years ago

Also, something client-sided doesn't sound particularly effective for this, unless you want to explicitly disable uploads from sources other than the homepage uploader.

People usually use pomf-based hosts to share screenshots, and those people would already have their own clients of some sort, me included. Typically ShareX; in my case, my own fork of uguush.

Frankly, that's why I made additional features such as album ID, identifier length, upload age, etc. configurable through HTTP headers: to give such clients the freedom of configuring them as well.

ghost commented 4 years ago

I haven't done a full update in forever, so I didn't realize this had been implemented. You're right, I hadn't considered other sources. Implementing something to check for similar (rather than identical) files would be hard and most likely error-prone.

BobbyWibowo commented 4 years ago

Yeah.

On that matter, adding is:duplicate to the uploads list filter is on my backlog as well. It'll basically just group files by their hashes and return the ones that appear in a group of more than one. But unlike the existing duplicate avoidance, this won't care about which accounts uploaded them.

It'll probably be useful if bad actors upload the same files from multiple accounts or whatever.
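The grouping described above could be sketched like this (function and field names are my own, purely for illustration):

```javascript
// Group uploads by hash across ALL accounts and keep only the
// groups with more than one member. Unlike the per-user
// duplicate avoidance, this intentionally ignores which
// account uploaded each file.
function duplicateGroups (uploads) {
  const byHash = new Map();
  for (const u of uploads) {
    if (!byHash.has(u.hash)) byHash.set(u.hash, []);
    byHash.get(u.hash).push(u);
  }
  return [...byHash.values()].filter(group => group.length > 1);
}
```

In practice this would likely be a single query (e.g. GROUP BY hash HAVING COUNT(*) > 1) rather than an in-memory pass, but the shape of the result is the same.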