nfriedly opened this issue 4 years ago
I have the same problem. I've seen yesterday's MR trying to reduce the memory usage, but I don't think that approach is going to work. I think the hashes need to be stored somewhere else until it's time for comparison.
An MD5 hash is 128 bits, i.e. 16 bytes. Even if we stored nothing but the raw hashes in memory, that would be about 65k hashes/files per MB, or roughly 33 million files for the half a gig at which most PHP installations throw the error reported in this issue. But we don't store them anywhere near that compactly: the hashes sit in PHP arrays together with paths and other per-file data, and PHP's per-entry overhead is large, so the practical limit is a small fraction of that.
I'd love to use this app, but I'm working with closer to 1 million files, not a couple of thousand.
We could save the hashes in the DB, in a dedicated table, possibly using the oc_filecache primary key as a foreign key and keeping the table in sync via the \OCP\Files\Events* events (because oc_filecache may contain already-deleted files). That way the duplicate detection would always be up to date, with no need to ever wait for a scan.
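Very roughly, a migration step for such a table could look like the following. This is just an illustrative sketch, not anything the app ships; the table name, class name and column choices are made up:

<?php
// Hypothetical migration creating a dedicated hash table (names are illustrative only).
use OCP\DB\ISchemaWrapper;
use OCP\Migration\IOutput;
use OCP\Migration\SimpleMigrationStep;

class Version010000Date20210101000000 extends SimpleMigrationStep {
    public function changeSchema(IOutput $output, \Closure $schemaClosure, array $options) {
        /** @var ISchemaWrapper $schema */
        $schema = $schemaClosure();
        if (!$schema->hasTable('duplicatefinder_hashes')) {
            $table = $schema->createTable('duplicatefinder_hashes');
            // fileid mirrors oc_filecache.fileid, so rows can be removed when
            // the \OCP\Files\Events* listeners report a deletion or change.
            $table->addColumn('fileid', 'bigint', ['notnull' => true]);
            $table->addColumn('hash', 'string', ['notnull' => true, 'length' => 32]);
            $table->setPrimaryKey(['fileid']);
            $table->addIndex(['hash'], 'dupfinder_hash_idx');
        }
        return $schema;
    }
}

Finding duplicates then becomes a query that groups this table by hash, instead of holding everything in a PHP array.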
I think yesterday's changes are still an improvement, and FWIW, it looks like it doesn't hash every file, only ones where there are multiple files of the same file size.
But, your point still stands. Trying to store everything in memory is bound to hit the limit at some point, even with aggressive memory optimizations.
On the other hand, my server has 16GB of RAM, and I'd be perfectly happy to let this thing use the majority of it for a day or two while it churns through the filesystem. That coupled with some memory optimizations might actually be good enough.
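To illustrate the size-first trick mentioned above in plain PHP (not the app's actual code, just a sketch): group by file size first, and only hash the sizes that occur more than once.

<?php
// Illustrative only: $paths is assumed to be a flat list of file paths.
$bySize = [];
foreach ($paths as $path) {
    $bySize[filesize($path)][] = $path;
}
$byHash = [];
foreach ($bySize as $group) {
    if (count($group) < 2) {
        continue; // a unique size can't have a duplicate, so skip hashing it
    }
    foreach ($group as $path) {
        $byHash[md5_file($path)][] = $path; // the expensive part
    }
}
// Every $byHash entry with more than one path is a set of duplicates.

That skips a lot of hashing, but both arrays still have to fit in memory for the whole run, which is exactly the limit being hit here.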
I don't know about the technicalities, but consider this as a quick workaround to get results (example commands at the end of this comment):
1. Create a new (Nextcloud) user.
2. Transfer ownership of a smaller chunk of the data you want to compare to this new user (use occ files:transfer-ownership), then log in as this user and run the app.
3. When cleaned, transfer ownership back to the original user.
4. Repeat this for the next chunk.
Yes, I know this misses duplicates that span different chunks, but I found it quite easy to predict which areas the duplicates would be in.
Hope this helps somebody!
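For concreteness, one chunk could look roughly like this; the user and folder names are made up, and the exact options may differ per Nextcloud version (check occ files:transfer-ownership --help):

occ files:transfer-ownership --path="Photos/2019" alice dedupe-helper
# log in as dedupe-helper and run Duplicate Finder (or: occ duplicates:find-all)
occ files:transfer-ownership dedupe-helper alice
# then repeat with the next folder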
I solved this problem by increasing the PHP memory limit specified inside the "memory-limit.ini" file (I don't remember the file's path, and note that I'm using a Docker container; if you're using a different setup you may have to edit php.ini instead). But the application should signal this problem via the web UI, otherwise you keep waiting for a scan that will never finish.
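For anyone else looking for it: the PHP setting in question is memory_limit, and where it lives depends on the setup, so treat the following as examples rather than exact paths. In whichever .ini your PHP actually loads:

memory_limit = 2G

or, for a one-off CLI run, php -d overrides the ini value just for that invocation:

php -d memory_limit=2G /path/to/nextcloud/occ duplicates:find-all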
What did you raise the limit to? Did you double it?
I found this issue after running into the same error myself. Trying to find a workaround.
I was able to get past the 504 timeout error with these settings in my reverse proxy config: in /etc/nginx/conf.d/, create custom_proxy_settings.conf and add the following (not sure which one fixes the problem):
client_max_body_size 10g;
proxy_connect_timeout 600s;
proxy_send_timeout 600s;
proxy_read_timeout 600s;
fastcgi_send_timeout 600s;
fastcgi_read_timeout 600s;
I was hoping PR https://github.com/PaulLereverend/NextcloudDuplicateFinder/pull/25 would help with this, but it didn't :( Mine still errors out, even with the CLI.
root@93678e3fc2a1:/# occ duplicates:find-all
Start scan... user: *redacted*
No duplicate file
...end scan
Start scan... user: *redacted*
No duplicate file
...end scan
Start scan... user: *redacted*
No duplicate file
...end scan
Start scan... user: *redacted*
No duplicate file
...end scan
Start scan... user: *redacted*
PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 4096 bytes) in /config/www/nextcloud/lib/private/Files/View.php on line 185
PR #25 was reverted - https://github.com/PaulLereverend/NextcloudDuplicateFinder/commit/c020135d573e309ff791940c36be6adf6f2b20f8
I expect it will resolve the issue whenever it lands "for real".
I'd love to use this app, but I'm having trouble, possibly related to having a quite large amount of data.
My setup is an Unraid server running the NextCloud docker image from linuxserver.io. Docker and NextCloud, as well as my user data (only a few megabytes) are all on an SSD, but I have the External Storage app configured with a local share from unraid that is ~25TB.
When I go to the web UI, I just get the spinner forever, like in #1. If I open the browser's dev tools I can see that there's a request to
/apps/duplicatefinder/files
that gets a 504 Gateway Timeout failure from nginx/1.18.0 after a minute or so.
So I tried opening a shell in the docker image and running
occ duplicates:find-all
(after figuring out the correct prefix from #2). That took a few minutes and then failed with a PHP fatal error: allowed memory size of 536870912 bytes exhausted.
(536870912 bytes is about half a gigabyte)
While it was running, I didn't hear any hard drives spin up, although the CPU was pegged at basically 100%. Then I noticed it was still pegged after the command failed, with two instances of
php7 -f /config/www/nextcloud/cron.php
running at 50% load each. I'm not really sure whether cron.php was related to Duplicate Finder, but it seems plausible.
I restarted the docker instance and ran
occ duplicates:find-all
- this time the CPU load stayed lower, around 30-40%, and there was some disk activity, but it still ended with the same out-of-memory error as above. This time, the CPU load returned to near zero when it finished.
I'm fairly new to Nextcloud, but if there's anything you'd like me to try or logs you'd like to see, please let me know.