Open Timarrr opened 1 year ago
Update: Free space now reports around 300 GiB, but I had to increase the DB size to 12 GiB to keep it from overfilling. I also found that bees performs way better with one thread in my situation: in the worst case, with very frequent seeks, it still sits at 3-4 MB/s, but now it sometimes reaches 100-something MB/s. With one thread it also doesn't load the system nearly as much (with default settings all my cores were busy in I/O wait and the load average was around 12; now it's only 1-2), and the HDD doesn't heat up as much.
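For anyone wanting to try the same two changes, here is a rough sketch of what the relevant lines in a beesd config could look like. It assumes the layout of the `beesd.conf.sample` shipped with bees (a shell-style file with `UUID`, `OPTIONS` and `DB_SIZE` variables) and the `--thread-count` option of bees; the UUID is a placeholder, and the other options shown are just the ones from the sample file, so check `bees --help` on your version before copying anything.

```shell
# Sketch of a beesd config, e.g. /etc/bees/<name>.conf (path is an assumption).
# Placeholder UUID: replace with the UUID of the btrfs filesystem.
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# Run bees with a single worker thread instead of the default.
# The first two options are taken from the sample config, not from this thread.
OPTIONS="--strip-paths --no-timestamps --thread-count 1"

# Hash table size in bytes: 12 GiB. The sample config says the size
# must be a multiple of 128KB, which this value satisfies.
DB_SIZE=$((12*1024*1024*1024))
```

As far as I know the hash table file is recreated when its size changes, so bees may need to rescan before the larger table pays off.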
I'm not sure the DB overfilling is really such a big issue. In the end, it's okay to push out older hashes and keep the hashes for big blocks, and you don't want too many shared extents per hash anyway. So you probably don't want to keep hashes for small blocks, because that's like spending 99% of the time for 1% of the space savings.
Also, the problem with multiple threads is more likely lock contention in btrfs. But I'm not sure whether bees does any seek optimization by re-ordering queued jobs, so seeking may be an issue, too.
What you observe for space is documented behavior of bees, especially when coming from other dedup programs: before space is freed, used space grows or free space stops growing, until bees has finally processed the last snapshot sharing an extent and that extent can actually be released.
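To put rough numbers on that trade-off, here is a small sketch of the sizing arithmetic. It assumes the figure from the bees sizing notes that each hash table entry takes 16 bytes, and it assumes about 3.5 TiB of unique data (the 4TB disk with roughly half a terabyte free); the function name is made up for illustration.

```shell
# avg_dedupe_block DATA_BYTES TABLE_BYTES
# Prints how many bytes of data each hash table entry has to cover on average,
# assuming 16 bytes per entry.
avg_dedupe_block() {
    entries=$(( $2 / 16 ))
    echo $(( $1 / entries ))
}

GiB=$((1024*1024*1024))
data=$((3584 * GiB))   # assumed ~3.5 TiB of unique data

for size in 4 8 12; do
    printf '%2d GiB table: ~%d bytes of data per entry\n' \
        "$size" "$(avg_dedupe_block "$data" $((size * GiB)))"
done
```

Under those assumptions a 4 GiB table averages one entry per 14 KiB or so, and 12 GiB gets that down to under 5 KiB. A smaller table mostly means small duplicate blocks get missed, which fits the point above.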
I had half a terabyte left on my 4TB HDD and wanted to dedupe it to increase the available space. After running bees for over 36 hours, `btrfs filesystem usage -h /hdd` reports `Free (estimated): 161.13GiB`. Bees is still buzzing along, and my free space stopped shrinking around this point. I should also mention that the 4GB hash table started overfilling and I had to restart beesd with an 8GB DB size in the config. Another thing is that bees seems to spam this message, sometimes for 15 seconds straight:
2023-09-05 02:11:33 513194.513219<7> crawl_5_680152: exception (ignored): exception type std::runtime_error: FIXME: too many duplicate candidates, bailing out here
Is this bad?
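To get a feel for how often that message shows up, something like the following could be used. It is only a sketch: it assumes the log lines start with a date and a time as in the line quoted above, and the function name is made up. Feed it the bees log on stdin from wherever your setup writes it.

```shell
# Count "too many duplicate candidates" messages per minute.
# Expects lines starting with "YYYY-MM-DD HH:MM:SS ..." on stdin.
bailouts_per_minute() {
    awk '/too many duplicate candidates/ { n[$1 " " substr($2, 1, 5)]++ }
         END { for (m in n) print m, n[m] }' | sort
}
```

Each output line is the date, the minute, and the number of matching messages in that minute.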