Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

Beesd starts to consume a lot of CPU and generates never-ending FIXME traces #204

Closed JaviVilarroig closed 2 years ago

JaviVilarroig commented 2 years ago

I have a configuration with 2×1 TB SSDs for normal use plus 2×1 TB HDDs for internal backup, both in RAID1 configuration.

Recently bees started to take all the available CPU. Looking at syslog, I found that I'm getting tons of these messages:

nov 19 23:07:28 gondor beesd[104752]: crawl_1630[104774]: exception (ignored): exception type std::runtime_error: FIXME: bailing out here, need to fix this further up the call stack
nov 19 23:07:28 gondor beesd[104752]: crawl_1632[104775]: exception (ignored): exception type std::runtime_error: FIXME: bailing out here, need to fix this further up the call stack
nov 19 23:07:28 gondor beesd[104752]: crawl_1632[104777]: exception (ignored): exception type std::runtime_error: FIXME: bailing out here, need to fix this further up the call stack
nov 19 23:07:28 gondor beesd[104752]: crawl_1634[104776]: exception (ignored): exception type std::runtime_error: FIXME: bailing out here, need to fix this further up the call stack
nov 19 23:07:28 gondor beesd[104752]: crawl_1634[104776]: exception (ignored): exception type std::runtime_error: FIXME: bailing out here, need to fix this further up the call stack

Apparently that's the only thing bees is doing, from the moment I start it to the moment I stop it.

I have tried running a balance and a scrub, but nothing changed.

I know this is not enough information, so the first question probably is: how can I produce more informative logs?

Thanks.

Zygo commented 2 years ago

The exceptions mean that there are too many matching block candidates, and bees is spending too much CPU trying to choose among multiple possible matches for an extent. It means you have some data that is extremely repetitive (e.g. exactly the same block repeated over and over within a single extent).
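One hypothetical way to spot that kind of file yourself (the path and the 4 KiB block size here are assumptions for illustration, not anything bees reports): hash every fixed-size block with GNU coreutils split's --filter and count how often each hash repeats. A few hashes covering most of the file means the data is highly repetitive.

# Hash each 4 KiB block of a suspect file (path is hypothetical)
# and list the most repeated block hashes; large counts on a few
# hashes indicate many identical blocks in the file.
split -b 4096 --filter='md5sum' /backup/suspect-file.img | sort | uniq -c | sort -rn | head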

Eventually bees will get past that section of the data and behave more normally (unless all of your data is like this).

At the moment there is no simple fix. Avoiding this issue is on my list of things to do when rewriting the block matching loop. In the meantime you might want to limit the number of threads with -c or lower the CPU priority with nice or systemd options.
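A rough sketch of both approaches, assuming a systemd-managed beesd@.service unit and a beesd wrapper that forwards unrecognized options to bees; the unit name and paths may differ on your install:

# Limit bees to a single worker thread via the -c / --thread-count option:
beesd -c 1 <filesystem-UUID>

# Or cap CPU from the systemd side with a drop-in for the unit:
sudo systemctl edit beesd@<filesystem-UUID>.service
# add to the drop-in:
#   [Service]
#   Nice=19
#   CPUQuota=25%
sudo systemctl restart beesd@<filesystem-UUID>.service

Fewer threads reduces the total CPU bees can burn; Nice= and CPUQuota= instead leave the thread count alone but keep bees from starving other workloads.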

JaviVilarroig commented 2 years ago

Thanks for the fast answer. Since this is a btrfs volume dedicated to backups, that makes a lot of sense. I will look at the resource-limiting options. Thanks!