Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0
647 stars 55 forks source link

some CMD line options are not recognized #144

Open vendetta1987 opened 4 years ago

vendetta1987 commented 4 years ago

After setting up bees I wanted to reduce the log level as well as try out different scan modes. Having it set up as a systemd service I tried and just added --verbose 6 and --scan-mode 2 to the unit file. The service wouldn't start anymore so I tried the same by calling beesd directly which also didn't work. All I got was an exception about invalid parameters in some call to std::stoul. After fiddling around with it I finally made it work using the OPTIONS field in the configuration file. This is a bit puzzling as I initially had --strip-paths --no-timestamps seemingly working when added to the unit file. So flags without parameters seem to work. I compiled bees 0.6.2 downloaded from the release section myself without further changes and am running it on Linux helios4 4.19.104-mvebu #20.02.1 SMP Mon Feb 17 03:22:46 CET 2020 armv7l GNU/Linux a SBC running armbian.

kakra commented 4 years ago

Maybe the order of parameters is important? There's some fixes in #136 which may help your problem.

vendetta1987 commented 4 years ago

That may fix the timely issue of applying parameters but in my case some of them cause an exception during startup. The timestamp option was working correctly for me though. I also tried different orders in my command line to no avail.

Am 18. April 2020 13:38:39 MESZ schrieb Kai Krakow notifications@github.com:

Maybe the order of parameters is important? There's some fixes in #136 which may help your problem.

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/Zygo/bees/issues/144#issuecomment-615848552

kakra commented 4 years ago

Did you try the patches? Actually it may fix your problem because some cmdline options may access partially uninitialized objects in bees without this patch. Usually that's only a problem on early shutdown, i.e. because of errors. A full stacktrace would help. Try coredumpctl dump bees if you're using systemd and after the daemon crashed.

kakra commented 4 years ago

BTW: There may be a problem with 0.6.2, it has a bug somewhere with cmdline parameters. Could you try 0.6.1, or preferably master which should become 0.7 at some time when @Zygo no longer fears the tag ;-). Actually, I think the 0.6.2 tag shouldn't have been made anyway, it seems to miss one or two important patches around the cmdline parser.

See-also: https://github.com/Zygo/bees/issues/132

vendetta1987 commented 4 years ago

These are the outputs I get when using --verbose and --set-mode.

sudo beesd --no-timestamps --strip-paths --verbose 6 564a5a5f-9db3-45af-9a11-ba1805a4b427 INFO: Find 564a5a5f-9db3-45af-9a11-ba1805a4b427 in /etc/bees//sharedfolders_main.conf, use as conf INFO: Check: Disk exists INFO: Check: Disk with btrfs INFO: WORK DIR: /run/bees/ INFO: MOUNT DIR: /run/bees//mnt/564a5a5f-9db3-45af-9a11-ba1805a4b427 bees version 0.6.2 2020-04-18 19:10:20 30365.30365<3> bees: 2020-04-18 19:10:20 30365.30365<3> bees: 2020-04-18 19:10:20 30365.30365<3> bees: EXCEPTION 2020-04-18 19:10:20 30365.30365<3> bees: exception type std::invalid_argument: stoul 2020-04-18 19:10:20 30365.30365<3> bees: *** terminate called after throwing an instance of 'std::invalid_argument' what(): stoul /usr/sbin/beesd: line 145: 30365 Aborted "$bees_bin" "${ARGUMENTS[@]}" $OPTIONS "$MNT_DIR"

sudo beesd --no-timestamps --strip-paths --scan-mode 2 564a5a5f-9db3-45af-9a11-ba1805a4b427 INFO: Find 564a5a5f-9db3-45af-9a11-ba1805a4b427 in /etc/bees//sharedfolders_main.conf, use as conf INFO: Check: Disk exists INFO: Check: Disk with btrfs INFO: WORK DIR: /run/bees/ INFO: MOUNT DIR: /run/bees//mnt/564a5a5f-9db3-45af-9a11-ba1805a4b427 bees version 0.6.2 2020-04-18 19:10:53 30402.30402<6> bees: BeesStringFile .beeshome/beescrawl.dat max size 1M 2020-04-18 19:10:53 30402.30402<7> bees: BeesThread exec crawl_transid 2020-04-18 19:10:53 30402.30403<7> crawl_transid: Starting thread crawl_transid 2020-04-18 19:10:53 30402.30402<3> bees: 2020-04-18 19:10:53 30402.30402<3> bees: 2020-04-18 19:10:53 30402.30402<3> bees: EXCEPTION 2020-04-18 19:10:53 30402.30402<3> bees: exception type std::invalid_argument: stoul 2020-04-18 19:10:53 30402.30402<3> bees: *** terminate called after throwing an instance of 'std::invalid_argument' what(): stoul /usr/sbin/beesd: line 145: 30402 Aborted "$bees_bin" "${ARGUMENTS[@]}" $OPTIONS "$MNT_DIR"

vendetta1987 commented 4 years ago

I also checked out master, compiled and tried to make it run. Unfortunately it is throwing an exception even when I'm just trying to show --help. I'm not sure if I want to try and actually access my data:

sudo beesd Usage: beesd [options]


bees version 07e5e7b 2020-04-18 20:03:02 1404.1404<7> bees: Masking signals 2020-04-18 20:03:02 1404.1404<7> bees: BeesThread exec progress_report 2020-04-18 20:03:02 1404.1404<7> bees: BeesThread exec status_report Usage: /usr/lib/bees/bees [options] fs-root-path [fs-root-path-2...] Performs best-effort extent-same deduplication on btrfs.

fs-root-path MUST be the root of a btrfs filesystem tree (id 5). Other directories will be rejected.

Options: -h, --help Show this help

Load management options: -c, --thread-count Worker thread count (default CPU count * factor) -C, --thread-factor Worker thread factor (default 1) -G, --thread-min Minimum worker thread count (default 0) -g, --loadavg-target Target load average for worker threads (default none)

Filesystem tree traversal options: -m, --scan-mode Scanning mode (0..2, default 0)

Workarounds: -a, --workaround-btrfs-send Workaround for btrfs send

Logging options: -t, --timestamps Show timestamps in log output (default) -T, --no-timestamps Omit timestamps in log output -p, --absolute-paths Show absolute paths (default) -P, --strip-paths Strip $CWD from beginning of all paths in the log -v, --verbose Set maximum log level (0..8, default 8)

Optional environment variables: BEESHOME Path to hash table and configuration files (default is .beeshome/ in the root of each filesystem).

BEESSTATUS  File to write status to (tmpfs recommended, e.g. /run).
            No status is written if this variable is unset.

2020-04-18 20:03:02 1404.1404<7> bees: BeesThread destructor status_report 2020-04-18 20:03:02 1404.1404<7> bees: Cancelling thread status_report 2020-04-18 20:03:02 1404.1404<7> bees: Waiting for thread status_report terminate called without an active exception Aborted

kakra commented 4 years ago

Your data is safe at any point (except for kernel bugs): bees always uses the kernel to deduplicate extents, this is transaction safe, concurrent writes won't damage any data, bees never writes into your files (it only uses existing extents which it will create on demand). And yes, master is missing my "--help" fix thus the crash.

The "invalid argument" exception actually doesn't refer to cmdline arguments (tho it may result from reading a cmdline option but I don't think that's the problem here). You may want to delete beescrawl.dat, it seems it may have some incompatible content. This will result in bees starting over from the beginning (but it still has your hashes) so be prepared for some higher IO during the next hours. Besides that there's no harm in deleting the file.

Maybe post the contents of beescrawl.dat, @Zygo may have some insights what's wrong and may suggest a fix in the source code to work around this.

Zygo commented 4 years ago

Crashes near thread termination are usually some kind of libc dependency bug. A number of users have reported that, starting roughly around the last Debian release. The fix seems to be to track down everything the C compiler touches during a rebuild, and reinstall/upgrade it. The stack traces usually point deep into the libc stdio code which bees can do nothing about (I'm guessing this is the case here, but no stack trace was provided).

I suspect the exception in stoul is because beesd passed something to bees that wasn't an integer. Try running 'sh -x beesd ...' to see the final command line passed to bees. There's no exception handler set up during argument passing, so the process just dies. There probably should be an exception handler there just so we can dump out the bees argument list for people to send in github issues.

kakra commented 4 years ago

@Zygo but for the "--help" case this looks a lot like what I fixed with https://github.com/Zygo/bees/pull/136...

# bees --help
# [...]
2020-04-19 00:11:17 1166966.1166968<7> status_report: Exiting thread status_report, 0.001 sec
2020-04-19 00:11:17 1166966.1166966<7> bees: BeesThread destructor status_report
2020-04-19 00:11:17 1166966.1166966<7> bees: Cancelling thread status_report
2020-04-19 00:11:17 1166966.1166966<7> bees: Waiting for thread status_report
2020-04-19 00:11:17 1166966.1166966<7> bees: Waited for status_report, 0.001 sec
2020-04-19 00:11:17 1166966.1166966<7> bees: BeesThread destructor progress_report
2020-04-19 00:11:17 1166966.1166966<7> bees: Cancelling thread progress_report
2020-04-19 00:11:17 1166966.1166966<7> bees: Waiting for thread progress_report
2020-04-19 00:11:17 1166966.1166966<7> bees: Waited for progress_report, 0.001 sec
2020-04-19 00:11:17 1166966.1166966<5> bees: Exiting with status 1 (failure)

It starts all those threads before writing the help report to the terminal, then exits with an error (as it should, it's in the switch/case). The observed exception probably happens somewhere inside those threads. My patch avoids starting those threads early. It doesn't fix the actual exception, tho, which could still happen in the non-help-case.

About the other reported crash from https://github.com/Zygo/bees/issues/144#issuecomment-615906054: According to the log, beescrawl.dat was read just before the exception. I suspect something is wrong in this file. I observed on multiple systems that this file may contain either bogus huge or unrelated numbers. I'm not sure when this happens, but I observed the wrong numbers mostly creeping in when starting bees during boot: On my systems the btrfs root is auto-mounted and may not be ready at this stage, I suspect bees was reading xfs in that case which has some support for those IOCTLs. It errored out then, counting some bug numbers up in the stats and complaining about "file not found" during IOCTL. Afterwards, the beescrawl.dat seems damaged and results in the crash. Purging the beescrawl.dat repeatedly fixed the crash. Meanwhile, I alleviated the problem by mounting my btrfs-pool on boot instead of setting up an auto-mount.

Zygo commented 4 years ago

beescrawl.dat is read from a catch_all a few levels up the stack. Exceptions thrown there will be handled. It's designed to handle a beescrawl.dat that contains unparsable garbage, and tested on a few very badly broken filesystems.

In this case it's a bit different, we crash while trying to join a thread that we just cancelled (so we're running BeesContext's shutdown code). Cancelling a thread is dubious at the best of times: it only works if you never call a native C function (we do, e.g. from_hex), your C++ runtime has fully integrated stdio and thread and internally throws an exception on cancel, and the C++ runtime is totally bug-free (armbian is involved, so I suspect it's not). We can (and did, on master) remove all the thread cancels but at that point the fix is "run master instead of 0.6".

On the other hand, this is all distracting us from the reported bug, which is something about beesd giving bees a bogus parameter. Specifically:

sh -x scripts/beesd beesd --no-timestamps --strip-paths --scan-mode 2 21806bec-39eb-4ef6-8258-fe1c9c9f0310
[...]
+ bees --no-timestamps --strip-paths --scan-mode ''

beesd is failing to include the numeric argument for --scan-mode, causing an integer parse exception.

Zygo commented 4 years ago

OK, technically what we did in master is make sure the threads have all exited before calling the thread destructors. In C++, invoking a destructor on an unjoined thread is not permitted, and if the C++ runtime detects that, it will call terminate and kill the whole program. bees tries to cancel and join the thread during a shutdown, so that it can run the thread destructors without dying; however, this is a workaround of last resort. The C++ runtime often ends up calling terminate because we call into C runtime a lot in bees, so we die anyway. Sometimes it SEGVs instead due to a libc bug, but at this point it's distinction without difference. Doesn't matter, in all of these cases the program is going to come to an abrupt stop very soon, the only difference is what messages it prints as it dies.

If we don't cancel the thread, or make other arrangements for it to exit, then bees would just hang in thread join. That's categorically worse than crashing--with a crash, at least the parent process knows something went wrong, but with a hang, everything looks OK from the outside but no dedupe gets done.

bees crashes won't harm any data, as bees is designed to tolerate system crashes. This is a necessity when developing something that literally lives in btrfs. bees was originally not designed to terminate at all so there are a few startling deficiencies remaining in the code that handles termination, and they are all low-priority fixes.

There is a catch_all block around bees option parsing, but we crash before we can give a "bad option" report because we try to destroy the BeesContext before the exception handler can say anything about the bad option--and we don't do it by using the SIGTERM handler, so it's up to the destructor code (which is a last-resort workaround). I have a half-written replacement for bees_main that does all the option parsing first, before starting up the BeesContext and its threads, but I put that work on hold in 2018 to spend some time fixing the kernel bugs.

vendetta1987 commented 4 years ago

I finally got around to run the script with the -x option. Find the output attached. output.txt

kakra commented 4 years ago

You may want to use long options with syntax --long-option=NUMBER... Usually, both formats are accepted but it may confuse some parsers if you omit =. But please check your options again, multiple options are supplied twice and --scan-mode is missing a number.

But you probably found a bug in the beesd wrapper script... As far as I understand it, it will throw away all "unsupported" arguments... Its parser doesn't handle arguments with option parameters at all: If it doesn't match exactly what bees --help shows, it is parsed as either invalid if starting with -- or it is interpreted as a UUID or filesystem parameter, in which case it is expected to be found in the config file. Your current work-around would be to either use bees (without d) directly (but you need to take care of mounting the root subvol somewhere then yourself) or pass in options that take additional values via the OPTIONS variable.

If you're using systemd, try the following:

# open an editor for a local service override
sudo systemctl edit beesd@

# an empty editor should show up, paste the following text and save+quit
[Service]
Environment=OPTIONS="--verbose=6 --scan-mode=2 --strip-paths"

This should work around the bug. I omitted --no-timestamps as that is already part of the original command line passed by the systemd service, see systemctl cat beesd@ to see the final result. Restart the service and try pgrep -a bees to find the actual options passed.

@Zygo This script is already complicated enough, such an additional logic would only bloat it even more. Wouldn't it be better to get that back into the C++ code? I have been working on such a feature some time ago and would continue working on it.

@Nefelim4ag Or maybe it would be better to reverse the logic and only check parameters not starting with a dash against the config file, and pass everything else over to bees as-is? Also, one could argue if the final cmdline should go with $OPTIONS or $ARGUMENTS first. It would still need to handle options omitting = somehow or simply ignore "filesystems" not found in the config file. It probably results in a mess, no matter how it's solved.