Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0
647 stars 55 forks source link

Crash compiling the latest release #164

Open sbrl opened 3 years ago

sbrl commented 3 years ago

The latest master does compile correctly, but I thought I'd inform you that the latest release (0.6.4 as of the time of posting) does not compile at all on Ubuntu Server 20.04 LTS. It fails with the following error message:

In file included from extentwalker.cc:6:
../include/crucible/limits.h: In instantiation of ‘To crucible::ranged_cast(From) [with To = long unsigned int; From = long int]’:
extentwalker.cc:483:44:   required from here
../include/crucible/limits.h:24:35: error: comparison of integer expressions of different signedness: ‘long int’ and ‘long unsigned int’ [-Werror=sign-compare]
   24 |   if (numeric_limits<From>::max() > numeric_limits<To>::max() && numeric_limits<From>::max() < numeric_limits<To>::max()) {
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
../include/crucible/limits.h:24:94: error: comparison of integer expressions of different signedness: ‘long int’ and ‘long unsigned int’ [-Werror=sign-compare]
   24 | ) > numeric_limits<To>::max() && numeric_limits<From>::max() < numeric_limits<To>::max()) {
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~

../include/crucible/limits.h:29:35: error: comparison of integer expressions of different signedness: ‘long int’ and ‘long unsigned int’ [-Werror=sign-compare]
   29 |   if (numeric_limits<From>::max() > numeric_limits<To>::max() && f > static_cast<From>(numeric_limits<To>::max())) {
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
../include/crucible/limits.h: In instantiation of ‘To crucible::ranged_cast(From) [with To = long int; From = long long unsigned int]’:
extentwalker.cc:529:90:   required from here
../include/crucible/limits.h:24:35: error: comparison of integer expressions of different signedness: ‘long long unsigned int’ and ‘long int’ [-Werror=sign-compare]
   24 |   if (numeric_limits<From>::max() > numeric_limits<To>::max() && numeric_limits<From>::max() < numeric_limits<To>::max()) {
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
../include/crucible/limits.h:24:94: error: comparison of integer expressions of different signedness: ‘long long unsigned int’ and ‘long int’ [-Werror=sign-compare]
   24 | ) > numeric_limits<To>::max() && numeric_limits<From>::max() < numeric_limits<To>::max()) {
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~

../include/crucible/limits.h:29:35: error: comparison of integer expressions of different signedness: ‘long long unsigned int’ and ‘long int’ [-Werror=sign-compare]
   29 |   if (numeric_limits<From>::max() > numeric_limits<To>::max() && f > static_cast<From>(numeric_limits<To>::max())) {
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make[1]: *** [Makefile:43: extentwalker.o] Error 1
make[1]: Leaving directory '/home/sbrl/bees/lib'
make: *** [Makefile:37: lib] Error 2

As I mentioned above, this does not affect the latest master. I'm just opening this issue to let you know, and suggest that creating a new release might be worth doing.

Zygo commented 3 years ago

Ironically, that code is meant to detect at runtime exactly the bug that the warning is warning about at compile time: it attempts the desired cast, and verifies that < and > operators behave consistently with the result, before allowing the caller to do whatever the caller was trying to do.

Maybe it can be reordered to avoid banging constants of signed and unsigned types together, or there's a more C++ way to do it with type traits like is_signed. I'll look into that, but I'm worried about one thing: I don't know why it would be different for 0.6.4 vs master. Nothing jumps out from the diff, and we set -Wall -Wextra -Werror -O3 in both places. crucible/limits.h and lib/extentwalker.cc are mostly identical and it's not complaining about the parts that are different.

For a short-term solution add -Wno-error=sign-compare to CCFLAGS to allow the build to proceed with the warning.

kakra commented 3 years ago

Maybe you should setup a CI test in this project, run that with -Werror and remove -Werror from the default git checkout? E.g., Gentoo QA is complaining about -Werror and removes the flag manually from the build chain during installation.

Zygo commented 3 years ago

Due to the privileged and pervasive nature of this program, I'd really rather it not build than produce binaries with ignored warnings. Package maintainers are free to add -Wno-error to their build scripts if they trust upstream, but someone pulling code from master directly might be in for a nasty surprise if something important is missed, especially if it's the kind of thing that shows up with a toolchain update.

I run CI (and usually some weeks of testing) before every push, but my set of CI build hosts isn't as diverse as those of the bees user community (and apparently doesn't contain the latest Ubuntu any more...OK, that I should fix).

sbrl commented 3 years ago

Oh, that's odd. I just assumed that the bug had been fixed in the latest master. I ran make clean in between the make calls for 0.6.4 vs master. Maybe I should have re-cloned instead?

Regarding CI, maybe GitHub Actions could be used simply to attempt compiling it on a variety of different OSes? Haven't used GitHub actions too much myself, so not sure how possible it would be. But it might be perhaps useful. But then again, I don't work on developing bees, so I don't know :P

Anyway, shall I close this issue, or leave it open?

kakra commented 3 years ago

Yeah, Github Actions was what I was thinking about. It would allow contributors to get direct feedback on pull requests.

Zygo commented 3 years ago

I was thinking github actions too, but docs.github.com was down earlier today when I wanted to read about it :-P

Though...now that it's back up, I see that their ubuntu-latest runner would not have caught this particular bug, as it's still on Ubuntu 18. :-/

I'm thinking more like "add Fedora and {Arch,Mint,NixOS} VMs to my test fleet," which already includes Debian unstable, Ubuntu 18, and Raspberry Pi.

Zygo commented 3 years ago

Anyway, shall I close this issue, or leave it open?

Leave it open please. There are better ways to write that template, and no CI workflow can be considered correct if it doesn't fail to build that code with -Werror on new GCC.

sbrl commented 3 years ago

Sure thing - I'll leave it open then :-)

After managing to compile it in the end, I've set bees running on my 3 x 4TB HDD raid1 setup. After an initial few hours of high resource usage (I assume it was scanning for the first time), bees quieted down and has been running great ever since :D

I've even now enabled (read-only) snapshots, and it's still fine (I'm using the workaround btrfs send to stop bees from scanning read-only snapshots, as I assume all deduplication efforts on the primary copy of the data will eventually be replicated via the snapshots).

Thanks for creating an awesome project!