facebook / zstd

Zstandard - Fast real-time compression algorithm
http://www.zstd.net
Other
23.56k stars 2.1k forks source link

Inclusion of Bazel build files in official zstd library #3123

Open gonzojive opened 2 years ago

gonzojive commented 2 years ago

Is your feature request related to a problem? Please describe. I need to use this library to build RocksDB. We use bazel to build everything, which means zstd must build with bazel.

Currently we are achieving bazel builds by pulling in https://github.com/cschuet/zstd/pull/2, which tends to go stale.

Describe the solution you'd like Add a BUILD.bazel and WORKSPACE file to the zstd repo. Bazel users can use this and help maintain it.

Describe alternatives you've considered

  1. Out-of-repository bazel projects like https://github.com/cschuet/zstd/pull/2
  2. Using CMake (or whatever zstd uses) within bazel.

I know only a casual amount about the C++ build ecosystem, so I'm not sure what all the options are. Bazel is nice because users can issue a uniform set of commands across all languages, so I don't /have/ to know the C++ ecosystem to build C++ tools

Additional context Would the authors accept a pull request with BUILD.bazel and WORKSPACE files?

See https://github.com/cschuet/zstd/blob/02cedda8da27c2043166dec046599e13da4a68b6/bazel/third_party/zstd.BUILD for expected contents.

Cyan4973 commented 2 years ago

Would the authors accept a pull request with BUILD.bazel and WORKSPACE files?

Yes, a bazel build script in zstd/build/bazel would be a welcomed patch.

eli-schwartz commented 2 years ago

I need to use this library to build RocksDB. We use bazel to build everything, which means zstd must build with bazel.

This logic is similar to the logic by which the zstd developers accept community-maintained cmake and meson build descriptions, so thematically it makes a lot of sense for them to accept bazel as well.

There is a caveat that they don't want any of this in the root of the repository, so it has to live in the build/ directory as mentioned above. This may or may not be an issue for bazel! I have no idea; I've never used bazel before. I do know it's an issue for some other build systems as you cannot add the zstd directory as a component without first copying files out of build/ and into the project root.

so I don't /have/ to know the C++ ecosystem to build C++ tools

BTW: build systems (if they are any good) generally do not require you to know a specific programming language ecosystem. :) Bazel would not be unique in that regard, cmake and meson and even Make are all general purpose build systems (that may have problems other than being tied to a specific language). I guess the counterpoint is e.g. build2? But I digress.

gonzojive commented 2 years ago

I need to use this library to build RocksDB. We use bazel to build everything, which means zstd must build with bazel.

This logic is similar to the logic by which the zstd developers accept community-maintained cmake and meson build descriptions, so thematically it makes a lot of sense for them to accept bazel as well.

There is a caveat that they don't want any of this in the root of the repository, so it has to live in the build/ directory as mentioned above. This may or may not be an issue for bazel! I have no idea; I've never used bazel before. I do know it's an issue for some other build systems as you cannot add the zstd directory as a component without first copying files out of build/ and into the project root.

so I don't /have/ to know the C++ ecosystem to build C++ tools

BTW: build systems (if they are any good) generally do not require you to know a specific programming language ecosystem. :) Bazel would not be unique in that regard, cmake and meson and even Make are all general purpose build systems (that may have problems other than being tied to a specific language). I guess the counterpoint is e.g. build2? But I digress.

When something goes wrong with something that builds with make or cmake or ./configure, I often find myself googling for error strings. Sometimes they are errors from make, sometimes from gcc, sometimes from clang?. Since bazel tries to produce reproducible builds, builds tend not to fail as much, so I end up googling for fewer strings. In theory. end smack talk

https://github.com/gonzojive/zstd/blob/80014358de7231436efdcb370cef38e01410ff6a/BUILD.bazel has a BUILD file that seems to work.

WORKSPACE and BUILD.bazel being in a subdirectory might or might not be a problem. I'll need to look into that more.

terrelln commented 1 year ago

We'd definitely be happy to host a bazel build system in build/bazel/ that is community maintained. It would also need a test in CI to ensure that it continues to work, otherwise we will certainly accidentally break it at some point.

aaronmondal commented 1 year ago

Oh I didn't know this issue existed 😅 I just saw this now because I'm planning to add zstd to the Bazel Central Registry as well.

I've added a Bazel buildfile to LLVM in https://github.com/llvm/llvm-project/commit/75d2032e9ba39b3ae0da17f8cd27d66030bf901c. Missing tests and has some selects that a buildfile in this repo wouldn't need, but other than that should be fairly easy to port.

Apparently that buildfile works with zstd 1.5.5 as well, as I've updated this in rules_ll already in https://github.com/eomii/rules_ll/commit/b92a7223ec0bdcfdb7b4868fa665618050857f39 and things seem to keep working without modifications. So I expect maintenance burden to be minimal.

One issue though: Adding a bazel module in a subdirectory significantly complicates buildfiles and makes them much more bug-prone. It's possible and I understand that having bazel files in the root directory might be undesirable, but we've run into countless "could've been avoided" CI reruns in LLVM because contributors keep forgetting that there is an entire double-overlay thing going on to make such a layout usable.

Considering that Bazel support in this repo requires just a ~50 loc buildfile, I question whether it's actually a good idea to add such a level of convolution to that.

WORKSPACE.bazel  # Empty file.
BUILD.bazel  # 50 loc.
MODULE.bazel  # ~3 lines. Imports `rules_cc` and exposes zstd to the BCR

The MODULE.bazel makes it easy to expose zstd in the Bazel central registry, a "package manager" for Bazel which will become the default behavior probably towards the end of this year.

Corresponding issue in the BCR: https://github.com/bazelbuild/bazel-central-registry/issues/568

eli-schwartz commented 1 year ago

Considering that Bazel support in this repo requires just a ~50 loc buildfile, I question whether it's actually a good idea to add such a level of convolution to that.

It's not obvious to me that there's meaningful value-add to complicating this either. I'd love if it were possible to have a small stub meson project in the project root as well -- it could actually just chainload the rest of the build description, I'm not fussy about that, but it is mandatory to have something in the project root. The current state of affairs is that the meson overlay tooling supplies that ourselves, which sort of works but prevents people from e.g. testing out development commits of zstd via embedding into another project, because the zstd repository isn't 100% usable for that as-is (though you can still manually build with meson directly from the zstd repository, the convolution isn't a hard blocker for that). And there have been synchronization issues, too.

aaronmondal commented 1 year ago

TBH my personal preference is always having the entry points to different build systems like CmakeLists.txx, MODULE.bazel, meson.build all in the project root.

I didn't even know zstd had a CMake build at all because I always just look for a CMakeLists.txt in the root directory out of habit. Having all build systems in the root next to each other makes it easy for users to immediately spot their favorite build system and pick that.

IMO it doesn't matter too much which of these buildsystems are "officially" supported and which aren't. They all need to be tested in CI anyways and can coexist in harmony 😇

Cyan4973 commented 1 year ago

We understand that larger projects may depend on multiple repositories, and may delegate dependency management to their build systems, which will then expect to find a specific file hosted (by default) at the root of the repository.

To this end, we are relaxing the restriction set for alternative build systems (beyond Makefile) to allow 1 single file at root of the repository.

If the build system requires more than 1 file, all the other files should still be hosted into their own build/* subdirectory.

eli-schwartz commented 1 year ago

To this end, we are releasing the restriction set for alternative build systems (beyond Makefile) to allow 1 single file at root of the repository.

Can I ask for a clarification here? Meson requires two files at the root of the repository, if you want to have project options. One to stub-load the build system and one to define the options in a standalone file that can be evaluated without running the whole build. Currently zstd's contributed build system in build/meson/ does have project options.

If two files are acceptable, then I'd be happy to help set up the stub loader. If two files aren't acceptable then maybe I could try to figure out some way to do without options, hmm...

Cyan4973 commented 1 year ago

The initial idea was to allow only 1 file, to cut short inflationary requests (after 2, why not 3, then why not more than 1 directory, etc.).

That being said, we had the cmake model in mind. Indeed, meson is a bit different in this regard, since it allows declaring options and setting default option values via a separate meson.options file, and more importantly disallows setting said default options from within meson.build, making 2 files pretty much compulsory to define options.

One thing I was wondering is if the root meson.build could guide the finding of meson.options into a different (non-root) directory (I don't know meson enough to tell so). I haven't found anything so far, and if the external build system autonomously searches for mesons.options directly at root, without even reading/checking meson.build, then there is little meson.build could do about that.

aaronmondal commented 1 year ago

I can definitely see that it's generally desirable to only have a single file a'la CMakeLists.txt, but I'm not sure how many files an absolute minimum for a Bazel build would be.

Soo speaking of inflationary requests :rofl: : a legacy-style Bazel project needs WORKSPACE.bazel and BUILD.bazel, but that model will soon be deprecated. The new module system requires WORKSPACE.bazel, MODULE.bazel and BUILD.bazel, and very soon also a MODULE.bazel.lock lockfile. I believe there are plans to remove the WORKSPACE.bazel file entirely, but at the moment it's still required.

Actually invoking from within a project usually also requires some configuration flags set in a .bazelrc file. For CI jobs this can just be passed in the action YAMLs directly, but if someone wants to git clone zstd && cd zstd && bazel build ... that wouldn't work without a .bazelrc in the project root. Since zstd will mostly be used as only an external dependency I think we can kick out that .bazelrc.

There are several ways to work around the default layout, but of course every deviation increases the cognitive complexity of the build logic. For instance, the LLVM-overlay essentially fetches LLVM twice through 3 different repos with patches and ovelays occurring in 3 (or 4?) different places. Few reviewers have the time to review such convoluted setups. LLVM does have ~25k LoC of Bazel buildfiles though, so I guess it's somewhat acceptable in that case.

I'd imagine a similar issue for meson. I'm sure it'll be possible to work around meson's default build layout, but at some point it'll become quite hard to understand what is even going on :sweat_smile:

Only MODULE.bazel (and soon MODULE.bazel.lock) need to contain actual content. All other files must exist in the project root, but I believe the actual build logic can be deferred to a subdirectory. It might be an option to only have the MODULE.bazel file in the zstd root and patch in the other files via an external patch that comes from Bazel's module registry. A patch that just adds some empty files is significantly less maintenance burden than overlay logic after all.

This way we might be able to keep this to a minimum of currently 1 (and with lockfile 2).

nya3jp-google commented 1 year ago

I'm also interested in building zstd with Bazel. While I agree it's "clean" to have MODULE.bazel at the root directory, I also see that it's a bit controversial because we already have multiple community-maintained build system support in build/.

I wonder if we can have separate discussions for adding Bazel support and moving community-maintained build files to the root directory. How much does it complicate things to have MODULE.bazel at build/bazel/?

eli-schwartz commented 1 year ago

One thing I was wondering is if the root meson.build could guide the finding of meson.options into a different (non-root) directory (I don't know meson enough to tell so).

It cannot.

Actually invoking from within a project usually also requires some configuration flags set in a .bazelrc file. For CI jobs this can just be passed in the action YAMLs directly, but if someone wants to git clone zstd && cd zstd && bazel build ... that wouldn't work without a .bazelrc in the project root. Since zstd will mostly be used as only an external dependency I think we can kick out that .bazelrc.

Probably, yes. Actually even for building zstd directly it sounds like it should be possible to make this a documentation issue and people can pass the rc file as an option? The equivalent meson functionality I guess would be --cross-file=build/meson/presets/foobar.ini.

Only MODULE.bazel (and soon MODULE.bazel.lock) need to contain actual content. All other files must exist in the project root, but I believe the actual build logic can be deferred to a subdirectory.

I confess I know nothing about bazel, but it does seem odd to me that (flag) files with no content need to exist. If they don't have any content, what work are they performing? Maybe bazel itself could better handle this?

A patch that just adds some empty files is significantly less maintenance burden than overlay logic after all.

That does make a certain amount of sense, yes.