facebook / buck

A fast build system that encourages the creation of small, reusable modules over a variety of platforms and languages.
https://buck.build
Apache License 2.0
8.56k stars 1.16k forks source link

Don't require .buckconfig and/or fix nested projects #939

Closed Qix- closed 8 years ago

Qix- commented 8 years ago

I know there was some discussion a while back about .buckconfig files being mandatory.

This anti-nesting paradigm is ultimately the reason why I have personally strayed away from build systems like Bazel and (now, potentially) Buck.

Let's start with the needless empty .buckconfig file:

I'm still getting the following

$ buck --version
buck version de38173cc0b9de07cb129298191280260a6c35e0

$ buck build //:foo
This does not appear to be the root of a Buck project. Please 'cd'
to the root of your project before running buck. If this really is
the root of your project, run
'touch .buckconfig'
and then re-run your buck command.

This whole concept of "project roots" throws a wrench into the whole modularity thing (link is to factor 2).

Why? Because you can't nest modules anymore. At least, not in any clear or direct way. Projects/repositories will (should) never know if they're really the "project root".


Here's an example. Let's take this sample repository that has a dependency within a dependency.

At a glance, here is the directory structure for each dependency:

/BUCK
/ext/names/BUCK
/ext/names/ext/compiler/BUCK

<------ downstream ---------
-------- upstream --------->

In the real world, each of the above would be a dependency of some sort, either via a submodule or some package manager. Each dependency is completely unaware of their consumers (downstream dependents) and merely expose a 'PUBLIC'-ly visible member for the outer codebase (downstream dependent) to consume.

Let's take a look at the compilation chain here, starting first with the inner-most dependency. All it does is export a single executable script using genrule() (since export_file() doesn't support executables).

Compiling it by itself is, of course, just fine.

$ cd ext/names/ext/compiler
$ buck build //:compiler
Using watchman.
[-] PROCESSING BUCK FILES...FINISHED 0.0s [100%] 🐳  New buck daemon
[-] DOWNLOADING... (0.00 B/S AVG, TOTAL: 0.00 B, 0 Artifacts)
[-] BUILDING...FINISHED 0.6s [100%] (1/1 JOBS, 1 UPDATED, 1 [100.0%] CACHE MISS)

Now let's go up a directory and try the names project. This project will "build" the compiler if it isn't already built (copied), and will proceed to use it to compile all of the names into a single file.

$ cd ext/names/
$ buck build //:all
Using watchman.
compiling 2 files

[-] PROCESSING BUCK FILES...FINISHED 0.0s [100%] 🐳  New buck daemon
[-] DOWNLOADING... (0.00 B/S AVG, TOTAL: 0.00 B, 0 Artifacts)
[-] BUILDING...FINISHED 0.6s [100%] (2/2 JOBS, 2 UPDATED, 2 [100.0%] CACHE MISS)

Sweet, it appears to have found the compiler and has executed it (the output compiling 2 files resulting from our 'compiler' script).

Let's go back up to the main repository and try to build it. It uses the names dependency to get a list of all names, and then reverses them. We're in the business of reversing things so this is really our intended output. We don't care that it uses a compiler dependency (the product manager has approved, etc.) thus we're just after its main all.names file.

$ buck build //:reversed
Using watchman.
[-] PROCESSING BUCK FILES...FINISHED 0.0s [100%] 🐳  New buck daemon
[+] DOWNLOADING... (0.00 B/S, TOTAL: 0.00 B, 0 Artifacts)
[+] BUILDING...0.3s
BUILD FAILED: Couldn't get dependency '//ext/compiler:compiler' of target '//ext/names:all':
No build file at ext/compiler/BUCK when resolving target //ext/compiler:compiler.

Wait a second! Buck is looking for the nested compiler dependency in <repo>/ext. That's not right!

Maybe we need relative paths. Let's apply this diff:

diff --git a/ext/names/BUCK b/ext/names/BUCK
index 6fc3742..e634800 100644
--- a/ext/names/BUCK
+++ b/ext/names/BUCK
@@ -5,5 +5,5 @@ genrule(
     name='all',
     srcs=['a.names', 'b.names'],
     out='all.names',
-    cmd='$(exe //ext/compiler:compiler) $SRCS > $OUT',
+    cmd='$(exe //./ext/compiler:compiler) $SRCS > $OUT',
     visibility=['PUBLIC'])

Cool, now we'll look for //./ext/compiler:compiler instead of the root ext folder.

$ buck build //:reversed
Using watchman.
[-] PROCESSING BUCK FILES...FINISHED 0.0s [100%] 🐳  New buck daemon
[+] DOWNLOADING... (0.00 B/S, TOTAL: 0.00 B, 0 Artifacts)
[+] BUILDING...0.3s
BUILD FAILED: Couldn't get dependency '//ext/names:all' of target '//:reversed':
//ext/names:all: Build target path cannot be absolute or contain . or .. (found //./ext/compiler:compiler)

Alright, what if we drop the // altogether? Maybe that means 'relative':

diff --git a/ext/names/BUCK b/ext/names/BUCK
index 6fc3742..34613fe 100644
--- a/ext/names/BUCK
+++ b/ext/names/BUCK
@@ -5,5 +5,5 @@ genrule(
     name='all',
     srcs=['a.names', 'b.names'],
     out='all.names',
-    cmd='$(exe //ext/compiler:compiler) $SRCS > $OUT',
+    cmd='$(exe ext/compiler:compiler) $SRCS > $OUT',
     visibility=['PUBLIC'])

Let's try that.

$ buck build //:reversed
[-] PROCESSING BUCK FILES...FINISHED 0.0s [100%] 🐌  Environment variable changes: [PATH, MANPATH, NVM_PATH, NVM_BIN]
[+] DOWNLOADING... (0.00 B/S, TOTAL: 0.00 B, 0 Artifacts)
[+] BUILDING...0.2s
BUILD FAILED: Couldn't get dependency '//ext/names:all' of target '//:reversed':
//ext/names:all: Path in ext/compiler:compiler must start with //

Well, what do we do now? Do we really have to re-adjust all of those dependencies to use the root of our repository?

What if I (hypothetically) have the source code for Awesomium complete with a vendored BUCK file by their team, and it in turn has Chromium inside of it with a vendored BUCK file vendored by Google.

If I build Awesomium by itself, great! It would work, assuming Google doesn't use any nested dependencies. But, when I include it in my own project and use Awesomium as a nested dependency, the whole thing will fail because it will look for Chromium relative to my "project root".


And thus we have a completely unusable nested dependency system in Buck.

CMake has a very similar problem, one which I addressed a while back, where CMAKE_SOURCE_DIR doesn't refer to the directory that houses CMakeLists.txt but instead the some/dir when running cmake some/dir.

The developers insisted all projects update their CMakeLists.txt to use CMAKE_CURRENT_SOURCE_DIR instead, which was pretty poorly documented (completely unocumented?) until version 3.0.2. While that isn't a problem now, most CMakeLists.txt for projects older than 2014 that haven't been updated in a while (there are a lot!) will fall fate to this flaw.

This renders add_subdirectory() very, very spotty in terms of intended functionality.

CMake isn't the only culprit. Several build systems fall short in this area, which is troubling since this directly hinders any usable package managers from spawning that can rely on a battle-tested build system instead of using their own and (ab)using strange tricks like symlinks or path variables to do what they do:

The whole concept of "project root" kind of sullies non-standard dependency management (e.g. not Java). Substitute the above submodule hierarchy with some package manager (e.g. npm@2, which we know is still used a lot since npm@3 doesn't work with node 0.10 and nvm automatically configures npm@2 all the way up through node 4.0.0!) and you'll run into these problems very quickly.


There are two proposals here - one is arguably more "philosophically correct" but more-breaky, and the other doesn't get rid of that pesky empty .buckconfig file -- but won't break as much.

Proposal 1

I propose the requirement for an empty .buckconfig be dropped and instead count on the user running buck from within a (sub)directory of a BUCK file (i.e. traverse upward until Buck finds a BUCK file and run from there). Existing .buckconfig files are treated normally.

This is pretty much exactly how Git does it.

Secondly, I propose //-prefixed paths refer to the BUCK file they are used in.

How do I reference rules above the current BUCK file? Simple: you don't. This would philosophically enforce that BUCK files are considered an upstream dependency and thus should never depend on a downstream dependency (parent directory, in this case) which would ultimately break the dependency chain.

This results in one less messy empty dotfile and makes building up project dependencies much cleaner and easier to maintain.

Proposal 2

I propose that .buckconfig actually marks a directory as a Buck project (officially). This means that all //... paths in all recursive BUCK files (that do not, in turn, have a .buckconfig sibling) refer to the next parent up directory with a .buckconfig file.

This is almost identical to the current functionality, though any nested projects that also have a .buckconfig file will have their target paths resolve to those respective directories instead of the "project root".


I understand either of these would introduce a breaking change, though without them, it makes Buck very hard to adopt by ad-hoc languages or environments, especially in the world of native development.

Please consider at least addressing this issue and discussing ways to handle such use cases. πŸ’ƒ

Qix- commented 8 years ago

I can't even fix this using the workaround of re-creating the rules in the higher directories right now.

[+] PARSING BUCK FILES...0.2s
BUILD FAILED: 'ext/arua-bootstrap-grammar/src/arua.leg' in '//:arua-leg' crosses a buck package boundary.  This file is owned by 'ext/arua-bootstrap-grammar'.  Find the owning rule that references 'ext/arua-bootstrap-grammar/src/arua.leg', and use a reference to that rule instead of referencing the desired file directly.

I'm going to have to use an os.unlink() call in the main BUCK file to remove the nested BUCK files in order to get passed the boundary and tell Git to ignore those submodules being dirty. :/

EDIT: the workaround:

# XXX temporary - see facebook/buck#939
import os
import os.path as path

def punlink(*args, **kwargs):
    try:
        os.unlink(*args, **kwargs)
    except:
        pass

punlink('ext/arua-bootstrap-grammar/BUCK')
punlink('ext/arua-bootstrap-grammar/.buckconfig')
punlink('ext/arua-bootstrap-grammar/ext/BUCK')
# XXX end temporary
evancox10 commented 8 years ago

You've hit the nail on the head. I looked at using Buck or Bazel for managing HDL environments (Verilog, SystemVerilog, VHDL, etc.), but lack of support for nesting projects was a show stopper.

In Verilog, the basic unit of "code" is actually called a module, so one would think it would be easy to adapt Buck to it. Not so! I'm unable to reuse a module from another Buck project if that modules also happened to reuse sub-modules

Thanks for the write-up here.

sdwilsh commented 8 years ago

I believe the undocumented Cells feature solves this problem. It's undocumented because it's still evolving as the team figures out edge cases that don't work well with it. I believe the Gerrit project uses these to manage some of their third-party dependencies (@davido has filed issues too that has led to improvements of the Cell UI). @mzlee and @philipjameson have also been playing with them and have been helping to improve the Cell UI.

Qix- commented 8 years ago

@sdwilsh I'm not sure I'm following. What are cells? Where can I read about them?

sdwilsh commented 8 years ago

The best there is right now is probably the integration tests for it (so documentation by example only): https://github.com/facebook/buck/tree/master/test/com/facebook/buck/crosscell/testdata

Qix- commented 8 years ago

From what I can gather about cells, no, this is not what I'm addressing. From what I can tell (since there is no documentation) cells are just sibling directories with Buck projects inside that allow you to reference another sibling. This doesn't solve this problem.

Coneko commented 8 years ago

Buck was initially developed for use in a monorepo, and is still pretty biased towards it.

We currently use cross cell with a flat hierarchy of cells, so I'm not sure how well it works with nested cells, but at least in theory it should as long as each cell only refers to its child cells, and not other descendants or ancestors.

In your example repo, I applied this patch:

diff --git a/.buckconfig b/.buckconfig
index e69de29..612c65d 100644
--- a/.buckconfig
+++ b/.buckconfig
@@ -0,0 +1,2 @@
+[repositories]
+names=ext/names
diff --git a/BUCK b/BUCK
index 9ebe9f4..c372eda 100644
--- a/BUCK
+++ b/BUCK
@@ -1,5 +1,5 @@
 genrule(
        name='reversed',
-       srcs=['//ext/names:all'],
+       srcs=['names//:all'],
        out='reversed.names',
        cmd='cat $SRCS | rev > $OUT')
diff --git a/ext/names/.buckconfig b/ext/names/.buckconfig
index e69de29..205fa39 100644
--- a/ext/names/.buckconfig
+++ b/ext/names/.buckconfig
@@ -0,0 +1,2 @@
+[repositories]
+compiler=ext/compiler
diff --git a/ext/names/BUCK b/ext/names/BUCK
index 6fc3742..068beaa 100644
--- a/ext/names/BUCK
+++ b/ext/names/BUCK
@@ -5,5 +5,5 @@ genrule(
     name='all',
     srcs=['a.names', 'b.names'],
     out='all.names',
-    cmd='$(exe //ext/compiler:compiler) $SRCS > $OUT',
+    cmd='$(exe compiler//:compiler) $SRCS > $OUT',
     visibility=['PUBLIC'])

Now you can build it:

log-mbp:nested-buck coneko$ NO_BUCKD=1 buck build //:reversed --show-output
Not using buckd because NO_BUCKD is set.
[-] PROCESSING BUCK FILES...FINISHED 0.7s [100%] 🐳  New buck daemon
[+] DOWNLOADING... (0.00 B/S, TOTAL: 0.00 B, 0 Artifacts)
[+] BUILDING...0.4s [0%] (0/1 JOBS, 0 UPDATED, 0 [0.0%] CACHE MISS)
 |=> //:reversed...  0.0s (checking local cache)
 |=> IDLE
 |=> IDLE
 |=> IDLE
The outputs are:
//:reversed buck-out/gen/reversed/reversed.names
log-mbp:nested-buck coneko$ cat buck-out/gen/reversed/reversed.names
xelA
nitsuA
nairdA
lA
kcuB
ylliB
boB
pmurB
ecniboB

Does this feature fit your needs?

Qix- commented 8 years ago

If Buck is to remain a monorepo-biased build system then yes, that does work. Thank you for the followup @coneko :)