cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.05k stars 3.8k forks source link

build: build-time tooling go dependencies mistakenly mixed with run-time go dependencies #67473

Open knz opened 3 years ago

knz commented 3 years ago

tldr Currently we mistakenly intermingle the build-time tool dependencies and the run-time dependencies inside the same vendor tree. We need to split them.

Problems to solve

This causes the following problems:

  1. if a build-time tool announces a dep compatibility at version X, and CockroachDB needs version Y with Y>X, we are mistakenly forcing the build-time tool to use version Y which may change its behavior.

    This is the problem identified by the @bufbuild team in this comment.

  2. if CockroachDB announces a run-time dep at version X, and a build-time tool announces a dep at version Y with Y>X, we are mistakenly letting the tool force our hand on the run-time dep.

    This is the problem that caused massive wreckage of the master branch two weeks ago ( #67245 #67255 #67248 #67246 #67237): a tool was requiring a broken version of gRPC and we mistakenly bumped the gRPC version in CockroachDB when we upgraded the tool.

  3. when a build-time tool is updated and this forces some dependency upgrades, the dep upgrade does not receive the same level of scrutiny (e.g. from the security team) as when we update a dependency for CockroachDB's benefit. Even though they happen to be the same currently. This is a security risk.

  4. our security compliance roadmap requires us to clearly identify build-time and run-time dependencies separately. Currently this is difficult to do.

Urgency

The following pressure points exist to make us want to fix this soon-ish:

How to go about it

We should have two separate vendoring trees: one for tools, and one for CockroachDB.

In the current system with a vendoring repo, we'd want to operate as follows:

  1. split all the build-time tool dependencies into a separate cockroach-tools repo with its own vendor sub-tree
  2. make the cockroach repo depend on the cockroach-tools repo as a sub-module
  3. when building cockroach, recurse into the cockroach-tools tree to build all the tool deps prior to starting the build inside cockroach.

It's unclear (to me) how to achieve something similar with Bazel - someone with more experience should extend the issue with more details.

Epic CRDB-10447

Jira issue: CRDB-8559

rickystewart commented 3 years ago

We should have two separate vendoring trees: one for tools, and one for CockroachDB.

This doesn't fully resolve the first "problem to solve", right? Two different tools that we use for the build could each depend on different, incompatible versions of the same package.

It's unclear (to me) how to achieve something similar with Bazel - someone with more experience should extend the issue with more details.

We currently use Gazelle for this purpose, and we would continue doing so. We currently have a single go.mod file that Gazelle translates to the DEPS.bzl file -- presumably, with the new model we would have two go.mod files (or equivalent) that we'll have Gazelle translate into two different bzl macros. Then we'll just make sure that everything is wired up appropriately so all the build tools only point to build dependencies, and vice-versea for the runtime dependencies.

Typically Gazelle translates importpaths into repository names (like github.com/bufbuild/buf => @com_github_bufbuild_buf) and those repository names are what's used in the BUILD files. The main problem would be that you have collisions between the two sets of dependencies -- we could distinguish between them with a prefix or some other such hack (e.g., @crdbbuild_com_github_bufbuild_buf) and we can validate the change by making sure that all the build-time stuff only depends recursively on repositories named @crdbbuild_*. I don't think Gazelle supports this use case, but it would probably be reasonably easy to hack in, either as an upstream patch or some other hack.

bufdev commented 3 years ago

This doesn't fully resolve the first "problem to solve", right? Two different tools that we use for the build could each depend on different, incompatible versions of the same package.

This is true - you really need something separate for every tool.

For Golang tools, I'd highly recommend the "tmpdir" hack. For reference, we do it extensively in https://github.com/bufbuild/buf/tree/master/make/go (see the dep_.* files, most of them are Golang tools).

knz commented 3 years ago

This doesn't fully resolve the first "problem to solve", right? Two different tools that we use for the build could each depend on different, incompatible versions of the same package.

Yes you're absolutely right. I was looking at this from the perspective of crdb but the problem would remain for any pairwise combination of tools.

That said, separate vendoring would also largely trim down the size of the cockroach-gen repo, which is why I thought of it first. Maybe we can do both, separate vendoring for tools and also sub-vendoring for different tools.

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!