bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Documentation: FAQ about reading files and alternatives #13300

Open cvrebert opened 3 years ago

cvrebert commented 3 years ago

I fairly often see internal questions about reading files in Starlark or, equivalently, defining targets (e.g. generating library targets or tests) based on the contents of files other than BUILD/.bzl files. From what I can tell, the Bazel docs currently don't address this directly, so similar questions get asked and answered repeatedly, which wastes effort. Since these questions concern somewhat obscure corners of Starlark, they don't always get a swift answer, and the quality and detail of the example code in the answers varies. Hence these suggestions.

  1. The docs ought to clearly state that Starlark & build rules themselves cannot read files (they can only pass file paths along to binaries). Currently, this is mostly implicit, in that Starlark lacks open()/file() functions.
  2. The docs ought to explain the philosophical/design reason for this restriction. As I understand it, allowing file reads would mean that changes to arbitrary files (not just BUILD & .bzl files) in a source tree could affect the build graph, which would make detecting changes & recomputing the build graph more expensive. Perhaps there are additional reasons?
  3. The docs should cover follow-up questions about what users should do instead. I've seen a few patterns; these should be codified.
    • Move the canonical source of the data into a .bzl file instead; possibly writing a Starlark rule to generate the previous data file.
    • Rather than reading a file and automatically generating N targets from the file, write a rule for generating a single target and then explicitly write out N instances of that rule in the BUILD file. Possibly write a script or automation for generating that BUILD code. If there's concern about the file and the targets drifting out-of-sync as new entries are added to the file, write a test which takes the file and all the targets as inputs, and asserts that the targets cover all entries in the file.
      • Instead of a monolithic main file, consider splitting the dataset into separate files. (This isn't always feasible/desirable.)
    • (I'm probably missing a couple more)
    • There are generally similar alternatives when dealing with a binary which generates a directory containing N output files, where the user wants each of the files to have associated target(s).
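As a rough illustration of the second pattern above (explicit targets, a script that generates the BUILD code, and a test that catches drift), here is a minimal Python sketch. It assumes a data file with one entry name per line. The rule name `my_data_test`, its `entry` attribute, the `<entry>_test` naming scheme, and the file name `entries.txt` are all hypothetical and exist only for this example.

```python
"""Sketch: keep explicit BUILD targets in sync with a data file.

Assumptions (hypothetical, not from the Bazel docs): the data file lists
one entry name per line, and each entry maps to one `my_data_test` target
named `<entry>_test`.
"""


def read_entries(text):
    """Return entry names, skipping blank lines and '#' comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def render_build(entries, data_file="entries.txt"):
    """Render one explicit target per entry, sorted for stable diffs."""
    stanzas = []
    for name in sorted(entries):
        stanzas.append(
            "my_data_test(\n"
            f'    name = "{name}_test",\n'
            f'    entry = "{name}",\n'
            f'    data = ["{data_file}"],\n'
            ")\n"
        )
    return "\n".join(stanzas)


def missing_targets(entries, target_names):
    """Return entries that have no matching '<entry>_test' target."""
    covered = set(target_names)
    return sorted(e for e in entries if f"{e}_test" not in covered)
```

A developer would run `render_build` by hand (or from automation) and check the output into the BUILD file. The drift check could then run as an ordinary test that receives the data file and the list of target names as inputs, and fails whenever `missing_targets` returns a non-empty list.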

I have no opinion on whether (parts of) these answers belong in an actual "FAQs" page.

yeswalrus commented 2 years ago

Strong upvote here. For things like FlatBuffers (or, I think, even protobuf) that generate one file per table/struct, this is a big issue. The best options I can think of at the moment are: write a custom rule and declare the output as a directory using ctx.actions.declare_directory; write a linter rule enforcing one struct per file with a matching name (which many devs oppose when there are several small dependent structs that aren't meant to be used separately); or, as mentioned, write a custom build script that generates BUILD code, which is a rather ugly hack IMO and invites someone to build a meta-build system by creating space 'above' Bazel.
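For reference, the linter option could look roughly like the Python sketch below. It is an illustration only: the function name `lint_schema` is made up, the regex is a simplification that does not parse FlatBuffers schemas properly (it ignores comments, for instance), and it assumes that "matching name" means the declared name equals the file name without its extension.

```python
import re
from pathlib import PurePath

# Simplified pattern: `table Name` or `struct Name` at the start of a line.
# Assumption: a real linter would use a proper schema parser instead.
_DECL = re.compile(r"^[ \t]*(?:table|struct)[ \t]+(\w+)", re.MULTILINE)


def lint_schema(filename, text):
    """Return a list of error strings; an empty list means the file passes."""
    names = _DECL.findall(text)
    stem = PurePath(filename).stem
    if len(names) != 1:
        return [f"{filename}: expected 1 table/struct, found {len(names)}"]
    if names[0] != stem:
        return [f"{filename}: declares {names[0]!r}, expected {stem!r}"]
    return []
```

With a check like this in place, each schema file yields a single predictably named output, so the generated file can be declared up front instead of being discovered after the generator runs.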

github-actions[bot] commented 1 month ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

cvrebert commented 1 month ago

I have re-skimmed the docs. This issue still seems relevant.