facebookincubator / reindeer

Reindeer is a tool to transform Rust Cargo dependencies into generated Buck build rules

Middle-ground vendoring option using a local registry #46

Open cormacrelf opened 4 months ago

cormacrelf commented 4 months ago

The problem is a goldilocks one.

  1. vendor = false is quick & easy to manage deps & buckify, but pretty bad day to day.

    • Doesn't work offline as buck2 issues HEAD requests constantly
    • Terrible DX annoyance with "too many open files" errors due to buck trying to download 1000 crates at once. The standard start to your day looks like "run buck2 build about 15 times until by random chance the scheduler manages to get past those errors"
    • Those crates get downloaded again, and again, and again
    • reindeer buckify takes 2 seconds or so. Pretty convenient.
  2. [vendor] ... is slow to manage deps & buckify.

    • Neat for small projects
    • Also probably neat for Meta with y'all's funky EdenFS etc.
    • But middle ground is bad
      • Middle = vendor directory of 1000 crates, 1.2 GB, 50k source files. Mostly from dupes of the windows crates which can't be pinned to one single version etc.
      • reindeer vendor takes 35 seconds
      • reindeer buckify takes 20 seconds
      • git status takes 150ms
      • The vendor folder wrecks git performance simply by its existence.
    • Build experience is perfect, works offline, etc.

I think we need a solution for the middle ground:

Outcomes:

Problems:

cormacrelf commented 4 months ago
And here is an `extract_archive` rule based on prelude's `http_archive` that makes all this work.

```starlark
def _tar_strip_prefix_flags(strip_prefix: [str, None]) -> list[str]:
    if strip_prefix:
        # count nonempty path components in the prefix
        count = len(filter(lambda c: c != "", strip_prefix.split("/")))
        return ["--strip-components=" + str(count), strip_prefix]
    return []

def _unarchive_cmd(
        # ext_type: str,
        # exec_is_windows: bool,
        archive: Artifact,
        strip_prefix: [str, None]) -> (cmd_args, bool):
    unarchive_cmd = cmd_args(
        "tar",
        "-xzf",
        archive,
        _tar_strip_prefix_flags(strip_prefix),
    )
    return unarchive_cmd, False

def _extract_archive_impl(ctx: AnalysisContext) -> list[Provider]:
    archive = ctx.attrs.src

    # no need to prefer local; this is not a downloaded object. unlike http_archive
    prefer_local = False
    unarchive_cmd, needs_strip_prefix = _unarchive_cmd(archive, ctx.attrs.strip_prefix)
    exec_is_windows = False

    output_name = ctx.label.name
    output = ctx.actions.declare_output(output_name, dir = True)
    script_output = ctx.actions.declare_output(output_name + "_tmp", dir = True) if needs_strip_prefix else output

    if exec_is_windows:
        ext = "bat"
        mkdir = "md {}"
        interpreter = []
    else:
        ext = "sh"
        mkdir = "mkdir -p {}"
        interpreter = ["/bin/sh"]

    exclude_flags = []
    script, _ = ctx.actions.write(
        "unpack.{}".format(ext),
        [
            cmd_args(script_output, format = mkdir),
            cmd_args(script_output, format = "cd {}"),
            cmd_args([unarchive_cmd] + exclude_flags, delimiter = " ").relative_to(script_output),
        ],
        is_executable = True,
        allow_args = True,
    )

    exclude_hidden = []
    ctx.actions.run(
        cmd_args(interpreter + [script]).hidden(exclude_hidden + [archive, script_output.as_output()]),
        category = "extract_archive",
        prefer_local = prefer_local,
    )

    if needs_strip_prefix:
        ctx.actions.copy_dir(output.as_output(), script_output.project(ctx.attrs.strip_prefix))

    return [DefaultInfo(
        default_output = output,
        sub_targets = {
            path: [DefaultInfo(default_output = output.project(path))]
            for path in ctx.attrs.sub_targets
        },
    )]

extract_archive = rule(
    impl = _extract_archive_impl,
    attrs = {
        "src": attrs.source(),
        "strip_prefix": attrs.option(attrs.string(), default = None),
        "sub_targets": attrs.list(attrs.string(), default = [], doc = """
            A list of filepaths within the archive to be made accessible as sub-targets.
            For example if we have an http_archive with `name = "archive"` and
            `sub_targets = ["src/lib.rs"]`, then other targets would be able to refer
            to that file as `":archive[src/lib.rs]"`.
        """),
    },
)
```
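
For illustration, here is a rough sketch of how the rule might be called from a BUCK file. Only the `src`, `strip_prefix`, and `sub_targets` attributes come from the rule above. The load path, target name, and archive location are hypothetical placeholders, not something reindeer generates today.

```starlark
# Hypothetical load path; point it at wherever the .bzl file above is saved.
load("//third-party:extract_archive.bzl", "extract_archive")

extract_archive(
    # Hypothetical target name.
    name = "example-crate",
    # Hypothetical location of a locally stored .crate file (a gzipped tarball).
    src = "registry/example-crate-1.0.0.crate",
    # Assumes the tarball has a single top-level `example-crate-1.0.0/` directory,
    # so the rule passes --strip-components=1 to tar.
    strip_prefix = "example-crate-1.0.0",
    # Exposes the file as ":example-crate[src/lib.rs]".
    sub_targets = ["src/lib.rs"],
)
```

With that in place, another target in the same package could refer to the extracted directory as `":example-crate"` or to the single file as `":example-crate[src/lib.rs]"`, following the sub-target convention described in the rule's `doc` string.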