SchahinRohani opened this issue 3 months ago
I can reproduce this on `x86_64-linux`. It's caused by the fact that it's not possible to cross-compile for Darwin, not even between `aarch64-darwin` and `x86_64-darwin`. I thought I had cut these paths entirely in #1233, but apparently I overlooked something.
The solution is to remove the Darwin flake outputs from the `packages` section on systems that can't build them, i.e.:

- `x86_64-darwin` if the host is `aarch64-darwin`
- `aarch64-darwin` if the host is `x86_64-darwin`
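A minimal sketch of that filtering, using only Nix builtins. It assumes the cross-compiled outputs sit in an attrset keyed by target system; that shape and the names `filterUnbuildable`, `crossOutputs`, and `hostSystem` are hypothetical, not nativelink's actual flake structure. Because Darwin can't be cross-compiled at all, the sketch also drops both Darwin outputs on non-Darwin hosts like `x86_64-linux`, which is where the error was reproduced.

```nix
# Hypothetical sketch, not nativelink's actual flake code. `crossOutputs` is
# assumed to be an attrset keyed by target system (e.g. "aarch64-darwin").
let
  # Darwin can't be cross-compiled from any other platform, not even from
  # the other Darwin architecture.
  darwinSystems = [ "aarch64-darwin" "x86_64-darwin" ];

  # Drop every Darwin output except the one matching the host system.
  # builtins.removeAttrs silently ignores names that aren't present.
  filterUnbuildable = hostSystem: crossOutputs:
    builtins.removeAttrs crossOutputs
      (builtins.filter (system: system != hostSystem) darwinSystems);

  # Placeholder strings stand in for real derivations.
  exampleOutputs = {
    "x86_64-linux" = "image-x86_64-linux";
    "aarch64-linux" = "image-aarch64-linux";
    "aarch64-darwin" = "image-aarch64-darwin";
    "x86_64-darwin" = "image-x86_64-darwin";
  };
in
{
  # Remaining target names for two example hosts.
  onAarch64Darwin = builtins.attrNames (filterUnbuildable "aarch64-darwin" exampleOutputs);
  onX86_64Linux = builtins.attrNames (filterUnbuildable "x86_64-linux" exampleOutputs);
}
```

Evaluating this file with `nix-instantiate --eval --strict` should yield `onAarch64Darwin = [ "aarch64-darwin" "aarch64-linux" "x86_64-linux" ]` and `onX86_64Linux = [ "aarch64-linux" "x86_64-linux" ]`. In a real flake, the same call would wrap whatever attrset feeds `packages.${system}`.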
@SchahinRohani Do you want to try solving this issue?
Thanks for the feedback on the flake error. I couldn't get a quick fix working because of the complexity involved, so I decided to redesign the basic structure and work with Nix scripts. This led me to the following proposal:
Set up a multi-shell environment consisting of three shells: baseShell, nativeShell (for Rust developers), and webShell (for web projects like the docs). The baseShell would include only the essentials like linting and formatting tools, while the nativeShell would provide the full Rust development environment, including the necessary toolchains and Bazel. Similarly, the webShell would include the system toolchains required for things like Playwright. Most importantly, all scripts would be moved into separate modules so that flake.nix stays clean and only organizes the outputs.
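A minimal sketch of the proposed layout, assuming a separate module (hypothetically named `shells.nix`) that flake.nix would call with `pkgs` and expose as `devShells`. The package choices are illustrative stand-ins, not nativelink's actual devshell contents. `inputsFrom` lets the two larger shells inherit baseShell's tools instead of repeating them.

```nix
# Hypothetical shells.nix: illustrative packages only, not nativelink's
# real devshell contents.
{ pkgs }:
let
  # Essentials only: linting and formatting tools.
  baseShell = pkgs.mkShell {
    packages = [ pkgs.pre-commit pkgs.nixpkgs-fmt ];
  };
in
{
  inherit baseShell;

  # Full Rust development environment plus Bazel, layered on baseShell.
  nativeShell = pkgs.mkShell {
    inputsFrom = [ baseShell ];
    packages = [ pkgs.cargo pkgs.rustc pkgs.bazel ];
  };

  # System toolchains for web projects like the docs (e.g. Playwright).
  webShell = pkgs.mkShell {
    inputsFrom = [ baseShell ];
    packages = [ pkgs.nodejs pkgs.playwright-driver ];
  };
}
```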
However, given the complexity involved, I believe this is something we should definitely tackle together as a team. There's a lot going on here, and having everyone's input will be crucial to get this right.
Additionally, the CI would benefit from this setup, as it could be configured to run with only the necessary packages for each task, leading to more efficient and faster builds.
@aaronmondal Let me know what you think!
I can see why it might initially seem curious that all our CI jobs take so long to "boot up", and it's tempting to shrink the devshell to make some jobs quicker. However, there's actually not much to gain here.
> The baseShell would include only the essentials like linting and formatting tools
The pre-commit hooks already skip the devshell entirely since they use `nix flake check`. So for the pre-commit hooks the workflow is already optimal and does what a "baseShell" would do, hence the ~1 min runtime for pre-commit hooks in CI on cache hits.
> the nativeShell would provide the full Rust development environment, including the necessary toolchains and bazel. Similarly, the webShell would include system toolchains required for things like Playwright.
The "webShell" would be almost the same size as the "nativeShell", since both run a Bazel build in the same stdenv, which makes up the majority of the devshell's size. Compared to that, the additional space required by webdev-specific tools (Playwright being the large one here) is negligible.
You also wouldn't get much of a speedup, since setting up the devshell in CI is actually fairly fast. In this job it takes about 6 minutes: https://github.com/TraceMachina/nativelink/actions/runs/10424002033/job/28871873664#step:6:1678. Of that, 2.5 minutes are spent building the native-cli due to a cache miss and 1 minute on Playwright, also due to a cache miss. If we ignore those two cache misses, we're at ~2.5 min startup time. Optimizing here doesn't seem too useful.
Maintenance-wise, it's also nice to only have to check a single devshell. If we had multiple devshells, we'd have to test each one. Since Nix's "core" dependencies like gcc, glibc, etc. are shared between them, each of those tests would largely repeat the same work, so we'd end up with a net increase in CPU cycles instead of a reduction. To elegantly handle multiple devshells we'd likely have to use multiple envrcs. But then it becomes unclear what happens when you're in one devshell and need to invoke a tool that might behave differently in another devshell.
In general, a CI performance improvement should be an "efficient" improvement. That is, a 2x speedup isn't a "real" speedup if it requires 2x the compute resources. If, however, we could get a 10% CPU cycle reduction in CI, that would be a fantastic gain. The tricky thing to keep in mind is that parallelizing certain jobs is only useful if we don't regress in coverage. So initially it might seem like multiple devshells speed things up, but because of the additional CI jobs required to cover all use cases, we'd end up with a net loss of efficiency.
Removing the "good first issue" label for now, as this might be highly complex to fix.
Description
I'm encountering issues when running `nix flake show --allow-import-from-derivation` on macOS (aarch64-darwin). The command results in an error that halts the process. It appears that there are issues building `nixpkgs-patched` for the wrong architecture.
Warnings & Errors
Environment
- Operating System: macOS
- Architecture: aarch64-darwin
- Nix Version: nix (Nix) 2.19.2
- Nativelink Version: 0.5.1
Steps to Reproduce
Run `nix flake show --allow-import-from-derivation` on macOS with the aarch64-darwin architecture. Observe the warnings and errors during evaluation.
Expected Behavior
The `nix flake show --allow-import-from-derivation` command should complete successfully or provide actionable feedback without encountering fatal errors.