Open layus opened 1 year ago
Took time to make a proper reproduction procedure: https://gist.github.com/layus/57fd0aeba78a9a67e8e684379f829e4d. I discovered several thing doing so:
bazel info
: WARNING: info command does not support starlark options. Ignoring options: [--@io_bazel_rules_docker//transitions:enable=no]
. That warning is gone in bazel 6, but that specific option is still ignored. My guess is because it references an external repo, but that remains to be proven.This makes the script simpler, as previous configs are not garbage collected. I am even able to compare them with bazel config $bad $good
. We see that the only difference is said flag which is incorrectly omitted when calling bazel info
: https://gist.github.com/layus/57fd0aeba78a9a67e8e684379f829e4d#file-output-log-L72-L76
Displaying diff between configs 35b359 and fbc5f0
FragmentOptions user-defined {
@io_bazel_rules_docker//transitions:enable: null, false
}
This in turns hints at more related issues:
https://github.com/bazelbuild/bazel/issues/13473, where this comment https://github.com/bazelbuild/bazel/issues/13473#issuecomment-1324274683 happens to be incorrect:
[...] The info and clean commands no longer warn, and starlark flags have no effect on these commands. We discover that while starlark flags have no effects on the output of these commands, running these commands has an impact on the daemon state. The incomplete configuration that these commands generate for their own use bump the internal graph version, and allow correct and needed nodes to be garbage collected.
This happens to be an issue because it discards useful state from the daemon. Regenerating that data completely takes more than one minute. bazel info
(and bazel {c,}query
to some extent) should not have that effect. Outside of the inefficiency, it is extremely surprising to have a sequence like bazel cquery; bazel info; bazel config <some config just printed>
fail because the config is not present anymore in the cache.
Data is not retained for as long as naively expected.
I have a rough fix that "works for me" in https://github.com/bazelbuild/bazel/pull/17875. Mostly POC and WIP and untested in the parsing error branch, but usable and functional.
@layus as you have discovered already in https://github.com/bazelbuild/bazel/pull/17875, commands like info
and clean
do not take starlark options into consideration. The reasoning behind this current restriction is because The info
and clean
commands have builds=true set in their annotation but don't actually do any building and starlark option loading might not work in these cases. The configuratbility team no longer owns this part of the code base so I'm routing this to the appropriate team for their input.
Regenerating that data completely takes more than one minute
Which data specifically do you mean?
I see the effect of bazel config
producing different results. But I'm not sure the actual build graph itself gets invalidated because of this.
One other debugging tool to look at this is bazel dump --skyrame=count
: this literaly dumps the state of the whole build graph on any call.
Also, I'd expect cquery
to be completely consistent with build
. Are you seeing otherwise?
I am not able to dig that issue right now, probably in the coming days or weeks. AFAIR the issue is that BASELINE_CONFIGURATION is changed, and that "invalidates" the whole graph. Restoring it to the normal, original value on the next build then needs to check the validity of all nodes. This basically means a traversal of all the graph nodes.
Description of the bug:
To fix issues with rules_docker transitions leaking into our own transitions and multiplicating the number of different configurations, we have added
build --@io_bazel_rules_docker//transitions:enable=no
to our .bazelrc, as per https://github.com/bazelbuild/rules_docker/pull/2068That works, but introduced a subtle and hard to track issue where running some bazel commands like cquery, dump or config would in some cases invalidate large parts of the daemon's build graph. I started investigating several months ago, and saw a breakthrough only recently.
It seems that these less frequently used commands have trouble taking that value into account. There is also some history with workspace label prefixes in custom build flags. See in general https://github.com/bazelbuild/bazel/issues/11128 and https://github.com/bazelbuild/bazel/issues/9177 for historical, somewhat related issues.
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
This works for me, be I still have to provide a minimal repo where it works:
Which operating system are you running Bazel on?
Ubuntu Linux 20.04 (+nix)
What is the output of
bazel info release
?release 6.0.0- (@non-git)
If
bazel info release
returnsdevelopment version
or(@non-git)
, tell us how you built Bazel.Its bazel 6.0.0 from nixpkgs.
What's the output of
git remote get-url origin; git rev-parse master; git rev-parse HEAD
?No response
Have you found anything relevant by searching the web?
See description above
Any other information, logs, or outputs that you want to share?
No response