bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.25k stars 4.08k forks source link

cache_computed_file_digests is a useful performance option that should be documented #14401

Open c0d3h4x0r opened 2 years ago

c0d3h4x0r commented 2 years ago

Description of the problem / feature request:

cache_computed_file_digests is a useful performance option that should be documented.

Feature requests: what underlying problem are you trying to solve with this feature?

This option is useful for improving incremental build speed on large codebases.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Immediately following a full build, do a no-op incremental build on our codebase containing ~400,000 input files. It takes about 15 minutes.

Immediately following a full build, do a no-op incremental build with --cache_computed_file_digests=400000. It takes about 15 seconds.

What operating system are you running Bazel on?

Windows 10

What's the output of bazel info release?

Starting local Bazel server and connecting to it...
release 4.2.1

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

n/a

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

fatal: not a git repository (or any of the parent directories): .git

Our codebase is stored in a company-internal Perforce server, not in Git.

Have you found anything relevant by searching the web?

No.

Any other information, logs, or outputs that you want to share?

n/a

meisterT commented 2 years ago

@janakdr since you reviewed the change that introduced the flag: can you comment on potential correctness concerns of the flag?

janakdr commented 2 years ago

I think the correctness is pretty good, since it uses inodes. However, does Windows have inodes? Also, I'm confused by this report, since the flag is not intended to help the case of a "no-op incremental build": in that case, Bazel should have all metadata in memory. The lower latency is supposed to happen after a flag flip, when we discard our in-memory actions, and it's supposed to apply to the output tree: source nodes don't get cleared after a flag flip. So I think a more detailed description of the Bazel commands run would be helpful (not targets, just flags).

meisterT commented 2 years ago

Bazel should have all metadata in memory

That's what I thought at first as well but then remembered that windows doesn't have watchfs (see https://github.com/bazelbuild/bazel/issues/1931) and assumed that this is the reason why we don't trust the metadata. Is that assumption correct?

janakdr commented 2 years ago

Ah, wasn't thinking. Yes, that could be it. But I think Windows doesn't have a node id. So we're down to mtime and size, which have been historically buggy. For instance, you have two files, header1.h and header2.h. They have the same size and mtime, but different contents. Now you copy header2.h onto header1.h. The node id would change, if we had it, but we don't.

guw commented 2 years ago

Just learned here about --watchfs. It's disabled by default. I think there is also a problem on Mac. I see lots of scanning just for simple "no-op" commands. It doesn't look like anything is cached by default.

This might be what I am seeing as well: https://groups.google.com/g/bazel-discuss/c/E_0F1t4FITE

c0d3h4x0r commented 2 years ago

@guw fyi:

The Windows filesystem watcher is still experimental as of Bazel 5.1.1 (the version we're now on). We encountered other problems when we tried to enable it, so it is not a viable substitute for using this undocumented performance option.

c0d3h4x0r commented 2 years ago

So what's the right path forward here? Right now, our business is taking a necessary dependency on this undocumented option, which isn't a position we want to remain in.

meisterT commented 2 years ago

I think it would be helpful to know the exact sequence of Bazel commands (feel free to leave out targets) and also what modifications you did in between. And perhaps the Bazel output on the second command.

c0d3h4x0r commented 2 years ago

In the original repro report, it was just "bazel build ..." followed by another "bazel build ...", with no changes of any kind in between.

But the way we kick off our builds has changed, so the command is now different -- plus we've been using this undocumented flag for a long while now. I'd have to run a clean build from scratch (takes over 40 minutes the first time) using our latest bits, with the flag explicitly unset, to get a fresh repro log for you. Let me know if you still need that, and I'll suffer through it.

Sent via Outlook for iOShttps://aka.ms/o0ukef


From: Tobias Werth @.> Sent: Monday, April 25, 2022 12:48:09 AM To: bazelbuild/bazel @.> Cc: Keith F. Kelly @.>; Author @.> Subject: Re: [bazelbuild/bazel] cache_computed_file_digests is a useful performance option that should be documented (Issue #14401)

I think it would be helpful to know the exact sequence of Bazel commands (feel free to leave out targets) and also what modifications you did in between. And perhaps the Bazel output on the second command.

— Reply to this email directly, view it on GitHubhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fbazelbuild%2Fbazel%2Fissues%2F14401%23issuecomment-1108199735&data=05%7C01%7C%7C8874c09fe67c493c852e08da268ff3db%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637864696962732020%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=utjJy2izmKUX6Ni%2FxbjT1AThhkAJ3xzqZMQP26zNz0E%3D&reserved=0, or unsubscribehttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAF2EYJHNHZ7OVGO2KO2VAL3VGZE3TANCNFSM5JXU6FDQ&data=05%7C01%7C%7C8874c09fe67c493c852e08da268ff3db%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637864696962732020%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=A0EaqDcyfUwuxkK59s2aFHNo3qoKFXbDJN1xzDKkmJg%3D&reserved=0. You are receiving this because you authored the thread.Message ID: @.***>

brentleyjones commented 1 year ago

Before this gets flagged as stale, this is still a valid issue. This flag should be documented, and potentially the default increased as well.

meisterT commented 1 year ago

cc @coeuvre

tjgq commented 9 months ago

Doing some back-of-the-napkin math, a digest cache with 50k entries (current default) should consume around 11MB, assuming compressed oops, average pathname length of 100 bytes, and hash length of 32 bytes. So increasing the default is unlikely to cause problems. @meisterT, WDYT about making it, say, 100k?

It would also be nice to provide some sort of indication that the cache size is too small to be effective (e.g., if can't fit the inputs+outputs of the largest action in a build, or perhaps even an entire build's worth of them).