apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.65k stars 1.03k forks source link

Add gradle cache option and make some tasks cacheable [LUCENE-10195] #11232

Closed asfimport closed 2 years ago

asfimport commented 3 years ago

Increase Gradle build speed with help of Gradle built-in features, mostly cache and up-to-date checks


Migrated from LUCENE-10195 by Jerome Prinet, resolved Nov 02 2021 Pull requests: https://github.com/apache/lucene/pull/414

asfimport commented 2 years ago

Jerome Prinet (migrated from JIRA)

I mainly focussed on up-to-date checks and task caching, the most spectacular improvement happens when running a clean with a populated cache:

./gradlew clean build -Ptests.seed=deadbeef -Ptests.nightly=false -Ptests.neverUpToDate=false --scan

 

Before changes

https://gradle.com/s/kxkbukaklyiz4

1206 tasks executed in 40 projects in 5m 39s, with 72 avoided tasks saving 16.359s

After changes

https://gradle.com/s/mfiwiheg4wxjq

1206 tasks executed in 40 projects in 26s, with 394 avoided tasks saving 30m 2.417s

 

Here is the detail of the changes:

 

Here some advices for future improvements:

=> :lucene:analysis:common:compileTestJava is not cacheable due to overlapping outputs

asfimport commented 2 years ago

Jerome Prinet (migrated from JIRA)

Using com.palantir.consistent-versions Gradle plugin triggers a deprecation warning and makes it impossible to enable configuration on demand which would help to improve performances as well.

I filed an issue in order to get this resolved.

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I did take a look at the patch. Thank you - some of the things look like nice improvements. This command is not what should be the benchmark though:

./gradlew clean build -Ptests.seed=deadbeef -Ptests.nightly=false -Ptests.neverUpToDate=false --scan

instead, the second invocation of this (full incremental build) should be measured instead (tests are a separate story):

./gradlew check -x test

I understand you're advocating for gradle cache and I think it's great but I don't think it should be the default setting - sorry, this is my honest opinion. Unless you have a bunch of corporate CI servers it'll only pollute your home with a gazillion of megabytes of data that will simply not be reused much. And we do want folks to run stuff in their own environments because this is a good regression test (different VMs, operating systems).

If somebody wants a local cache, they can enable it but it shouldn't be forced down their throats.

As for the recomendations, here are my thoughts.

Fixing tests.seed obviously help to benefit from up-to-date check, I get the point about randomization, but this is a trade-off with expensive cost of resources

Yes, it is a tradeoff we're willing to take. Again - if somebody wants a locally fixed seed, they can do it. You'd be surprised how frequently those tests fail on boundary conditions in only certain environment combinations.

Use the standard Gradle wrapper

There is a reason why non-standard wrapper is used - please look up the relevant issue in Jira (source release shouldn't ship a binary artifact).

Setup local gradle.properties in Gradle home folder rather than having an automatic generation from gradle/generation/local-settings.gradle

There is a reason why gradle.properties is generated (it adjusts the defaults to the local machine). I wish gradle had a mechanism of tuning, say, max-workers dynamically, but I don't think it does.

Do not override Gradle daemon TTL to 15mn, this is way to short

If you run the build regularily switching VMs then background gradle daemons eat up all your memory. So no, I don't think it's too short.

Do not create / commit generated test files to src directory (frenchArticles.txt, Top50KWiki.utf8,

bq. CambridgeMA.utf8, Latin-dont-break-on-hyphens.rbbi)

You don't understand why they're there. One of these generated files requires 16GB memory and over 15 minutes on a decent server to generate. Even if you use the gradle cache, the first run on your old-ish laptop will kill your build with an OOM. Some of these resources require specific environments (like Linux toolchain). I don't think there is a mechanism in gradle which would allow only regenerating these resources if their source input triggers actually change.

I'll go through the changes you suggested and will cherry-pick some of the improvements, thank you.

asfimport commented 2 years ago

Jerome Prinet (migrated from JIRA)

First, thanks David for sharing some context, it is definitely helpful.

In regards with cache, you can rely only on local cache (in my case went up to 28Mo), which does not prevent the build from being run/tested on different systems. You can even wipe this periodically to keep it minimal.

About Gradle configuration, I'd rather have the local related settings in \~/.gradle/gradle.properties which takes precedence over the project's gradle.properties (see [here|[https://docs.gradle.org/current/userguide/build_environment.html#sec:gradle_configuration_properties])]

My bad for the files, I didn't get the point. I was probably surprised by having them colocated with some Java source files.  Anyway, you're right, any IO bound operation is most likely not to give you real benefit when cached.

Thanks for reviewing!

 

 

asfimport commented 2 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I understand you're advocating for gradle cache and I think it's great but I don't think it should be the default setting - sorry, this is my honest opinion. Unless you have a bunch of corporate CI servers it'll only pollute your home with a gazillion of megabytes of data that will simply not be reused much. And we do want folks to run stuff in their own environments because this is a good regression test (different VMs, operating systems).

If somebody wants a local cache, they can enable it but it shouldn't be forced down their throats.

I fully agree with this. PLEASE DO NOT ENABLE THE BUILD CACHE BY DEFAULT. As a developer I want and expect to take the build longer if I run "gradlew clean". I want "gradlew clean" to forget the build and then compile everything again and especially, I want the build to rerun all checks and tests!!!!

As provider of the Lucene build servers, every run should do all build steps again, because we want to test out JVM problems and this only works if the gradle build forgets everything.

I just ping @rmuir, because he also has a strong opinion on that.

So in short. Some of the changes in the PR looks fine, but everything that caches stuff on my local disk and serializes test results and checks thats should be avoided sorry!

Thanks for including my opinion.

Uwe

P.S.: IMHO the Gradle build cache is a feature for streamlined projects with zillions of build servers to spare CPU resources mabye in organizational environments where the business logic is important. But For Lucene, if we have a zillion of build servers we want all of them redundantly run tests to find bugs in the JVMs. This is why we run the tests with different settings (compressed ops on/off, different Garbage collectors). That's what Policeman's Jenkins server is behind: Find bugs in Garbage collectors and different JVM version by running the build suite and tests 24/7. Also @mikemccand does this every night to monitor performance. Everything caching results would be a desaster!

If the build cache helps local developers, ok – but more important is to configure Input/Outputs correctly. I have a local machine with one operating system and dont need to cache results several days. It's only working myself.

asfimport commented 2 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

If you run the build regularily switching VMs then background gradle daemons eat up all your memory.

So no, I don't think it's too short.

+1. On our build servers we disable the Gradle Daemon completely. I switch it on locally when I do quick incremental builds (try-and-error). But for running the whole build for releases or debugging Gradle-shitbugs, it s also off locally.

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I will review the patch one-by-one (thanks for splitting the commits, it helps). Please give me some time as my schedule is pretty intense. It'd be awesome if you guys at gradle could take a closer look at some of the issues I outlined in my e-mail on the dev list [1] - don't know if you saw it. These are hard and require deeper knowledge of gradle internals (not to mention the will to perhaps change the implementation here or there).

[1] https://markmail.org/message/vjpfc2jwocroz7nd

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I'd rather have the local related settings in \~/.gradle/gradle.properties which takes precedence over the project's gradle.properties

The thing is: these are global. And many of these settings are project-specific. What works in one project wouldn't work in another. I found it irritating that something so easily solved in ant (include defaults, then local user properties from the project) is so difficult in gradle. I know I'm in no position to suggest anything but I would love to see a way of bootstraping with more than one project-local gradle*properties... or to have a way to compute some of the properties dynamically (so that machine settings can be fined-tuned to).

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I agree with Dawid and Uwe.

I will just add one thing: I do see one potential use-case for enabling the cache: regenerating those enormous jflex DFAs (from 'regenerate'). This seems contained enough, that we could possibly make it work efficiently and have all the inputs and outputs correct? This really is a case similar to javac, we are using a third party tool (jflex) to translate input grammar into .java output. The end result is actually quite small (e.g. 2MB result), but it requires gigabytes of memory and many minutes. Dawid has stuff in the build to "control" this already, so that the build fails if someone tries to edit a generated file directly.

But even so, I am wary of the current build cache. It doesn't allow me to easily bound the size: https://github.com/gradle/gradle/issues/3346 Will the cache behave correctly when it runs out of disk space? I would be happy to just configure a 10MB fixed loopback mount for this cache as a workaround, so that I generate the jflex DFA less often :)

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

so that I generate the jflex DFA less often

The build cache would have to fetch this from an external server so you'd need a network connection then. Besides - what causes it to be rerun? It should be skipped in the current build (unless you're really forcing it to run)?

asfimport commented 2 years ago

Jerome Prinet (migrated from JIRA)

I fully agree with this. PLEASE DO NOT ENABLE THE BUILD CACHE BY DEFAULT. As a developer I want and expect to take the build longer if I run "gradlew clean". I want "gradlew clean" to forget the build and then compile everything again and especially, I want the build to rerun all checks and tests!!!!

Regarding tests, they will be rerun by default with tests.neverUpToDate flag. You might be interested by the --rerun-tasks option which allow to ignore up-to-date checks.

P.S.: IMHO the Gradle build cache is a feature for streamlined projects with zillions of build servers to spare CPU resources mabye in organizational environments where the business logic is important.

We can differentiate between local cache and remote cache, this PR was not triggering any remote cache inclusion.

If the build cache helps local developers, ok – but more important is to configure Input/Outputs correctly. I have a local machine with one operating system and dont need to cache results several days. It's only working myself.

Yep this is the tricky part, configuring inputs and outputs accurately, but once you get there, it can be super interesting to not recompute something which was already computed. This comes with a price obviously, disk space taken but again can be super beneficial in some cases.

It'd be awesome if you guys at gradle could take a closer look at some of the issues I outlined in my e-mail on the dev list [1]

I will definitely relay that internally

Will the cache behave correctly when it runs out of disk space?

probably not

I would be happy to just configure a 10MB fixed loopback mount for this cache as a workaround, so that I generate the jflex DFA less often

There is no way to do that out of the box

 

 

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Sorry, maybe i wasn't clear. It is my understanding, that by default, it could cache 2MB in the local cache and it would persist across "gradle clean".

And yes, I know the large DFA task is skipped by default, but I imagined this would make it much less annoying, and we could potentially enable it? Sure, it doesn't fix the real minimization issue that causes it to take 20 minutes cpu + 10GB ram, but it would reduce the pain.

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

> I would be happy to just configure a 10MB fixed loopback mount for this cache as a workaround, so that I generate the jflex DFA less often There is no way to do that out of the box

No, i mean I would do it myself, and configure /home/rmuir/.gradle/caches/ in /etc/fstab to be 10MB. So it would run gradle out of disk space if it tried to write any more than that.

I really don't want size-unbounded caches storing stuff or trashing my SSD. I keep all my caches on a short leash, it is pretty easy since most apps behave and store stuff under \~/.cache. So I already mount this as tmpfs with a size limit. And I pass flags such as chromium --disk-cache-size when apps have a way to explicitly bound the size.

asfimport commented 2 years ago

Jerome Prinet (migrated from JIRA)

Just to clarify, you can limit the TTL in cache, but not the whole cache size

https://docs.gradle.org/current/userguide/build_cache.html#sec:build_cache_configure

 

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

You might be interested by the --rerun-tasks option which allow to ignore up-to-date checks.

This reruns all tasks in the graph which is more of a pain than help (in majority of cases :)). To me the absolutely best feature of gradle lies in incremental tasks. When things are configured correctly this means incremental-check subsystem pretty much takes care of itself. I almost never have the need to run a full 'clean'.

Sorry, maybe i wasn't clear. It is my understanding, that by default, it could cache 2MB in the local cache and it would persist across "gradle clean".

It would still try to run this task on the first run when the input/output information isn't locally available (assuming no external cache is provided). This means it'd run at least once. To me this is a no-go. I really wish there was a mechanism for somehow persisting the state of up-to-date checks but there isn't.

asfimport commented 2 years ago

Jerome Prinet (migrated from JIRA)

The thing is: these are global. And many of these settings are project-specific. What works in one project wouldn't work in another. I found it irritating that something so easily solved in ant (include defaults, then local user properties from the project) is so difficult in gradle. I know I'm in no position to suggest anything but I would love to see a way of bootstraping with more than one project-local gradle*properties... or to have a way to compute some of the properties dynamically (so that machine settings can be fined-tuned to).

You might want to give a look at init scripts which allow to add some conditional logic

https://docs.gradle.org/current/userguide/init_scripts.html

 

asfimport commented 2 years ago

Jerome Prinet (migrated from JIRA)

support for project-controlled up-to-date mechanism (versioned artifact checksums) that is not bypassed on the first run. This is needed for generated resources - I didn't figure out how to do it other than with ugly hacks.

Did you explore upToDateWhen() method? 

 

asfimport commented 2 years ago

Jerome Prinet (migrated from JIRA)

test runner test case ordering optimization (load balancing between worker JVMs); currently the long-tail test case can slow down builds significantly.

Test distribution might be Gradle way to tackle that

asfimport commented 2 years ago

Jerome Prinet (migrated from JIRA)

I understand you're advocating for gradle cache and I think it's great but I don't think it should be the default setting - sorry, this is my honest opinion.

as a follow up to your comments, I just reverted the commit that enabled the local cache by default 

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Thanks Jerome. I'll go through that patch and will clean things up, no worries. It's just in the queue of things to do.

asfimport commented 2 years ago

Jerome Prinet (migrated from JIRA)

All good, just trying to make things smoother for you guys 👍

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Hi Jerome. I reviewed the PR again - added changes entry, a non-enabled option to the default gradle.properties. I also corrected unnecessary exclusions in validateSourcePatterns (.gradle and .idea are at the root level only, so they only need to be excluded for the root project).

Finally, after some deliberation, I decided to not include changes to test-related tasks (ecjLint and test). You know our position on the subject of caching test results and the changes you added (while informative about how gradle handles such things) add a layer of additional complexity that I don't think is needed. Maybe it'll change in the future, who knows - if so, your PR stays and can be a source of code/ inspiration.

I'd also like to correct the renderJavadoc task - I'd like to keep the map for offline links as it makes it much simpler to configure and maybe add new linked javadocs later. Can this map be made cacheable by exposing a converting Input getter method with a list of classes marked Nested and two fields (string and a file)? So that we can keep using the map but provide more semantics to gradle?

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I've just tried this and it seems to work just fine. I'd appreciate if you could have a second look at the modified PR at https://github.com/apache/lucene/pull/421 and I think it's ready to go.

asfimport commented 2 years ago

Jerome Prinet (migrated from JIRA)

Just reviewed and approved the second PR 👍

Nice use of the @Nested to keep the actual type and be able to still benefit from caching

 

Don't know why I had some .idea / .gradle messed up in some different locations, nvm, thanks for fixing!

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Thanks Jerome Prinet!

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1a38cac68e6b625f9316e08bf28f75a7b6953044 in lucene's branch refs/heads/main from Dawid Weiss https://gitbox.apache.org/repos/asf?p=lucene.git;h=1a38cac

LUCENE-10195: add commented-out org.gradle.caching=true to the generated local settings.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit a53d633bd9294b8e736c26148f2b5efa3ec5fe48 in lucene's branch refs/heads/main from Dawid Weiss https://gitbox.apache.org/repos/asf?p=lucene.git;h=a53d633

LUCENE-10195: LUCENE-10195: Add gradle cache option and make some tasks cacheable

asfimport commented 2 years ago

Jerome Prinet (migrated from JIRA)

Thanks to you @dweiss!

asfimport commented 2 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Closing after the 9.0.0 release