Is mvnd "safe" for parallel builds where "mvn" alone is not?

jimklimov commented 1 year ago

Related context:

Generally, for using mvnd on the CI:

If you expect faster builds thanks to building Maven modules in parallel, then you may consider using stock Maven's -T option. But be warned that it may lead to issues caused by the fact that stock Maven (as of 3.8.2) does not prevent concurrent writes to the same file in the local Maven repo. BTW mvnd 0.6.0 suffers from this problem as well. <...>

Originally posted by @ppalaga in https://github.com/apache/maven-mvnd/issues/498#issuecomment-943760507

My question is rather this: we have CI jobs running mvn tests under the same user account on the same agent - whether for fully independent jobs, or for different parallel stages of the same job (spawning a burst of tests for different components etc.)

This approach did hiccup with unexpected errors, which we assume come from several Maven processes corrupting the same local repository when their downloads happened to run simultaneously and fetch the same files. We came up with a few possible solutions, though I have no idea whether any are viable (hence the question):

- Convert to mvnd, hoping it would handle concurrently requested downloads to the same local repo in a sane manner (even if by serializing them internally, etc.) and transparently for mvn "client" calls.
- Use locking (e.g. Jenkins lockable resources) to ensure sequential runs, and split the Maven operations: first call mvn validate before each longer build/test as the sequential operation (assuming this would fetch all needed files), then run the actual mvn test/compile/package/etc. as the parallel operations, relying on the files fetched safely before.
  - With this I'm not sure whether a later downloading session from another Maven process WOULD NOT endeavour to clean up the local repository from files it does not need for the build it is handling at that moment.
- Use separate Maven local repos for each independent operation. This raises a few concerns, however:
  - It seems wasteful of disk space and loses the benefit of having pre-fetched files on the workers -- especially if(?) we cannot layer several local repos, e.g. use a separate location for "our" components but a shared one for third-party dependencies.
  - Internet lore https://mkyong.com/maven/where-is-maven-local-repository/ or https://stackoverflow.com/questions/28767088/maven-local-repository-setting-being-overridden hints how...

Any insights would be most welcome :)
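For reference, the "separate local repos" option above can be sketched with the standard maven.repo.local user property. This is only a minimal sketch: it assumes Jenkins-style WORKSPACE and EXECUTOR_NUMBER variables, and the helper name mvn_isolated and the directory layout are hypothetical, not something from this thread.

```shell
#!/bin/sh
# Hypothetical helper: run Maven with a local repository private to this
# workspace/executor, so concurrent jobs never write to the same files.
# MVN defaults to "mvn"; set MVN=echo to print the command instead of running it.
mvn_isolated() {
    repo="${WORKSPACE:-$PWD}/.m2-repo-${EXECUTOR_NUMBER:-0}"
    mkdir -p "$repo" || return 1
    "${MVN:-mvn}" -Dmaven.repo.local="$repo" "$@"
}
```

A call such as `mvn_isolated -B test` would then resolve and download into that private directory only. The disk-space and re-download costs listed above still apply.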

cstamas commented 1 year ago

First, please forget the "built-in" parallel builder; use https://github.com/takari/takari-smart-builder instead (the same one mvnd uses).

Second, did you try Maven 3.9.x (preferably the latest version)? You could use file locking for a start (if the local repository is on a local FS).

jimklimov commented 1 year ago

Thanks for the suggestions. I checked, and the workers used Maven 3.8.6... So a change to 3.9.x could just fix the situation while keeping all those build calls independent as they are now?

cstamas commented 1 year ago

Maven 3.9 introduced "locks" for the local repository, trying to solve exactly that: shared access to the local repository from multiple processes... So the best thing would be to try it out (hopefully on some sound FS like ext4 or similar, with no Windows in the picture).

jimklimov commented 1 year ago

Oh that funny moment when the huge internet looks like a small village: https://www.mail-archive.com/users@maven.apache.org/msg144072.html

Configuration should be as easy as setting “aether.syncContext.named.factory” to “file-lock”

So I'm looking for the best way, among several possibilities, to pass aether.syncContext.named.factory=file-lock from CI to Maven without confusing other possible (pre-3.9.x) tool versions along the way :)

cstamas commented 1 year ago

Create (and check into SCM) a file in the project like this:

.mvn/maven.config

with contents

-Daether.syncContext.named.factory=file-lock
-Daether.syncContext.named.nameMapper=file-gav

PS: please double-check the above; it is from the top of my head.

https://maven.apache.org/configure.html#mvn-maven-config-file https://maven.apache.org/resolver/configuration.html
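A minimal sketch of creating that file from a shell, in case it helps. The helper name write_maven_config is hypothetical; the two option values are the ones quoted above and, as noted, still deserve a double check against the resolver configuration page. Each option goes on its own line.

```shell
#!/bin/sh
# Hypothetical helper: write .mvn/maven.config under the given project
# directory (default: current directory), one option per line.
write_maven_config() {
    project_dir="${1:-.}"
    mkdir -p "$project_dir/.mvn" || return 1
    printf '%s\n' \
        '-Daether.syncContext.named.factory=file-lock' \
        '-Daether.syncContext.named.nameMapper=file-gav' \
        > "$project_dir/.mvn/maven.config"
}
```

Running `write_maven_config /path/to/project` overwrites any existing .mvn/maven.config there, so a project that already has one would need a merge instead.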

jimklimov commented 1 year ago

Thanks for the options, my first shot missed the "file-gav" part ;)

Putting them into SCM as part of the components' source seems a bit like overkill... they should be buildable anywhere (and maybe with different environmental settings), right?

For posterity and my own back-tracking, I'll be exploring the MAVEN_OPTS envvar instead for now, so each Jenkins worker might set what is relevant there...
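A rough sketch of what the MAVEN_OPTS route could look like on a worker, gated on the Maven version so older installations are left alone. The helper name resolver_lock_opts is hypothetical, and it is an assumption on my side that the same property names apply to every version from 3.9 upward.

```shell
#!/bin/sh
# Hypothetical helper: print the locking properties only for Maven >= 3.9,
# given a version string such as "3.8.6" or "3.9.4".
resolver_lock_opts() {
    major="${1%%.*}"          # text before the first dot
    rest="${1#*.}"            # text after the first dot
    minor="${rest%%.*}"       # text before the next dot
    if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 9 ]; }; then
        printf '%s %s\n' \
            '-Daether.syncContext.named.factory=file-lock' \
            '-Daether.syncContext.named.nameMapper=file-gav'
    fi
}

# Example wiring (the version would come from `mvn -v` in practice):
# MAVEN_OPTS="$MAVEN_OPTS $(resolver_lock_opts 3.9.4)"; export MAVEN_OPTS
```

For a 3.8.x version the helper prints nothing, so MAVEN_OPTS stays as it was.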

jimklimov commented 1 year ago

While at it, I am trying to wrap my head around the https://maven.apache.org/resolver/local-repository.html#split-local-repository feature. Is there some trick that would allow several CI builds to share downloaded third-party artifacts but let them store separately and use without conflict some artifacts from designated "our" namespaces?

I saw suggestions about e.g. -Daether.enhancedLocalRepository.localPrefix=$PROJECT/$BRANCH - but it seems more related to where mvn install output would land. Things like mvn test happen inside the build workspace and do not write the built code/test binaries into the local repo of their own accord, right?
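To make that suggestion concrete, here is a small sketch that assembles the options. The helper name split_repo_opts is hypothetical; aether.enhancedLocalRepository.split is, to my understanding of the resolver documentation, the switch that enables the split layout, and localPrefix is the property mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print resolver options that enable the split local
# repository and set the prefix for locally installed artifacts to
# <project>/<branch>.
split_repo_opts() {
    printf '%s %s\n' \
        '-Daether.enhancedLocalRepository.split=true' \
        "-Daether.enhancedLocalRepository.localPrefix=$1/$2"
}

# Example wiring: mvn $(split_repo_opts "$PROJECT" "$BRANCH") install
```

This only changes where locally installed artifacts are kept, so it does not by itself separate "our" downloads from third-party ones.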

jimklimov commented 1 year ago

And circling back to this repository's topic, with some new knowledge in mind - does current mvnd benefit from the new concurrency-safe resolver like maven 3.9.x does?

A large part of my team members' question boils down to whether we can replace the original maven by mvnd+mvn frontend just at a finger-snap, by putting different tools into the PATH and not changing much more in the pipelines etc. -- and if this would bring some efficiency benefits?

Originally the idea to use mvnd came up while brainstorming a one-off case: an SBOM-processing script that does a lot of analytics over mvn help:effective-pom (each call takes some 3 seconds to generate an XML, which adds up to evil run-times for hundreds of components in a deliverable bundle).
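For illustration, the kind of loop in question looks roughly like this. It is a sketch with assumptions: the helper name effective_poms and the component layout are hypothetical, the output directory should be an absolute path, and MVN can be switched between mvnd and mvn to compare run-times.

```shell
#!/bin/sh
# Hypothetical helper: write one effective POM per component directory
# into an output directory. Usage: effective_poms OUTDIR COMPONENT_DIR...
# MVN defaults to "mvnd"; set MVN=mvn to use stock Maven instead.
effective_poms() {
    outdir="$1"; shift
    mkdir -p "$outdir" || return 1
    for component in "$@"; do
        name=$(basename "$component")
        "${MVN:-mvnd}" -q -f "$component/pom.xml" help:effective-pom \
            -Doutput="$outdir/$name.xml" || return 1
    done
}
```

With a resident daemon the per-call JVM startup should largely disappear, which is where the hoped-for savings over hundreds of components would come from; actual numbers would need measuring.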

cstamas commented 1 year ago

Split local repository is a new feature, but be aware that in Maven 3 land not all plugins "play nice" with it; see https://issues.apache.org/jira/browse/MNG-7706. In short, if you are using Maven 3.9.x and do not see "plugin validation warnings" (see https://maven.apache.org/guides/plugins/validation/index.html), then you should be pretty much okay (but there is still no 100% guarantee). Best is to try it locally first (so "lab testing" is what I'd recommend).

Moreover, as split local repository is known to be a bit "mind-boggling", I'd recommend even more strongly to play with it locally (on a dev workstation) and, once everything is in place and works, apply it to CI.

mvnd := mvn 3.9.x (or 4, depending on the edition) + resolver 1.9.x + smart builder + concurrent logging + resident daemon. So in short, "yes, mvnd knows everything mvn 3.9.x does, plus much more". In mvnd, file locking is enabled by default (because resident daemon processes share the same local repository).

Split repo for that use case is next on my roadmap, but it's gonna happen only in Resolver 2.0 (so Maven 4 final), not in the Maven 3.9.x lifecycle. Currently, "split" can only split as documented: by origin remote repo, and cached vs installed. Its current goal was "branched development" (one local repo shared by several Maven processes building the same project but different branches of it).

apache / maven-mvnd

Is mvnd "safe" for parallel builds where "mvn" alone is not? #896