eclipse-jgit / jgit

JGit, the Java implementation of git
https://www.eclipse.org/jgit/
Other
92 stars 31 forks source link

No forking process when building a Repository object #24

Open xenoterracide opened 5 months ago

xenoterracide commented 5 months ago

Description

Gradle with the configuration cache enabled, needs to not fork processes. Obviously git is very popular, and integrated with versioning. Unfortunately it seems that jgit, in some cases, and possibly unnecessarily, forks git itself and makes it incompatible with the configuration cache.

2 problems were found storing the configuration cache.
- Plugin 'com.xenoterracide.gradle.sem-ver': external process started '/usr/bin/git --version'
  See https://docs.gradle.org/8.6/userguide/configuration_cache.html#config_cache:requirements:external_processes
- Plugin 'com.xenoterracide.gradle.sem-ver': external process started '/usr/bin/git config --system --show-origin --list -z'
  See https://docs.gradle.org/8.6/userguide/configuration_cache.html#config_cache:requirements:external_processes

I'm currently not certain if it deals with both forks, but perhaps the environment variable Constants.GIT_CONFIG_NOSYSTEM_KEY could also be set programatically on the builder.

Alternatives considered

There's this, but I don't know if it's sufficient https://stackoverflow.com/a/59110721/206466

I also believe there may be ways to workaround it with gradle.

Additional context

I might be willing to take a stab at this, with feedback. Here's what spotless is doing as a workaround

https://github.com/diffplug/spotless/blob/main/plugin-gradle/src/main/java/com/diffplug/gradle/spotless/GitRatchetGradle.java#L35

msohn commented 5 months ago

I think building a Repository object in JGit does neither fork another process nor use git. Can you present any proof that this is not true ?

xenoterracide commented 5 months ago

I mean, I would hope that gradle's error is enough, you can check out this branch, it's build fails with the error.

https://github.com/xenoterracide/gradle-semver/tree/external/bug/jgit-24

This is the line that causes the failure https://github.com/xenoterracide/gradle-semver/blob/external/bug/jgit-24/src/main/java/com/xenoterracide/gradle/semver/GitVersionProvider.java#L27

This is the test you can run to reproduce the failure. configurationCache() is the actual test that fails, unfortunately gradle has chosen not to put effort into running the debugger with --configuration-cache and the debugger at the same time. So if you want to break point you'll need to use the debug() test.

https://github.com/xenoterracide/gradle-semver/blob/external/bug/jgit-24/src/test/java/com/xenoterracide/gradle/semver/SemVerPluginIntegrationTest.java

git --version is called here https://github.com/eclipse-jgit/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/util/FS.java#L1557

it's called, through a level or two, via

https://github.com/eclipse-jgit/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/util/SystemReader.java#L107

setting the environment variable isn't an option here

I'm not certain what calls openSystemConfig, only that it is called at some point when using Git.open and FileRepositoryBuilder...build().

if this isn't evidence/research enough, what would you like me to provide?

Vampire commented 5 months ago

@msohn believe it or not, it does. ;-) Iirc, not exactly when building a Repository object, but in some static initializer that happens latest then. JGit searches for the git executable on the path and then executes it twice, one time with --version and one time to get the location of the system configuration file. On macOS it might even further start a bash to call which git and an xcode-select. It's the implementation of discoverGitExe() in FS_POSIX.java and FS_Win32.java that does this. It can be disabled by setting the mentioned GIT_CONFIG_NOSYSTEM environment variable, which is only an option if you actually control the process, or by setting in time a custom SystemReader instance that has the same effect by overwriting the openSystemConfig with a custom version that does not call out to the git executable. This is a very long known "issue" with JGit.

msohn commented 5 months ago

Ok, now I understand what you mean. I was misled by your sentence stating that JGit forks git which didn't make sense for me.

SystemReader.Default does that to ensure JGit by default looks at the same configuration files which git is using. If you want a different behavior you can follow the approach Gerrit uses to load the system config from a custom path which is located under the gerrit site's etc directory. This is achieved by a custom SystemReader delegating all methods to the default implementation except openSystemConfig() which it overrides to implement the custom behavior.

If you don't need the system config you may e.g. return a dummy config from this method:

return new FileBasedConfig(parent, null, fs) {
    @Override
    public void load() {}

    @Override
    public boolean isOutdated() {
        return false;
    }
};

If it helps to solve your issue we may consider to move DelegateSystemReader to JGit.

See https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/git/SystemReaderInstaller.java https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/util/git/DelegateSystemReader.java

Would that help ?

Vampire commented 5 months ago

We - or at least I - know why JGit does it. And we are also aware of the two ways to mitigate the problem as I just described that exact approach in my last comment. I think what Caleb is asking for is a built-in easy to use way to disable the git.exe calling that does not require setting an environment variable which you cannot always control and does not require to subclass a method with several abstract methods that you all have to delegate to the default system reader.

msohn commented 5 months ago

Pushed SystemReader.Delegate for review https://eclipse.gerrithub.io/c/eclipse-jgit/jgit/+/1177071

xenoterracide commented 5 months ago

That's a massive improvement.

However, what I was hoping to be able to write is something like

    var builder = new FileRepositoryBuilder()
      .withSystemConfig(false)
      .readEnvironment()
      .setMustExist(true)
      .findGitDir(this.projectDirectory);

    try (var repo = builder.build()) {

The reality is that I don't think this would actually fix the big concern that worries me, which is the the idea of two different classes delegating over the same static singleton. Gradle is unlikely to have this problem because of how it separates two plugins. the example has the same behavior in both cases so it wouldn't matter, but imagine they didn't.

package com.xenoterracide;

import org.eclipse.jgit.util.SystemReader;
import org.gradle.api.Plugin;
import org.gradle.api.Project;

public class SemVerPlugin implements Plugin<Project> {
  static {
    preventJGitFromCallingExecutables();
  }

  static void preventJGitFromCallingExecutables() {
    SystemReader reader = SystemReader.getInstance();
    SystemReader.setInstance(
      new DelegatingSystemReader(reader) {
        @Override
        public String getenv(String variable) {
          return "PATH".equals(variable) ? "" : super.getenv(variable);
        }
      }
    );
  }
package com.myorg;

import org.eclipse.jgit.util.SystemReader;
import org.gradle.api.Plugin;
import org.gradle.api.Project;

public class SemVerPlugin implements Plugin<Project> {
  static {
    preventJGitFromCallingExecutables();
  }

  static void preventJGitFromCallingExecutables() {
    SystemReader reader = SystemReader.getInstance();
    SystemReader.setInstance(
      new DelegatingSystemReader(reader) {
        @Override
        public String getenv(String variable) {
          return "PATH".equals(variable) ? "" : super.getenv(variable);
        }
      }
    );
  }

Internally having a check in SystemReader that is, also feels safer in support of the pattern. It doesn't solve the aforementioned problem though, because SystemReader.getInstance is called 86 times and is a mutable singleton.

            if (this.gitSystemConfig && StringUtils
                    .isEmptyOrNull(getenv(Constants.GIT_CONFIG_NOSYSTEM_KEY))) {
                File configFile = fs.getGitSystemConfig();
                if (configFile != null) {
                    return new FileBasedConfig(parent, configFile, fs);
                }
            }

Sorry, it feels like what I'm saying is a little chaotic. I'm just concerned with longterm support and libraries having class load order issues. I don't think the latter is easily fixable and would probably require a v7 to get away from it.

Vampire commented 5 months ago

Gradle is unlikely to have this problem

Why?

because of how it separates two plugins.

You mean like ... not? Plugins are not isolated. And using static state in a Gradle plugin is actually highly problematic. Spotbugs for example uses static state and the Spotbugs Gradle plugin used to call Spotbugs in-process, which caused even builds of totally different projects that happened to run in the same daemon one after another influencing each other in sometimes subtle and sometimes not so subtle ways. The plugin was later fixed to do the actual Spotbugs invocations each in a separate process to mitigate this.

xenoterracide commented 5 months ago

no, but I stand corrected. I thought gradle was doing some classloader magic or something under the hood to avoid jar hell from plugins. My mistake. This of course, arguably, makes my concern about multiple plugins using jgit, and this mutable singleton instance more concerning. The right answer perhaps being to shade jgit (unless resolving the singleton is a possibility).

Essentially, as written with Delegate's, it's going to be russian nesting dolls of a delegate, delegating to a delegate, as each attempts to solve the same problem, possibly in different ways. We then all pray that no one is doing something that breaks it.

msohn commented 4 months ago

That's a massive improvement.

:-)

However, what I was hoping to be able to write is something like

    var builder = new FileRepositoryBuilder()
      .withSystemConfig(false)
      .readEnvironment()
      .setMustExist(true)
      .findGitDir(this.projectDirectory);

    try (var repo = builder.build()) {

The reality is that I don't think this would actually fix the big concern that worries me, which is the the idea of two different classes delegating over the same static singleton. Gradle is unlikely to have this problem because of how it separates two plugins. the example has the same behavior in both cases so it wouldn't matter, but imagine they didn't.

I think the decision if you want to use the git system config or not should be global per process using JGit. Which semantics would you end up with if some part of the code running in that process would respect the system level config and another part would not ?

How to configure this consistently e.g. using a custom SystemReader for one or multiple Gradle plugins seems like a problem which needs to be solved on the application side (here Gradle).

xenoterracide commented 4 months ago

I think the decision if you want to use the git system config or not should be global per process using JGit.

Which semantics would you end up with if some part of the code running in that process would respect the system level config and another part would not ?

Because java doesn't have a good way of segmenting libraries (does anything?) where mutable contants can be modified by 2 different libraries, and multiple versions of the same library have no real guaranteed load order. The best option is to stop having a mutable global singleton. Instead preferring the below code, but working in a way that a global singleton is never used. Likely that means passing around the system reader.

The other thing that would make a setter/wither appealing is knowing that this isn't a "hack" but an officially approved pattern to avoid c git. It feels like a hack that might break in the future.

Unintuitively, this doesn't work, btw. Without setting the global instance, this fails all the same (an asside, I don't know why readEnvironment can even take it if it can be ignored).

     var builder = new FileRepositoryBuilder()
        .readEnvironment(
          new DelegatingSystemReader(
            new DelegatingSystemReader(SystemReader.getInstance()) {
              @Override
              public String getenv(String variable) {
                if ("PATH".equals(variable)) {
                  return "";
                } else {
                  return super.getenv(variable);
                }
              }
            }
          )
        )
        .setMustExist(true)
        .findGitDir(this.getParameters().getProjectDirectory().get().getAsFile());

application side (here Gradle).

Gradle isn't responsible for fixing this, that's the same as saying Java is... this is just an issue with having multiple people mutate the same constant within java. That's why it's an antipattern to have mutable static fields.

The only option without jgit migrating away from the antipattern is to shade the library and for me to hope that it's recognized that this will always be needed and that it's not "hacky".