apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Convert build to work with Git rather than SVN. [LUCENE-6938] #7996

Closed asfimport closed 8 years ago

asfimport commented 8 years ago

We assume an SVN checkout in parts of our build and will need to move to assuming a Git checkout.

Patches against https://github.com/dweiss/lucene-solr-svn2git from #7991.


Migrated from LUCENE-6938 by Mark Miller (@markrmiller), resolved Jan 24 2016 Attachments: LUCENE-6938.patch (versions: 4), LUCENE-6938-1.patch, LUCENE-6938-wc-checker.patch (versions: 2) Linked issues:

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi, the current build relies on SVN at 3 places:

Jenkins builds need to be updated, but that's easy. Policeman is already prepared to do git checkouts (using jgit, see above).

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

Thanks Uwe!

This is not critical and can be fixed later.

+1.

I'll open separate issue if needed.

Yeah, let's see how long this takes vs a jgit version. If we just finish this issue real quick, we might as well spin it off into its own issue.

but somehow this makes no sense, as the commit hashes cannot be sorted and don't have a "version like" character

But it does tell you how to get to the code that created that JAR still, right?

The source tar gz/zip files use svn export. We have to fix this, otherwise we cannot release and test.

Yeah, this seems like the key issue to address

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

Here is a patch with some early exploration. I have an alternate for the svn export I think, and have done some initial renaming, cleaning up.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Looks good for now. As I see, you removed the svnversion from JAR files, as suggested. We can investigate further whether we really need the git commit hash in every JAR file. To me this was always a large slowdown on Windows, where creating a new process costs much time.

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

That export runs so much faster than svn export.

Overall, precommit dropped from like over 7 minutes to 4 and a half for me. Unless it's the SVN stuff, but that seemed to be relatively quick based on the output. It will be quick with jgit regardless I'm sure.

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

over 7 minutes

I guess that was a bit of a long run. I've since seen it take 5 and a half minutes, while git has been pretty consistent at 4 and a half minutes. Probably is just the difference in export speed.

asfimport commented 8 years ago

Upayavira (@upayavira) (migrated from JIRA)

Presumably the reason for building the tarball from an export is to avoid including uncommitted changes. Could we achieve the same end by just confirming that the git checkout is clean and not proceeding if there are any changes? Like so:

if [ -z "$(git status --porcelain)" ]; then
  # Working directory clean
  echo "clean"
else
  # Uncommitted changes: stop here
  echo "uncommitted changes" >&2
  exit 1
fi

(courtesy of http://unix.stackexchange.com/questions/155046/determine-if-git-working-directory-is-clean-from-a-script)

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

Presumably the reason for building the tarball from an export is to avoid including uncommitted changes

I'm not sure - we have checks for that done in another way in another place. I think it was also to avoid things like the .svn folders and what not. i.e. what an svn export is for.

Anyway, given this operation is so fast with the git method, I like keeping the old macro and behavior initially. Seems simpler and safer to mimic old behavior while we make the migration and update the release doc, etc.

asfimport commented 8 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

Presumably the reason for building the tarball from an export is to avoid including uncommitted changes

I'm not sure - we have checks for that done in another way in another place. I think it was also to avoid things like the .svn folders and what not.

Also it's to avoid including locally-built ignored things in tarballs. This will remain an issue with git, I think.

asfimport commented 8 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

FWIW, building the source release from svn export was put in place as a result of problems building the 3.1 release (previously there were fragile inclusion/exclusions rules, the rough equivalent of which are preserved as lucene.local.src.package.patterns for the package-local-src-tgz target). See http://markmail.org/message/nfon2anpgzdja2st and #4047.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Exactly. We don't want to have anything ignored or other leftovers in it. Basically we want to tar what "ant clean clean-jars" leaves behind (or should). Or better: the state of a fresh checkout without any additional files (not even ignored ones) and no special files like .svn or .git.

The reason why git is faster than svn on exporting: it does the whole export from your local Git database without network access, because you already have the whole repository locally.

asfimport commented 8 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Hi. Catching up with what's been said, here's my opinion.

1) I didn't go into the specifics of what the scripts were doing to get a "pristine" copy of a build, but with git you can use git-archive to get a tarball without any intermediate filesystem index. This has advantages for Windows (permissions are stored properly) and for speed (much faster on slower filesystems). Perhaps the scripts could be improved and do a two-phase check: git status to verify the current checkout isn't dirty (locally ignored files remain ignored) and then create a tarball, from which all the tests, etc. would be executed in any follow-up steps.

https://git-scm.com/docs/git-archive
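The two-phase idea in point 1 could be sketched roughly as below. This is only an illustration and not what the Lucene build does: the function name `make_src_tarball`, its arguments and the exit codes are hypothetical, and it assumes a git new enough to support `-C` (1.8.5 or later).

```shell
# Hypothetical helper: package HEAD of a checkout, but only if it is clean.
# Usage: make_src_tarball <repo-dir> <output.tgz> <prefix>
make_src_tarball() {
  mst_repo=$1     # path to the git checkout
  mst_out=$2      # tarball to create
  mst_prefix=$3   # top-level directory name inside the tarball

  # Phase 1: stop if git cannot read the checkout, or if it reports
  # modified or untracked files (ignored files are not listed).
  mst_status=$(git -C "$mst_repo" status --porcelain) || return 2
  if [ -n "$mst_status" ]; then
    echo "checkout is dirty, not packaging" >&2
    return 1
  fi

  # Phase 2: archive the committed tree only. git archive reads from the
  # object database, so .git and ignored files never reach the tarball.
  git -C "$mst_repo" archive --format=tar.gz --prefix="$mst_prefix/" HEAD \
    > "$mst_out" || { rm -f "$mst_out"; return 3; }
}
```

A call such as `make_src_tarball . /tmp/src.tgz lucene-src` (paths made up for the example) would then either fail early on a dirty checkout or leave a tarball that follow-up steps can unpack and test.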

2) I strongly opt for keeping the commit's hash inside build artefacts. This has helped me enormously a few times in the past. These hashes are not linear, correct, but they are even better at locating a particular commit the build was executed against, be it a branch or whatever. We personally use two git markers: the commit hash and a more human-friendly last-tag + dirty flag. We call jgit from ant, but here's the beanshell script that collects the required properties:

      import org.eclipse.jgit.api.*;
      import org.eclipse.jgit.lib.*;
      import org.eclipse.jgit.storage.file.*;

      Repository repository = new FileRepositoryBuilder()
        .findGitDir(project.getBaseDir())
        .build();

      Git git = new Git(repository);
      String revLine =
          git.log().setMaxCount(1).call().iterator().next().name() +
          (git.status().call().isClean() ? "" : "-dirty");

      project.setNewProperty("product.gitrev", revLine);

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

you can do git-archive to get a tarball without any intermediate filesystem index.

Yeah, that is the first thing I found when looking for a git export. I tried to keep things as similar to they were as possible though - and I'd also like to keep as much logic (like the compressing) out of exec and in ant as possible.

I strongly opt for keeping the commit's hash inside build artefacts.

I have no problem with including the hash myself.

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

We can investigate further whether we really need the git commit hash in every JAR file. To me this was always a large slowdown on Windows, where creating a new process costs much time.

Wow :) I bet there is a good chance this is much faster with Git. Otherwise, there must be some way to just get the checkout sha once and use it for every jar?

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Wow I bet there is a good chance this is much faster with Git. Otherwise, there must be some way to just get the checkout sha once and use it for every jar?

Yes, that would work. I was about to do that on svn already. The trick is to just populate the property once and pass it down to sub-builds (using the patternset of properties to pass down). We then just need a task that populates the property if it does not yet exist (unless="property").

I think we should just replace the old SVN revision in JAR files by the sha1. It should be easy to get it with a single "git" command (no idea how: I am still not fluent in Git's CLI; I always use TortoiseGit, because the command line of Git is the most confusing and user-unfriendly thing I have ever seen).

FYI: I would not call "jgit" to populate the property at the moment (if we cache the result it's fine), because this will cause permgen errors in Java 7 (the well-known Ant Classloader bug). Otherwise I would have implemented the same for SVN already :-) (I have a patch here for SVN similar to Dawid's code, but this breaks the whole build after a complete build with many JAR files).
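Outside of Ant, the same "populate once, pass down" idea can be sketched in shell with an exported variable. This is an analogy only, not the Ant implementation discussed here: the variable name `BUILD_GIT_REV`, the function name and the `unknown` fallback are assumptions made for the example.

```shell
# Populate BUILD_GIT_REV only if it is not set yet (the analogue of Ant's
# unless="property"), then export it so child processes inherit the value
# instead of starting git again.
ensure_git_rev() {
  if [ -z "${BUILD_GIT_REV:-}" ]; then
    # Fall back to a placeholder when git is missing or this is no checkout.
    BUILD_GIT_REV=$(git rev-parse HEAD 2>/dev/null) || BUILD_GIT_REV=unknown
  fi
  export BUILD_GIT_REV
}
```

The top-level script would call `ensure_git_rev` once; any sub-build started afterwards sees the variable already set and never spawns git itself, which is the point on platforms where process creation is slow.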

asfimport commented 8 years ago

Paul Elschot (migrated from JIRA)

there must be some way to just get the checkout sha once

This will work on the current branch:

git log --format="%H" -n 1

But also this, for example for the trunk branch:

cat .git/refs/heads/trunk

This is why branching in git is so cheap: a branch is no more than a file with a sha1.
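As a rough sketch of that file-based lookup, the function below resolves HEAD by reading files only, without starting git. It assumes a plain repository layout, and the name `read_head_sha` is made up for the example. One caveat it handles: git may store refs in `.git/packed-refs` instead of as loose files, in which case the `cat` above finds nothing.

```shell
# Resolve the commit hash of HEAD by reading files under a .git directory.
# Usage: read_head_sha <path-to-.git>
read_head_sha() {
  rhs_dir=$1
  rhs_head=$(cat "$rhs_dir/HEAD") || return 1
  case $rhs_head in
    "ref: "*)
      # Symbolic ref, e.g. "ref: refs/heads/trunk".
      rhs_ref=${rhs_head#ref: }
      if [ -f "$rhs_dir/$rhs_ref" ]; then
        # Loose ref: the file holds just the hash.
        cat "$rhs_dir/$rhs_ref"
      else
        # Packed ref: lines look like "<sha> <refname>".
        awk -v r="$rhs_ref" '$2 == r { print $1; found = 1 }
                             END { exit !found }' "$rhs_dir/packed-refs"
      fi
      ;;
    *)
      # Detached HEAD: the file holds the hash itself.
      printf '%s\n' "$rhs_head"
      ;;
  esac
}
```

Calling `read_head_sha .git` from the top of a checkout prints the hash, or returns a non-zero status if the ref cannot be found.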

the command line of Git is the most confusing and user-unfriendly thing

Indeed, and this makes the learning curve steeper than it would need to be.

asfimport commented 8 years ago

Dennis Gove (@dennisgove) (migrated from JIRA)

You can get the current sha1 with the command

$> git rev-parse HEAD

And you can replace HEAD with the name of a branch/tag to get the sha1 of that. See $> git help rev-parse for all the options
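Combining `git rev-parse HEAD` with the `git status --porcelain` check quoted earlier in this thread gives a rough command-line counterpart of the jgit snippet above (hash plus a `-dirty` suffix). This is a sketch under assumptions: the function name `git_rev_line` is hypothetical, and it relies on `git -C`, so git 1.8.5 or later.

```shell
# Print "<sha>" for a clean checkout or "<sha>-dirty" otherwise.
# Usage: git_rev_line [<repo-dir>]   (defaults to the current directory)
git_rev_line() {
  grl_dir=${1:-.}
  # Fails if git is missing, the directory is not a checkout, or there is
  # no commit yet.
  grl_sha=$(git -C "$grl_dir" rev-parse HEAD 2>/dev/null) || return 1
  if [ -n "$(git -C "$grl_dir" status --porcelain)" ]; then
    printf '%s-dirty\n' "$grl_sha"
  else
    printf '%s\n' "$grl_sha"
  fi
}
```

This starts git twice per call, so it only makes sense if the result is computed once per build and reused.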

asfimport commented 8 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Can we specify the commands (build/ precommit checks) that need to "work" with a git clone so that we can orderly go through them and know where we are with the migration process? It'd be good to have it done and then vote/ move on with the development to git. My candidates would be:

Then there are follow-ups:

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

We don't need to vote yet - that only happens when consensus fails or someone wants to force something. We can warn the dev list again to make sure everyone is caught up, but no need to force a vote unless someone comes out against. There is a very visible discussion and a few JIRA issues that have been in progress for a long time now. Once we are ready to go, we can sum things up in a new dev thread.

I think in terms of what needs to be covered here, Uwe has detailed it pretty well. We want all the targets to work really - or to understand why any target does not work. We can wait for Uwe to create a new git validator though - all targets still work without that. 'svn' does not really have a very deep imprint in our build targets.

I think the main thing left to do in this issue is put the git hash in efficiently.

Some other things people are concerned about can get further JIRA issues, but I imagine a lot of that (such as python scripts) can be updated as used / needed by those that use them.

asfimport commented 8 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

> I think the main thing left to do in this issue is put the git hash in efficiently.

I think this is an improvement, not a requirement? Don't we call SVN multiple times already? Other than that I agree with you – get the baseline targets working, clean up everything that doesn't work after the transition.

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

I think this is an improvement, not a requirement?

I think because we had this feature with svn and there is no consensus about dropping it and it affects releases, we want it before the move. I'm sure it will be simple to add.

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

there must be some way to just get the checkout sha once

The key word is once. Yes, of course we can get the sha the same way as we can get an svn version :) Uwe's concern is how many times we execute a program to do it. Our ant scripts init 8 billion times per target.

I'll look into trying to exec a minimal number of times.

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

Doesn't look easy to share any state between multiple inits.

I don't even know if doing it at the top level appears any better than per jar. It's still a ton of calls per run.

We can simply allow it to be disabled via build.properties if it's an issue for some Windows devs.

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

I think this is an improvement, not a requirement?

I think I slightly misunderstood this the first time. You meant making it more efficient for windows was not a requirement?

In that case I agree, though I figured if it was easy, we should just do it. It does not look so easy though. So I suggest a switch to turn it off in build.properties. But right, I don't think it's a requirement that we make it more efficient, just that we keep an id in the jars.

asfimport commented 8 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I think I slightly misunderstood this the first time. You meant making it more efficient for windows was not a requirement?

Yes, exactly. I work on Windows so it also affects me, but I don't think it's critical.

It does not look so easy though

I think it's doable, but far from convenient. It's a similar situation as to what happens with "checking whether any tests executed" – you want a property or a marker passed down to sub-builds... it's a pain to maintain. Perhaps we should look at the core of the problem and somehow fix it in Ant itself...

asfimport commented 8 years ago

Upayavira (@upayavira) (migrated from JIRA)

How about the main build gets the git hash and writes it to a temporary property file. All other build phases can then just read that file and we are done. Am I missing something?

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

How about the main build gets the git hash and writes it to a temporary property file. All other build phases can then just read that file and we are done. Am I missing something?

That's not good, because on git update it would not update that file unless you do ant clean. The solution proposed before is easy, I will implement it later, no worries. We have the infrastructure:

No worries, I will take care! For now just use the approach we had with Subversion.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I will open a separate issue once we have moved to Git and fix that, together with the check-svn-working-copy-task!

So let's do the move now!

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

Okay, here is what I think is the minimum patch that replicates what we had. Let's do that for this issue and open new issues for any improvements. That way we won't hold up the conversion at all and we can try and keep some momentum.

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

https://github.com/markrmiller/lucene-solr-svn2git/commit/c587241d35f3dc641a2de26eff3ba2dc2f6eca59

asfimport commented 8 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Mark, I applied your patch to master (temporary migrated repo at git@github.com:dweiss/lucene-solr-svn2git.git). ant precommit worked for me without any problems. I could not apply it cleanly to branch_5x though – I didn't look closely, just proceeded with the import, I'm sure we can figure it out later.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

In trunk, the smoke tester on Jenkins did not pass: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-trunk/395/console

To me it looks like some magic we do with the src-export folder did not fully work, so missing some files:

package-tgz-src:
    [mkdir] Created dir: /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/build/src-export/docs/changes
      [get] Getting: https://issues.apache.org/jira/rest/api/2/project/LUCENE
      [get] To: /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/build/src-export/docs/changes/jiraVersionList.json
     [exec] Failed to open /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/build/src-export/CHANGES.txt
     [exec] Use of uninitialized value $first_relid_regex in substitution (s///) at /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/site/changes/changes2html.pl line 223.
     [exec] Use of uninitialized value $second_relid_regex in substitution (s///) at /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/site/changes/changes2html.pl line 225.
     [exec] Use of uninitialized value $first_relid_regex in concatenation (.) or string at /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/site/changes/changes2html.pl line 226.
     [exec] Use of uninitialized value $second_relid_regex in concatenation (.) or string at /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/site/changes/changes2html.pl line 226.
     [exec] Use of uninitialized value $title in concatenation (.) or string at /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/site/changes/changes2html.pl line 231.
     [exec] Use of uninitialized value $first_relid in concatenation (.) or string at /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/site/changes/changes2html.pl line 231.
     [exec] Use of uninitialized value $second_relid in concatenation (.) or string at /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/site/changes/changes2html.pl line 231.
     [exec] Use of uninitialized value $first_relid in concatenation (.) or string at /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/site/changes/changes2html.pl line 231.
     [exec] Use of uninitialized value $second_relid in concatenation (.) or string at /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/site/changes/changes2html.pl line 231.
     [exec] Use of uninitialized value $title in concatenation (.) or string at /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/site/changes/changes2html.pl line 231.
   [delete] Deleting: /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/build/src-export/docs/changes/jiraVersionList.json
     [copy] Copying 3 files to /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/build/src-export/docs/changes
      [tar] Building tar: /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/dist/lucene-6.0.0-src.tgz
     [echo] Building checksums for '/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-trunk/lucene/dist/lucene-6.0.0-src.tgz'

Not sure what is broken there.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Here is the patch to fix the Lucene src.tgz. The reason was simple.

src.export.dir now contains the root folder, but previously it was only the lucene/ subfolder. The logic in the macro and git's behaviour did not reflect this. The quick workaround is to define an additional property for Lucene's build.xml and append it while tarring and running scripts.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Updated patch also disabling svnkit stuff in root.

Will commit in a moment to make smoker work

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Final version, previous one was not as effective as current one.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I committed everything. @dweiss, can you merge my recent commit to 5.x, too?

The Jenkins Jobs for 5.x are not yet enabled, I am waiting for backport.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

It looks like JIRA does not yet report Git commits; we should fix that.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I cherry-picked the 3 commits (Mark/Dawid's and 2 of mine). Worked like a charm with the GUI of Tortoise. The conflicts Dawid found were just java8 vs. java7 strings.

I'll now reenable 5.x builds on jenkins

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

As we currently have no auto-comments from the Git Bot:

asfimport commented 8 years ago

Mark Miller (@markrmiller) (migrated from JIRA)

Thanks for taking this up Uwe! I'm away for the weekend.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi, here is the former "ant check-svn-working-copy", now written for git. It runs almost the same checks:

As GIT has no file properties, we don't do property checks like EOL-style or MIME-type.

The usage is same, it runs by ant validate/precommit. The target name changes a bit, I removed the "svn".

I will commit this a bit later. I do some tests with different (non-)working copies.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Small update.

asfimport commented 8 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Thanks Mark & Dawid! I close this issue, build seems to be fully converted to GIT.

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 424a647af4d093915108221bcd4390989303b426 in lucene-solr's branch refs/heads/master from @uschindler https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=424a647

8052, LUCENE-6938: Add branch change trigger to common-build.xml to keep sane build on GIT branch change

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 0ef36fcdd107084a2ac3156943f01eb5f7dd9c74 in lucene-solr's branch refs/heads/branch_5x from @uschindler https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0ef36fc

8052, LUCENE-6938: Add branch change trigger to common-build.xml to keep sane build on GIT branch change

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 7f9506ca82032804f2354fef71201366fcbf9932 in lucene-solr's branch refs/heads/branch_5_4 from @dweiss https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7f9506c

LUCENE-6938: Convert build to work with Git rather than SVN. (Mark Miller via Dawid Weiss).

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit f4b228b34174e87b1ed43e330b500d8b795604ca in lucene-solr's branch refs/heads/branch_5_4 from @uschindler https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f4b228b

LUCENE-6938: Fix Lucene's src.tgz file; remove svnkit stuff

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 83112977e8fa66615d23e57697b2743052c71098 in lucene-solr's branch refs/heads/branch_5_4 from @uschindler https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8311297

LUCENE-6938: fix typo, sorry

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 016b26675efbd25b9907115b400bacb55b840af3 in lucene-solr's branch refs/heads/branch_5_4 from @sarowe https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=016b266

LUCENE-6938: Maven build: Switch SCM descriptors from svn to git; buildnumber-maven-plugin's buildNumberPropertyName property (used in Maven-built artifact manifests) renamed from svn.revision to checkoutid; removed Subversion-specific stuff from README.maven

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit d4a8bbbf2b1effc2f166530fcd4720127eafc9a9 in lucene-solr's branch refs/heads/branch_5_4 from @uschindler https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d4a8bbb

LUCENE-6938: Improve output of Git Hash if no GIT available or no GIT checkout (this restores previous behaviour)