apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.69k stars 1.04k forks source link

Revisit smoketester for 9.0 build [LUCENE-9997] #11036

Closed asfimport closed 3 years ago

asfimport commented 3 years ago

Currently we have a (great) dev-tools/scripts/smokeTester.py that will perform automated tests against a release.

This was developed with the ant build process in mind.

This issue is just about considering the automated checks we do here, maybe some of them can be done efficiently in the gradle build in earlier places: this would be a large improvement!

Obviously some of them (e.g. GPG release key verifications) are really specific to the artifacts in question. These are most important to release verification, as that is actually the only place we can check it.

Any other checks (and I do tend to think, this checker should try to be thorough, invoking gradle etc), should be stuff we regularly test in PRs/nightly/builds.


Migrated from LUCENE-9997 by Robert Muir (@rmuir), resolved Oct 19 2021 Parent: #10415 Attachments: image-2021-10-12-12-47-11-480.png, image-2021-10-12-12-48-15-373.png Linked issues:

Pull requests: https://github.com/apache/lucene/pull/355

asfimport commented 3 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

Perhaps this is the time to pick this up, before the 9.0 release kicks off.

The gradle build kicked off from buildAndPushRelease script still does not assemble source release artifacts, so this depends on that (#10527).

asfimport commented 3 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

I have started on the smoketester in https://github.com/apache/lucene/pull/355 ripping out solr-specifics and some formatting stuff.

I'm sure there are more improvements and adjustments related to gradle that can be done in followup PRs.

asfimport commented 3 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Just for the record - I agree with Robert here that some of the stuff the smoke tester does can be shifted to regular Java integration tests (working directly on top of the dependency on the release archive). This is technically a trivial thing to set up but a larger effort to port some of the existing Python checks.

I'd postpone changes to the smoke tester until after 9.0 is out, it seems too fragile to do it now?

asfimport commented 3 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Some history: originally we had no smoketester. Every release had a lot of ppl hand-checking various random stuff, to ensure the artifacts were correct. It made it very difficult to release. None of these problems (e.g. build system bugs) would surface until release time, no regression testing.

We added the smoketester, if an issue came up in the release VOTE, we'd fix the tester to detect it too. At least it gave the RM a chance to "locally test stuff out" more efficiently and prevent the same issues from popping up every VOTE.

We added a "ant nightly-smoke" target, which issued a release every night, and then ran the smoketester on it, and made it a nightly job. The idea is to catch problem even sooner. The problem with this, is that nobody paid attention to the builds@ mailing list, because solr tests fail all the time. This problem remains to this day, even though its a separate PMC.

The problem impacts more than just releases, but even simple commits. I cannot with confidence simply merge a good-looking PR on my mobile phone, because our PRs don't simply run the tests for me. Instead I need to wait to sit at my computer, add the contributor as a remote, check it all out manually, before I can press the merge button. Honestly this 2nd step probably fails to happen sometimes.

I just think in general, we should be "testing" stuff at as early as possible a phase, to make our own lives easier. And this includes the build system.

That being said, I don't think we should try to issue a 9.0 release without any smoketester. We should have SOMETHING, so +1 to fix the existing smoketester to play well with the 9.0 build if that is the easiest step.

asfimport commented 3 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Yep, these sanity checks are needed without a doubt. And some of them will be easier to implement in Python than in Java. Others could be implemented in Java and run as plain tests on the release folder/ zip - I've done it before and it works quite well.

because our PRs don't simply run the tests for me

Oh, indeed. https://github.com/apache/lucene/blob/main/.github/workflows/gradle-precommit.yml#L38-L39

Maybe we should just enable a separate (concurrent) job that just runs the tests? Not sure how long they'll take on github infrastructure but I think they should fit the timeout limit?

asfimport commented 3 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I'm not knowledgeable on the limits but on my 2-core laptop the "precommit" takes almost as long as "test". +1 to enable this indeed.

Sorry to distract from the issue, but to me this stuff is the issue :)

We just have to "push back" all tests to give as early feedback as possible, including smoketester.

asfimport commented 3 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

You can actually see what's taking how long on github PRs - I added it a while ago:

image-2021-10-12-12-47-11-480.png

vs.

image-2021-10-12-12-48-15-373.png

asfimport commented 3 years ago

Robert Muir (@rmuir) (migrated from JIRA)

yes, slowest tests are in backwards, the backwards tests are out of control already and 9.0 isn't even out. I will open a separate issue.

asfimport commented 3 years ago

ASF subversion and git services (migrated from JIRA)

Commit ae956db41c4b40d3ea7c028d6abe9b71da1ae74c in lucene's branch refs/heads/main from Jan Høydahl https://gitbox.apache.org/repos/asf?p=lucene.git;h=ae956db

LUCENE-9997 Revisit smoketester for 9.0 build (#355)

asfimport commented 3 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

@rmuir Do you want to keep this Jira open for more improvements, or should we open new JIRAs for more targeted follow up tasks?

asfimport commented 3 years ago

Tomoko Uchida (@mocobeta) (migrated from JIRA)

I'm late for the party... I recently came up with a similar idea and found this issue. Just skimmed through smokeTestRelease.py, I also found some artifacts' verifications (e.g. validating the MANIFESTs) can/should be done not only at pre-releace check but also at daily development (unless it takes too much time) so that problems will be caught in the more early phase.

What I have in mind is, instead of fully replacing the python script (we'll need a tool for the release artifact anyway) with Groovy/Java, we could pick tasks that can be implemented as independent Gradle tasks and call it from both of the smeketester and gradle check (or precommit) task, one by one.

Also, it could be a convenient tool for "manual" artifact checks (for example, it could extract and dump all MANIFESTs into a task log file)?

 

If that makes sense, I think I can help with that (after release 9.0).

asfimport commented 3 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

A nice property of the Python smoketester is that it will also catch bugs and regressions in the gradle build itself. What if a gradle check is inadvertently removed? The smoketester gives an "outside perspective" on the result of the whole build, packaging, signing, publishing etc, and validates that the actually published bits are in compliance and work.

I absolutely believe we can and should add more early checks, but some of the most important checks benefit from being duplicated.

One example is javadoc broken links checks. I think the smoketester in 8.x checks the javadocs links, but not in 9.x, and I believe the reason is that the gradle build already does that if I'm not mistaken?

asfimport commented 3 years ago

Tomoko Uchida (@mocobeta) (migrated from JIRA)

Thanks, Jan, for your feedback.

I understand the smoketester has definitely different perspectives/purposes from daily tests/checks. On the other hand, to me, gradle (groovy, java) is a generic tool and I think it also works well as a framework for verifications on the result of the whole build just like Python, if the tasks are properly written. So I thought for some (not all) verifications we could run the same check on the artifacts at both phases - daily tests/checks and pre-release final verification. But yes, we can implement duplicated checks: ones for gradle build, one for the smokechecker.

 

bq. I think the smoketester in 8.x checks the javadocs links, but not in 9.x

I'm sorry I don't fully understand your intention here. In my understanding, the smoketester calls "gradlew check", and the "checkBrokenLinks" task (it's a wrapper for "checkJavadocLinks.py") is called from the "check" task. So the javadoc links are also checked in 9.0 smeketester as before?

asfimport commented 3 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

I'm sorry I don't fully understand your intention here. In my understanding, the smoketester calls "gradlew check", and the "checkBrokenLinks" task (it's a wrapper for "checkJavadocLinks.py") is called from the "check" task. So the javadoc links are also checked in 9.0 smeketester as before?

The checkJavadocLins.py script is gone as of #10255 and that's why I commented out the checkBokenLinks from smoketester, believing that those tests now run as part of the regular "gradle check". But that may not be the case - can you shed some light on this @rmuir ?

asfimport commented 3 years ago

Tomoko Uchida (@mocobeta) (migrated from JIRA)

I have tracked the issues so I think I can make an answer. In 8.x, there are two python scripts under lucene/dev-tools: "checkJavaDocs.py" and "checkJavadocLinks.py" (it was a bit confusing for me). https://github.com/apache/lucene-solr/tree/branch_8x/dev-tools/scripts

On Gradle build, "checkJavaDocs.py" was replaced with doclet on #10255. "checkJavadocLinks.py" is still there but moved under gradle/documentation; and the python script is called from "check" task, via "checkBrokenLinks" task. https://github.com/apache/lucene/tree/main/gradle/documentation/check-broken-links https://github.com/apache/lucene/blob/cfd9f9f98f7176fb71ff11d98cba61be366db758/gradle/documentation/check-broken-links.gradle#L31

asfimport commented 3 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

Thanks. Then we'll need to re-enable checkBrokenLinks task again.

asfimport commented 3 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

TODO:

asfimport commented 3 years ago

Tomoko Uchida (@mocobeta) (migrated from JIRA)

ah, maybe my explanation wasn't good? It seems the smoketester script on the latest main invokes "gradle check", and "checkBrokenLinks" task is called from the "check" task, so the javadoc links check is already (indirectly though) enabled in 9.x just as before, I think.

https://github.com/apache/lucene/blob/cfd9f9f98f7176fb71ff11d98cba61be366db758/dev-tools/scripts/smokeTestRelease.py#L612

asfimport commented 3 years ago

Tomoko Uchida (@mocobeta) (migrated from JIRA)

Test the smoketester on artifacts produced by the new build (buildAndPushRelease)

I think I can try this out - more testers would be good - if it's not very difficult to set up the test environment.

asfimport commented 3 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

In order to try this, please check out #11211 for further instructions.

asfimport commented 3 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

New PR https://github.com/apache/lucene/pull/391 for further smoketester adaption. See PR for details. It now runs all the way until maven artifact signature checks which fail since the maven artifacts do not have .asc files... Feel free to give it a spin yourself and report back.

asfimport commented 3 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Hi Jan. These maven artifacts are not signed by default - I think we can sign them if -Psign is enabled, then ascii signatures will work. I'll handle this today.

asfimport commented 3 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Filed a PR here:

https://github.com/apache/lucene/pull/392

asfimport commented 3 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

The smoke tester writes a repository-local file "rev.txt" and pollutes pristine local state. I'm not sure why it's doing it but perhaps it should be writing to temp?

with open('rev.txt', mode='wb') as f:
  f.write(rev.encode('UTF-8')) 

 

asfimport commented 3 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Also, the beauty of gradle is that we (in theory) wouldn't have to clean prior to running assembleRelease (which would make the smoke tester much faster). Gradle should handle all the incremental changes and synchronizations - files are not copied, they're synced to the distribution submodule. If you feel safer with a "clean" then I'd suggest running "git clean -xfd ." as it really restores the pristine state of the repository.

asfimport commented 3 years ago

ASF subversion and git services (migrated from JIRA)

Commit c4c3c3270e4283f6cb1df4959fb0b33069941688 in lucene's branch refs/heads/main from Dawid Weiss https://gitbox.apache.org/repos/asf?p=lucene.git;h=c4c3c32

LUCENE-9997: Collect signed maven artifacts if -Psign is passed. (#392)

asfimport commented 3 years ago

Tomoko Uchida (@mocobeta) (migrated from JIRA)

The smoke tester writes a repository-local file "rev.txt" and pollutes pristine local state.

I opened a PR: https://github.com/apache/lucene/pull/394

asfimport commented 3 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I ran the full check with local keys and dev mode (on jan/lucene9997-smoketester-part-2).

SUCCESS! [0:07:30.308004]

Two things that are odd:

asfimport commented 3 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

The rev.txt file is for running on a "prepared" package - it saves the git revision in a separate file because otherwise it wouldn't have any means to read it back from.

  parser.add_argument('--no-prepare', dest='prepare', default=True, action='store_false',
                      help='Use the already built release in the provided checkout')

I think we should make the git revision part of the distribution artifacts - then the smoke tester can read it directly from the distribution artifact release folder. Moreover, the git revision could also be part of the "source" distribution of Lucene - then the build scripts can be tweaked to actually work without the git clone (on the true "source" distribution) by simulating the git revision read from such a file.

Thoughts?

asfimport commented 3 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

Yey, first smoketest SUCCESS on freshly built lucene release artifacts on PR 391.

...
    verify maven artifact sigs .................................................................................................
    unpack lucene-9.0.0.tgz...
    verify that Maven artifacts are same as in the binary distribution...
    verify JAR metadata/identity/no javax.* or java.* classes...

SUCCESS! [0:09:23.758716]
asfimport commented 3 years ago

ASF subversion and git services (migrated from JIRA)

Commit c77e9ddf93ae872ba6556d39c48a0a32e31e91b1 in lucene's branch refs/heads/main from Jan Høydahl https://gitbox.apache.org/repos/asf?p=lucene.git;h=c77e9dd

LUCENE-9997 Second pass smoketester fixes for 9.0 (#391)

Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com> Co-Authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>

asfimport commented 3 years ago

ASF subversion and git services (migrated from JIRA)

Commit 54418cef450afa8a2e45904f68c6db45e241c584 in lucene's branch refs/heads/main from Tomoko Uchida https://gitbox.apache.org/repos/asf?p=lucene.git;h=54418ce

LUCENE-9997: write release revision to system temp dir (#394)

asfimport commented 3 years ago

Jan Høydahl (@janhoy) (migrated from JIRA)

> r-- permissions on all maven artifact files

I have noticed that too. It is done in https://github.com/apache/lucene/blob/main/dev-tools/scripts/buildAndPushRelease.py#L234 but I don't know why

asfimport commented 3 years ago

Tomoko Uchida (@mocobeta) (migrated from JIRA)

I think we should make the git revision part of the distribution artifacts - then the smoke tester can read it directly from the distribution artifact release folder. Moreover, the git revision could also be part of the "source" distribution of Lucene - then the build scripts can be tweaked to actually work without the git clone (on the true "source" distribution) by simulating the git revision read from such a file.

+1 - if we are willing to refactor the huge smoketester script...

asfimport commented 3 years ago

ASF subversion and git services (migrated from JIRA)

Commit c77e9ddf93ae872ba6556d39c48a0a32e31e91b1 in lucene's branch refs/heads/hnsw from Jan Høydahl https://gitbox.apache.org/repos/asf?p=lucene.git;h=c77e9dd

LUCENE-9997 Second pass smoketester fixes for 9.0 (#391)

Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com> Co-Authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>

asfimport commented 3 years ago

ASF subversion and git services (migrated from JIRA)

Commit 54418cef450afa8a2e45904f68c6db45e241c584 in lucene's branch refs/heads/hnsw from Tomoko Uchida https://gitbox.apache.org/repos/asf?p=lucene.git;h=54418ce

LUCENE-9997: write release revision to system temp dir (#394)

asfimport commented 2 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Closing after the 9.0.0 release