apache / netbeans

Apache NetBeans
https://netbeans.apache.org/
Apache License 2.0
2.64k stars 849 forks source link

conditional CI pipeline + migrate some jobs from travis to gh #4431

Closed mbien closed 2 years ago

mbien commented 2 years ago

Now that we have the basic CI features implemented (cancellable, restartable, junit-reports #4328 and a simple pipeline #3785), we can migrate some travis jobs to github.

I don't think we can migrate everything, since apache has a limited number of job runners which are shared between all apache projects - using them all every time someone opens a PR wouldn't be polite :)

so far migrated: sig test, java hints, javafx, enterprise, java modules, profiler, apisupport, extide, platform modules, webcommon, rat, lib validation

The good news is that a lot of time is saved simply by having the jobs as part of the pipeline, the initial work is now paying off, travis did build in every job. Another way to save CI time is by triggering long running jobs/steps only with certain PR labels, see below.

other changes:

neilcsmith-net commented 2 years ago

I don't think we can migrate everything since apache has a limited number of job runners which are shared between all apache projects - using them all every time someone opens a PR wouldn't be polite :)

No, but the situation with Travis isn't much different. What about taking this as a chance to try some of the heavier tests on different triggers? A daily basis on certain branches, or on PR merges?

mbien commented 2 years ago

I don't think we can migrate everything since apache has a limited number of job runners which are shared between all apache projects - using them all every time someone opens a PR wouldn't be polite :)

No, but the situation with Travis isn't much different. What about taking this as a chance to try some of the heavier tests on different triggers? A daily basis on certain branches, or on PR merges?

sure thats all no problem in principle. The main issue is to actually split the tests up and decide what should run when. Triggering based on code paths, or even PR labels is also an option, but this has been discarded every time we had the discussion. PHP for example almost became conditionally triggered as first experiment.

But i do agree, this project runs probably ~20h of testing per sync (travis alone is 13h). Thats simply too much. (edit, the pipeline will reduce this a bit since each job won't need to build everything)

neilcsmith-net commented 2 years ago

sure thats all no problem in principle. The main issue is to actually split the tests up and decide what should run when. Triggering based on code paths, or even PR labels is also an option, but this has been discarded every time we had the discussion.

Yes, I know! :exploding_head: I actually meant that as a hint that you might want to propose a suitable split, whatever makes sense to you. Then anyone who wants to -1 it can actually do the work to propose how to move things around to keep our CI usage at a workable (and polite to the rest of ASF) level. :smile:

mbien commented 2 years ago

@neilcsmith-net looks promising so far since some jobs on travis did spend most of the time building. When run as part of the pipeline the time goes down to a few minutes.

I don't really know which tests need a display. I might turn the xserver on for all tests on linux to keep things simple (done).

observations so far:

mbien commented 2 years ago

the gradle test is failing, both locally and on github. I don't know why this test is working on travis. I used the same command lines and tried to replicate everything locally and couldn't find a reason. (even diffed the travis log to a local log)

Testcase: testCompilePreTrusted(org.netbeans.modules.gradle.java.classpath.ClassPathProviderImplTest):  FAILED
Closed project should be at least EVALUATED
junit.framework.AssertionFailedError: Closed project should be at least EVALUATED
    at org.netbeans.modules.gradle.java.classpath.ClassPathProviderImplTest.testCompilePreTrusted(ClassPathProviderImplTest.java:264)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at org.netbeans.junit.NbTestCase.access$200(NbTestCase.java:77)
    at org.netbeans.junit.NbTestCase$2.doSomething(NbTestCase.java:476)
    at org.netbeans.junit.NbTestCase$1Guard.run(NbTestCase.java:402)
    at java.base/java.lang.Thread.run(Thread.java:829)

fixed by #4456

mbien commented 2 years ago
mbien commented 2 years ago

another sync with master to resolve conflicts. Rebasing this PR can be tricky since so much changed in the files - easy to overlook something during merge.

going to add everyone for review since there was almost no feedback on the dev list!

mbien commented 2 years ago

One concern I have is, that the labels are not there, when a PR is created and would need a trigger. Forcing users/reviewers to do a forced push or request one sounds stupid and I remembered, that gitlab had an option to manually trigger CI/CD pipelines and it turns out github has something similar:

https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_dispatch

This would allow a reviewer add labels and then trigger a run. What do you think?

thanks for taking a look @matthiasblaesing I can always count on you for reviews :)

I fully agree that this is currently the most annoying aspect of this change, I haven't figured out how to solve this though. Here the issues:

I haven't given up on that. Ideas welcome.

We have still options though:

neilcsmith-net commented 2 years ago

if a workflow is manually triggered via workflow_dispatch it can't access the pr-context since there is none,

Does re-run all jobs not work after relabelling then?

we could let unlabeled PRs run everything instead of the minimum job count, (ci:all-tests equivalent) would this be a better default?

Maybe we should start with that as default anyway, but stated as time-limited for a while as people get used to it / feed back concerns?

Either way, make sure to do a mail on dev@ with [NOTICE] in the subject when this merges. Then no-one should be complaining they didn't see it! :wink:

mbien commented 2 years ago

if a workflow is manually triggered via workflow_dispatch it can't access the pr-context since there is none,

Does re-run all jobs not work after relabelling then?

unfortunately not, that would have been perfect. The workflow takes a snapshot of the context at the beginning it seems. It doesn't see label updates on restarts.

we could let unlabeled PRs run everything instead of the minimum job count, (ci:all-tests equivalent) would this be a better default?

Maybe we should start with that as default anyway, but stated as time-limited for a while as people get used to it / feed back concerns?

I had the same thought too. Let me put all-tests as default. Makes the transition easier

Either way, make sure to do a mail on dev@ with [NOTICE] in the subject when this merges. Then no-one should be complaining they didn't see it! wink

will do! and probably cc some oracle devs right?

There is something i want to try with workflow_dispatch in a testing repo. But as tom cruise would say: we need two miracles in succession. First the manual dispatch would have to actually appear under the PR if its even possible to run them on PR branches. Second: dispatch can use up to 10 input variables on its form, i could try to map them to labels somehow. I hope you can use them in job conditions since there are some things you can't use: e.g env vars for entirely undocumented reasons.

mbien commented 2 years ago

update: workflow_dispatch can only run on regular branches (e.g master). You can't select PR branches in the UI. So i can skip testing the rest.

neilcsmith-net commented 2 years ago

Does re-run all jobs not work after relabelling then?

unfortunately not, that would have been perfect.

OK, second thought. Does changing the pull request event types to include labeled work? https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request

mbien commented 2 years ago

OK, second thought. Does changing the pull request event types to include labeled work? https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request

this would trigger a new run every time a label is added which is probably too much since we have also the do-not-merge label etc

mbien commented 2 years ago

but what if we abuse reopened for this?

there is also "unlocked" which we will probably also rarely use

neilcsmith-net commented 2 years ago

this would trigger a new run every time a label is added which is probably too much since we have also the do-not-merge label etc

Given this enables many of the tests to eventually not run without required labels, I'm not sure that's a huge issue in practice, and it's certainly more obvious. If the label trigger works, IMO we could also consider going back to minimal tests by default from the start.

mbien commented 2 years ago

this would trigger a new run every time a label is added which is probably too much since we have also the do-not-merge label etc

Given this enables many of the tests to eventually not run without required labels, I'm not sure that's a huge issue in practice, and it's certainly more obvious. If the label trigger works, IMO we could also consider going back to minimal tests by default from the start.

when I read the doc this indeed seemed to be the obvious way how to solve this issue. But this dissipated quickly:

how about using the unlocked event as "secret" trigger for the workflow? I just tested it in a sandbox and it worked well

I assume that most devs will try to label PRs correctly right away or will sync PRs several times anyway during review which gives enough opportunities to update labels. For new contributors and the event of a PR which does not need any changes, reviewers could use this semi-secret trick to trigger a new build after labeling the PR.

neilcsmith-net commented 2 years ago

No email that I can see. All the options seem pretty ugly.

Is the actual added label in the event? And if so, easy to filter by prefix ci:?

Use gh cli to read the PR labels directly rather than from the event?

Use a comment trigger and filter by text (eg. <RETEST THE DAMN THING>)???

Yep, all ugly. From my perspective, go with what you think makes most sense initially. Can always be tweaked later. Anyway, you're in charge of explaining any secret triggers! :smile:

mbien commented 2 years ago

ready to integrate from my side. Added the "unlocked" trigger and put a info line into the PR template.

I think its better to integrate it as is without squashing everything. It makes it easier to check if for example there was a mistake the migration of certain jobs without having to go through everything.

mbien commented 2 years ago

full list: https://github.com/apache/netbeans/labels?q=ci

@vieiro I think it is going to be fairly intuitive. Even if you don't label anything there will be still about 3h of CI time spent (maybe we can reduce this further in future). This won't be a fine grained configuration, Existing PRs which update maven features are already labeled with "Maven" right now. Once this is merged, the same label will also toggle maven related tests which are aggregated in a build tools job.

mbien commented 2 years ago

merging. Going to send the [NOTICE] mail to the dev list too as @neilcsmith-net advised.