
triggering of EPP from SimRel sometimes fails #58

Closed by jonahgraham 5 months ago

jonahgraham commented 1 year ago

Ensure that the CI build is green

Hmm - so it turns out there is a race condition around simrel's triggering of EPP's build. The EPP build often fails when simrel updates the content of the build while the EPP build is running. We get errors like:

13:47:24 org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.eclipse.tycho:tycho-packaging-plugin:3.0.5:build-qualifier-aggregator (default-build-qualifier-aggregator) on project epp.package.dsl: Execution default-build-qualifier-aggregator of goal org.eclipse.tycho:tycho-packaging-plugin:3.0.5:build-qualifier-aggregator failed: Could not mirror artifact osgi.bundle,org.eclipse.gef,3.15.0.202308271959 into the local Maven repository.See log output for details.

But sometimes the triggering from simrel happens in such a way that no new build runs after a failure like the one above. I'm not quite sure how to solve this, especially as the EPP build now takes upwards of 5 hours!

Originally posted by @jonahgraham in https://github.com/eclipse-packaging/packages/issues/57#issuecomment-1710134147

merks commented 1 year ago

A lock could be used

https://git.eclipse.org/c/simrel/org.eclipse.simrel.build.git/tree/Jenkinsfile#n127

but then the simrel build could block for very long.

jonahgraham commented 1 year ago

It turns out the problem isn't quite what I thought.

The problem this time appears to be caching on download.eclipse.org (again!): https://ci.eclipse.org/packaging/job/simrel.epp-tycho-build/2935/ was started after https://ci.eclipse.org/simrel/job/simrel.runaggregator.pipeline/2782/ had fully completed.

Yet when the EPP build downloaded the metadata for staging:

13:34:18 [INFO] Adding repository https://download.eclipse.org/staging/2023-09

it got the "old" (pre simrel build #2782) version: the metadata it received referred to org.eclipse.gef,3.15.0.202308271959, which was replaced in build #2782.

This is a long-term effect of moving to building in Docker containers. The EPP build used to access download.eclipse.org through a file:// URL path, so it would bypass the cache, etc.
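The failure mode described above can be illustrated with a minimal sketch: if the cached metadata still lists an artifact version that the repository no longer contains, mirroring that artifact fails. The function name and the replacement qualifier below are hypothetical and only for illustration; this is not how Tycho or p2 resolves artifacts.

```python
def unresolvable_artifacts(referenced: set[str], published: set[str]) -> set[str]:
    """Return artifacts that the (possibly stale) metadata refers to
    but that are no longer present in the repository."""
    return referenced - published


# Version taken from the error in this issue; it was replaced in build #2782.
stale_metadata = {"org.eclipse.gef,3.15.0.202308271959"}

# Hypothetical placeholder for whatever version build #2782 actually published.
current_content = {"org.eclipse.gef,3.15.0.NEW_QUALIFIER"}

# Any entry in this set would fail with "Could not mirror artifact ...".
missing = unresolvable_artifacts(stale_metadata, current_content)
```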

jonahgraham commented 1 year ago

IIRC the reason we added delay=600sec to the triggering was to allow the cache to expire. But perhaps that isn't long enough. With a 4-5 hour build for EPP we could simply increase the delay without much negative effect, but a delay of any length can still hit the cache if anything keeps the cache fresh (like other people building against staging).

The "real" solution (not involving webmaster!) is to not reuse the staging repo. For example, if each build went into a new subdirectory of staging, and the trigger passed EPP a parameter saying which subdirectory to use, that would solve it. That would also solve the case of contents changing while the build is running.
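A rough sketch of that idea, assuming a `build-<number>` subdirectory layout and a trigger parameter called `STAGING_REPO_URL` (both names are hypothetical; neither exists in the simrel or EPP builds today):

```python
# Base URL taken from the log line quoted earlier in this issue.
STAGING_BASE = "https://download.eclipse.org/staging/2023-09"


def staging_url_for_build(build_number: int, base: str = STAGING_BASE) -> str:
    """Return a repository URL that is unique to one simrel build.

    A URL that has never been served before cannot already be cached,
    and a later simrel build cannot change its content.
    """
    return f"{base.rstrip('/')}/build-{build_number}/"


def trigger_parameters(build_number: int) -> dict[str, str]:
    """Parameters simrel would pass when triggering the EPP build.

    STAGING_REPO_URL is a hypothetical parameter name.
    """
    return {"STAGING_REPO_URL": staging_url_for_build(build_number)}


params = trigger_parameters(2782)
```

Because every simrel build gets its own URL, the EPP build neither reads stale cached metadata for that URL nor sees the content change under it while it runs. The cost is that old subdirectories would need to be cleaned up at some point.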

However, I am not sure it is worth the effort to refactor the simrel build to achieve this. Therefore I am leaving this bug in the backlog.

merks commented 1 year ago

The report tests build also fails sometimes, maybe for the same reason.

merks commented 5 months ago

I think this is obsolete because there needs to be a new way:

https://github.com/eclipse-packaging/packages/issues/122