Open tahorst opened 4 years ago
I have not seen this problem either locally or on GCloud.
My last run on GCloud was May 18. It was 2 gens, seed 0, variant 0.
What's in a large minimal build?
Minimal media for 25 gens. Most others run 8 gens or fewer, although optional features runs 8 gens twice and then 2 gens, for a total of 18 gens.
This appears to have happened again, but this time in the parca (for the anaerobic and AA builds, with nearly identical timestamps). Execution hung until the script timed out several hours later.
2020-05-25 00:59:22,735 INFO RUNNING fw_id: 36 in directory: /scratch/groups/mcovert/jenkins/workspace@2
2020-05-25 00:59:23,226 INFO Task started: {{wholecell.fireworks.firetasks.fitSimData.FitSimDataTask}}.
Build step 'Execute shell' marked build as failure
2020-05-25 00:59:22,875 INFO RUNNING fw_id: 36 in directory: /scratch/groups/mcovert/jenkins/workspace
2020-05-25 00:59:23,393 INFO Task started: {{wholecell.fireworks.firetasks.fitSimData.FitSimDataTask}}.
Build step 'Execute shell' marked build as failure
This coincided with three other build failures, which suggests the problem is specific to the Sherlock filesystem:
Cloning repository https://github.com/CovertLab/wcEcoli.git
> git init <http://localhost:4242/job/wcEcoli%20-%20020%20-%20Optional%20Features/ws/> # timeout=10
ERROR: Timeout after 10 minutes
ERROR: Error cloning remote repo 'origin'
hudson.plugins.git.GitException: Could not init <http://localhost:4242/job/wcEcoli%20-%20020%20-%20Optional%20Features/ws/>
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$5.execute(CliGitAPIImpl.java:767)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$2.execute(CliGitAPIImpl.java:559)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1120)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1160)
at hudson.scm.SCM.checkout(SCM.java:495)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1202)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1724)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:421)
Caused by: hudson.plugins.git.GitException: Command "git init <http://localhost:4242/job/wcEcoli%20-%20020%20-%20Optional%20Features/ws/>" returned status code 143:
stdout:
stderr:
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1990)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1958)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1954)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1592)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$5.execute(CliGitAPIImpl.java:765)
... 12 more
ERROR: Error cloning remote repo 'origin'
Indeed, the Sherlock team ought to have error logs that they could compare with these failure timestamps.
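To make that comparison easier, here is a minimal sketch of pulling the last timestamp out of a build log, since that marks roughly when the job stopped making progress. It assumes log lines begin with the `YYYY-MM-DD HH:MM:SS,mmm` format seen in the excerpts above; the function name `last_log_timestamp` is hypothetical and not part of wcEcoli.

```python
import re
from datetime import datetime

# Assumption: timestamped lines start like
# "2020-05-25 00:59:22,735 INFO RUNNING fw_id: 36 ..."
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def last_log_timestamp(lines):
    """Return the datetime of the last timestamped line, or None if there is none.

    Hypothetical helper for illustration; lines without a leading
    timestamp (e.g. Jenkins status messages) are skipped.
    """
    last = None
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if match:
            seconds = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            millis = int(match.group(2))
            last = seconds.replace(microsecond=millis * 1000)
    return last


# Lines copied from the log excerpts in this thread.
example_log = [
    "2020-05-25 00:59:22,735 INFO RUNNING fw_id: 36 in directory: /scratch/groups/mcovert/jenkins/workspace@2",
    "2020-05-25 00:59:23,226 INFO Task started: {{wholecell.fireworks.firetasks.fitSimData.FitSimDataTask}}.",
    "Build step 'Execute shell' marked build as failure",
]
```

Calling `last_log_timestamp(example_log)` gives the time of the `Task started` line, which could then be matched against whatever logs the Sherlock team keeps.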
There have been some Jenkins failures that appear to be Sherlock issues. Sims stop at a certain point and then the build times out without any useful information. It's happened 3 times with the large minimal build and once with the optional features build in the last month (it has happened before but much less frequently). Builds before and after pass even without any code changes. Has anyone experienced any hanging like this when running locally or remotely? It would be good to know if it's Sherlock-specific or if something in the code is causing a hang.
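One possible way to tell a code hang from a filesystem hang is to have the process dump its Python stack traces if it runs longer than expected. The sketch below uses the standard library's `faulthandler` module; the wrapper names `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical, and nothing like this is known to exist in wcEcoli today.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s, log_file=sys.stderr):
    """Dump all thread tracebacks to log_file every timeout_s seconds.

    Hypothetical helper: log_file must be a real file with a file
    descriptor and must stay open until the watchdog is disarmed.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)


def disarm_hang_watchdog():
    """Cancel the pending traceback dump once the work has finished."""
    faulthandler.cancel_dump_traceback_later()


# Usage sketch (the one-hour timeout and run_task are illustrative placeholders):
#   arm_hang_watchdog(3600)
#   try:
#       run_task()
#   finally:
#       disarm_hang_watchdog()
```

If a hung build's dump shows the process blocked inside a file read or write, that would point toward the filesystem; a traceback sitting in simulation code would point toward the code instead.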