CovertLab / wcEcoli

Whole Cell Model of E. coli

Jenkins failures #883

Open tahorst opened 4 years ago

tahorst commented 4 years ago

There have been some Jenkins failures that appear to be Sherlock issues. Sims stop at a certain point and then the build times out without any useful information. It's happened 3 times with the large minimal build and once with the optional features build in the last month (it has happened before but much less frequently). Builds before and after pass even without any code changes. Has anyone experienced any hanging like this when running, locally or remotely? It would be good to know if it's Sherlock specific or if something in the code is causing a hang.

 2164.00    583.57        1.735        1.731        1.872        1.702        1.765
 2166.00    583.87        1.736        1.732        1.873        1.703        1.766
 2168.00    584.15        1.737        1.733        1.874        1.704        1.767
 2170.00    584.44        1.737        1.734        1.875        1.705        1.768
 2172.00    584.74        1.738        1.735        1.876        1.706        1.769
 2174.00    585.02      Build step 'Execute shell' marked build as failure
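One way to tell a filesystem stall from a hang in the code might be to have the process dump its own stack when it stops making progress. The following is a minimal sketch, assuming the sims run under Python 3, where `faulthandler` is in the standard library. The helper names `install_hang_watchdog` and `cancel_hang_watchdog` are hypothetical and not part of wcEcoli.

```python
import faulthandler
import sys


def install_hang_watchdog(timeout_s, log_file=None):
    """Dump every thread's traceback each time timeout_s seconds elapse.

    Hypothetical helper: the dump goes to log_file (default: stderr), which
    must be a real file with a file descriptor and must stay open until the
    watchdog is cancelled.
    """
    if log_file is None:
        log_file = sys.stderr
    # repeat=True keeps dumping every timeout_s seconds until cancelled,
    # so a long hang leaves several snapshots to compare.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)


def cancel_hang_watchdog():
    """Stop the periodic dumps, e.g. once the simulation finishes normally."""
    faulthandler.cancel_dump_traceback_later()
```

If the dumped traceback of a stalled build ends in a file read or write, that would point toward the filesystem. A traceback ending in model code would point toward the simulation itself. The timeout would need to be longer than the slowest legitimate step.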

1fish2 commented 4 years ago

I have not seen this problem locally nor on GCloud.

My last run on GCloud was May 18. It was 2 gens, seed 0, variant 0.

What's in a large minimal build?

tahorst commented 4 years ago

What's in a large minimal build?

Minimal media for 25 gens. Most others are only 8 gens or fewer, although optional features does 8 gens twice and then 2 gens, for a total of 18 gens.

tahorst commented 4 years ago

Appears to have happened again but this time in the parca (for anaerobic and AA builds with nearly identical timestamps). Execution hung until the script timeout several hours later.

2020-05-25 00:59:22,735 INFO RUNNING fw_id: 36 in directory: /scratch/groups/mcovert/jenkins/workspace@2
2020-05-25 00:59:23,226 INFO Task started: {{wholecell.fireworks.firetasks.fitSimData.FitSimDataTask}}.
Build step 'Execute shell' marked build as failure
2020-05-25 00:59:22,875 INFO RUNNING fw_id: 36 in directory: /scratch/groups/mcovert/jenkins/workspace
2020-05-25 00:59:23,393 INFO Task started: {{wholecell.fireworks.firetasks.fitSimData.FitSimDataTask}}.
Build step 'Execute shell' marked build as failure

This coincided with three other build failures, which suggests the problem is specific to the Sherlock filesystem:

Cloning repository https://github.com/CovertLab/wcEcoli.git
 > git init <http://localhost:4242/job/wcEcoli%20-%20020%20-%20Optional%20Features/ws/> # timeout=10
ERROR: Timeout after 10 minutes
ERROR: Error cloning remote repo 'origin'
hudson.plugins.git.GitException: Could not init <http://localhost:4242/job/wcEcoli%20-%20020%20-%20Optional%20Features/ws/>
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$5.execute(CliGitAPIImpl.java:767)
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$2.execute(CliGitAPIImpl.java:559)
        at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1120)
        at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1160)
        at hudson.scm.SCM.checkout(SCM.java:495)
        at hudson.model.AbstractProject.checkout(AbstractProject.java:1202)
        at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
        at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
        at hudson.model.Run.execute(Run.java:1724)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
        at hudson.model.ResourceController.execute(ResourceController.java:97)
        at hudson.model.Executor.run(Executor.java:421)
Caused by: hudson.plugins.git.GitException: Command "git init <http://localhost:4242/job/wcEcoli%20-%20020%20-%20Optional%20Features/ws/>" returned status code 143:
stdout:
stderr:
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1990)
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1958)
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1954)
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1592)
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$5.execute(CliGitAPIImpl.java:765)
        ... 12 more
ERROR: Error cloning remote repo 'origin'

1fish2 commented 4 years ago

Indeed, the Sherlock team ought to have error logs that they could compare with these failure timestamps.
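To make that comparison concrete, the last timestamp logged before each hang can be pulled out of the build output. The following is a rough sketch, assuming the `YYYY-MM-DD HH:MM:SS,mmm` prefix seen in the FireWorks log lines in this thread. The function names and the 5-second window are illustrative assumptions.

```python
import re
from datetime import datetime, timedelta

# Matches the leading timestamp of lines such as
# "2020-05-25 00:59:23,226 INFO Task started: ..."
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def last_timestamp(log_text):
    """Return the datetime of the last timestamped line, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        match = TIMESTAMP_RE.match(line)
        if match:
            base = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            last = base + timedelta(milliseconds=int(match.group(2)))
    return last


def within_window(a, b, window_s=5.0):
    """True if two datetimes are at most window_s seconds apart."""
    return abs((a - b).total_seconds()) <= window_s


# The two parca logs quoted earlier in this thread.
BUILD_A = """2020-05-25 00:59:22,735 INFO RUNNING fw_id: 36 in directory: /scratch/groups/mcovert/jenkins/workspace@2
2020-05-25 00:59:23,226 INFO Task started: {{wholecell.fireworks.firetasks.fitSimData.FitSimDataTask}}.
Build step 'Execute shell' marked build as failure"""

BUILD_B = """2020-05-25 00:59:22,875 INFO RUNNING fw_id: 36 in directory: /scratch/groups/mcovert/jenkins/workspace
2020-05-25 00:59:23,393 INFO Task started: {{wholecell.fireworks.firetasks.fitSimData.FitSimDataTask}}.
Build step 'Execute shell' marked build as failure"""
```

Calling `within_window(last_timestamp(BUILD_A), last_timestamp(BUILD_B))` checks whether both builds went silent at about the same moment, which is the kind of coincidence a filesystem-side event would explain.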