Sage-Bionetworks / SynapseWorkflowHook

Code for linking a workflow engine to a Synapse evaluation queue
Apache License 2.0
4 stars 1 forks source link

Invalid submission runs through scoring harness workflow, but error #10

Closed thomasyu888 closed 6 years ago

thomasyu888 commented 6 years ago
Workflow Administrator:

Submission  failed. 

 The message is:

    The following workflow job(s) are running but have no corresponding open Synapse submissions.
    workflow_job.18afafc3-f018-45cf-9366-02457661c13a
One way to recover is to delete the workflow job(s).

Shouldn't the workflow hook continue to run even after an invalid submission?

thomasyu888 commented 6 years ago

In my other job where I run docker run even though the workflow fails, it does something like this...

workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Top level loop: checking progress or starting new job.
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Top level loop: checking progress or starting new job.
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Top level loop: checking progress or starting new job.
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Top level loop: checking progress or starting new job.
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Top level loop: checking progress or starting new job.
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Top level loop: checking progress or starting new job.
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Top level loop: checking progress or starting new job.
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Top level loop: checking progress or starting new job.
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Top level loop: checking progress or starting new job.
thomasyu888 commented 6 years ago

https://github.com/Sage-Bionetworks/ChallengeWorkflowTemplates - scoringHarness_workflow.cwl

brucehoff commented 6 years ago

Shouldn't the workflow hook continue to run even after an invalid submission?

It's not a matter of the submission being invalid, it's that the workflow hook failed to remove the container afterwards. The correction is to manually delete the container and restart the hook. If this happens frequently then we need to dig in to why the hook is failing to delete finished containers.

thomasyu888 commented 6 years ago

This happens everytime with the workflow where I submit an invalid prediction file.

thomasyu888 commented 6 years ago
WARNING: The SHARE_RESULTS_IMMEDIATELY variable is not set. Defaulting to a blank string.
WARNING: The DATA_UNLOCK_SYNAPSE_PRINCIPAL_ID variable is not set. Defaulting to a blank string.
Creating synapseworkflowhook_workflow-hook_1 ... done
Attaching to synapseworkflowhook_workflow-hook_1
workflow-hook_1  | [INFO] Scanning for projects...
workflow-hook_1  | Downloading from central: http://repo1.maven.org/maven2/org/codehaus/mojo/exec-maven-plugin/1.2.1/exec-maven-plugin-1.2.1.pom
Downloaded from central: http://repo1.maven.org/maven2/org/codehaus/mojo/exec-maven-plugin/1.2.1/exec-maven-plugin-1.2.1.pom (7.7 kB at 42 kB/s)
workflow-hook_1  | Downloading from central: http://repo1.maven.org/maven2/org/codehaus/mojo/mojo-parent/28/mojo-parent-28.pom
Downloaded from central: http://repo1.maven.org/maven2/org/codehaus/mojo/mojo-parent/28/mojo-parent-28.pom (26 kB at 372 kB/s)
workflow-hook_1  | Downloading from central: http://repo1.maven.org/maven2/org/codehaus/codehaus-parent/3/codehaus-parent-3.pom
Downloaded from central: http://repo1.maven.org/maven2/org/codehaus/codehaus-parent/3/codehaus-parent-3.pom (4.1 kB at 108 kB/s)
workflow-hook_1  | Downloading from central: http://repo1.maven.org/maven2/org/codehaus/mojo/exec-maven-plugin/1.2.1/exec-maven-plugin-1.2.1.jar
Downloaded from central: http://repo1.maven.org/maven2/org/codehaus/mojo/exec-maven-plugin/1.2.1/exec-maven-plugin-1.2.1.jar (38 kB at 675 kB/s)
workflow-hook_1  | [INFO] 
workflow-hook_1  | [INFO] ------------------< org.sagebionetworks:WorkflowHook >------------------
workflow-hook_1  | [INFO] Building WorkflowHook 1.0-SNAPSHOT
workflow-hook_1  | [INFO] --------------------------------[ jar ]---------------------------------
workflow-hook_1  | [INFO] 
workflow-hook_1  | [INFO] >>> exec-maven-plugin:1.2.1:java (default-cli) > validate @ WorkflowHook >>>
workflow-hook_1  | [INFO] 
workflow-hook_1  | [INFO] <<< exec-maven-plugin:1.2.1:java (default-cli) < validate @ WorkflowHook <<<
workflow-hook_1  | [INFO] 
workflow-hook_1  | [INFO] 
workflow-hook_1  | [INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ WorkflowHook ---
workflow-hook_1  | Downloading from central: http://repo1.maven.org/maven2/org/apache/maven/maven-plugin-api/2.0/maven-plugin-api-2.0.pom
Downloaded from central: http://repo1.maven.org/maven2/org/apache/maven/maven-plugin-api/2.0/maven-plugin-api-2.0.pom (601 B at 30 kB/s)
workflow-hook_1  | Downloading from central: http://repo1.maven.org/maven2/org/apache/maven/maven/2.0/maven-2.0.pom
Downloaded from central: http://repo1.maven.org/maven2/org/apache/maven/maven/2.0/maven-2.0.pom (8.8 kB at 487 kB/s)
workflow-hook_1  | Downloading from central: http://repo1.maven.org/maven2/org/apache/commons/commons-exec/1.1/commons-exec-1.1.pom
Downloaded from central: http://repo1.maven.org/maven2/org/apache/commons/commons-exec/1.1/commons-exec-1.1.pom (11 kB at 561 kB/s)
workflow-hook_1  | Downloading from central: http://repo1.maven.org/maven2/org/codehaus/plexus/plexus-container-default/1.0-alpha-9/plexus-container-default-1.0-alpha-9.jar
workflow-hook_1  | Downloading from central: http://repo1.maven.org/maven2/org/apache/commons/commons-exec/1.1/commons-exec-1.1.jar
Downloaded from central: http://repo1.maven.org/maven2/org/apache/commons/commons-exec/1.1/commons-exec-1.1.jar (53 kB at 657 kB/s)
workflow-hook_1  | Downloaded from central: http://repo1.maven.org/maven2/org/codehaus/plexus/plexus-container-default/1.0-alpha-9/plexus-container-default-1.0-alpha-9.jar (195 kB at 2.2 MB/s)
workflow-hook_1  | ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Precheck completed successfully.
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Top level loop: checking progress or starting new job.
workflow-hook_1  | 00:05:18.451 [org.sagebionetworks.WorkflowHook.main()] ERROR org.sagebionetworks.client.SynapseProfileProxy - java.lang.reflect.InvocationTargetException
workflow-hook_1  | Oct 02, 2018 12:05:18 AM org.sagebionetworks.ExponentialBackoffRunner execute
workflow-hook_1  | SEVERE: Found status code 409. Will not retry: Status Code: 409 message: An entity with the name: 3324230 already exists with a parentId: syn16809914
workflow-hook_1  | 00:05:19.997 [org.sagebionetworks.WorkflowHook.main()] ERROR org.sagebionetworks.client.SynapseProfileProxy - java.lang.reflect.InvocationTargetException
workflow-hook_1  | Oct 02, 2018 12:05:19 AM org.sagebionetworks.ExponentialBackoffRunner execute
workflow-hook_1  | SEVERE: Found status code 409. Will not retry: Status Code: 409 message: An entity with the name: 3324230_LOCKED already exists with a parentId: syn16809914
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WES - workingDir: /Users/ThomasY/sandbox/e3faae96-5519-434f-bd60-acef9594d882
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WES - toil cmd: [toil-cwl-runner, --defaultMemory, 100M, --retryCount, 0, --defaultDisk, 1000000, --workDir, /Users/ThomasY/sandbox/e3faae96-5519-434f-bd60-acef9594d882, --noLinkImports, ChallengeWorkflowTemplates-master/scoringHarness_workflow.cwl, /Users/ThomasY/sandbox/e3faae96-5519-434f-bd60-acef9594d882/TMP5659771241079837092.yaml]
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - PROGRESS: null
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.Archiver - Found 582 bytes to log.
workflow-hook_1  | 00:06:54.121 [org.sagebionetworks.WorkflowHook.main()] ERROR org.sagebionetworks.client.SynapseProfileProxy - java.lang.reflect.InvocationTargetException
workflow-hook_1  | Oct 02, 2018 12:06:54 AM org.sagebionetworks.ExponentialBackoffRunner execute
workflow-hook_1  | SEVERE: Found status code 409. Will not retry: Status Code: 409 message: An entity with the name: 3324230 already exists with a parentId: syn16809914
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.Archiver - Will archive /tmp/9681565_logs.txt
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.Archiver - Archived /tmp/9681565_logs.txt to syn16925041
workflow-hook_1  | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - Top level loop: checking progress or starting new job.
workflow-hook_1  | [WARNING] 
workflow-hook_1  | java.lang.reflect.InvocationTargetException
workflow-hook_1  |     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
workflow-hook_1  |     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
workflow-hook_1  |     at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
workflow-hook_1  |     at java.lang.reflect.Method.invoke (Method.java:566)
workflow-hook_1  |     at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:297)
workflow-hook_1  |     at java.lang.Thread.run (Thread.java:834)
workflow-hook_1  | Caused by: java.lang.IllegalStateException: The following workflow job(s) are running but have no corresponding open Synapse submissions.
workflow-hook_1  |  workflow_job.397c09ac-34d1-474f-8d30-301340c22867
workflow-hook_1  | One way to recover is to delete the workflow job(s).
workflow-hook_1  |     at org.sagebionetworks.WorkflowHook.updateWorkflowJobs (WorkflowHook.java:315)
workflow-hook_1  |     at org.sagebionetworks.WorkflowHook.execute (WorkflowHook.java:186)
workflow-hook_1  |     at org.sagebionetworks.WorkflowHook.main (WorkflowHook.java:104)
workflow-hook_1  |     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
workflow-hook_1  |     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
workflow-hook_1  |     at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
workflow-hook_1  |     at java.lang.reflect.Method.invoke (Method.java:566)
workflow-hook_1  |     at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:297)
workflow-hook_1  |     at java.lang.Thread.run (Thread.java:834)
workflow-hook_1  | [INFO] ------------------------------------------------------------------------
workflow-hook_1  | [INFO] BUILD FAILURE
workflow-hook_1  | [INFO] ------------------------------------------------------------------------
workflow-hook_1  | [INFO] Total time: 02:03 min
workflow-hook_1  | [INFO] Finished at: 2018-10-02T00:07:15Z
workflow-hook_1  | [INFO] ------------------------------------------------------------------------
workflow-hook_1  | [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java (default-cli) on project WorkflowHook: An exception occured while executing the Java class. null: InvocationTargetException: The following workflow job(s) are running but have no corresponding open Synapse submissions.
workflow-hook_1  | [ERROR]  workflow_job.397c09ac-34d1-474f-8d30-301340c22867
workflow-hook_1  | [ERROR] One way to recover is to delete the workflow job(s).
workflow-hook_1  | [ERROR] -> [Help 1]
workflow-hook_1  | [ERROR] 
workflow-hook_1  | [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
workflow-hook_1  | [ERROR] Re-run Maven using the -X switch to enable full debug logging.
workflow-hook_1  | [ERROR] 
workflow-hook_1  | [ERROR] For more information about the errors and possible solutions, please read the following articles:
workflow-hook_1  | [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
workflow-hook_1  | [Thread-1] INFO org.sagebionetworks.ShutdownHook - Shut down signal received.
workflow-hook_1  | [Thread-1] INFO org.sagebionetworks.ShutdownHook - Shut down complete.
synapseworkflowhook_workflow-hook_1 exited with code 1
_______________________________________________________
thomasyu888 commented 6 years ago

Should add documentation that submission status's STATUS field can not be changed!

brucehoff commented 6 years ago

The problem is that the workflow modified the submission status (e.g., from EVALUATION_IN_PROGRESS to VALIDATED) which is must not do. If you stop then this problem will go away. In addition, I changed the workflow hook code to add a 'namespace' to annotation names it uses, to avoid collision with annotations created by your workflow: https://github.com/Sage-Bionetworks/SynapseWorkflowHook/commit/d68d2aa76cda6af3b1d4147c95f74458d4208910

brucehoff commented 6 years ago

Should add documentation that submission status's STATUS field can not be changed!

Good idea. I added a "Workflow creation guidelines" section to the README. Please have a look and comment or edit.

https://github.com/Sage-Bionetworks/SynapseWorkflowHook/commit/fe6d77e63bc720c08d118a4efc17359116585a52