Sage-Bionetworks / SynapseWorkflowHook

Code for linking a workflow engine to a Synapse evaluation queue
Apache License 2.0

Issue with multiple queues #46

Closed andrewelamb closed 5 years ago

andrewelamb commented 5 years ago

When I set up my .env file like:

DOCKER_ENGINE_URL=unix:///var/run/docker.sock
WORKFLOW_TEMPDIR=/home/ubuntu/temp_dir
SYNAPSE_USERNAME=andrew.lamb@sagebase.org
SYNAPSE_PASSWORD=xxxxx
WORKFLOW_OUTPUT_ROOT_ENTITY_ID=syn17019523
EVALUATION_TEMPLATES={"9614257":"syn20055743", "9614252":"syn20055743"}
TOIL_CLI_OPTIONS=--defaultMemory 100M --retryCount 0 --defaultDisk 1000000
MAX_CONCURRENT_WORKFLOWS=1

I get an error as soon as I make submissions to each queue:

workflow-hook_1 | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - PROGRESS: null
workflow-hook_1 | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.Archiver - logFile /tmp/9686656_logs.txt has no content. Nothing to upload.
workflow-hook_1 | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.Utils - mask: 1 notificationEnabled: 31 result: 1
workflow-hook_1 | [org.sagebionetworks.WorkflowHook.main()] INFO org.sagebionetworks.WorkflowHook - We have met or exceeded the maximum concurrent workflow count, 1, so we will not start 9686657 at this time.
workflow-hook_1 | [WARNING]
workflow-hook_1 | java.lang.reflect.InvocationTargetException
workflow-hook_1 | at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
workflow-hook_1 | at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
workflow-hook_1 | at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
workflow-hook_1 | at java.lang.reflect.Method.invoke (Method.java:566)
workflow-hook_1 | at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:297)
workflow-hook_1 | at java.lang.Thread.run (Thread.java:834)
workflow-hook_1 | Caused by: java.lang.IllegalStateException: The following workflow job(s) are running but have no corresponding open Synapse submissions.
workflow-hook_1 | workflow_job.4046cd86-d772-47e6-91d2-9e171a30d52f
workflow-hook_1 | One way to recover is to delete the workflow job(s).
workflow-hook_1 | at org.sagebionetworks.WorkflowHook.updateWorkflowJobs (WorkflowHook.java:342)
workflow-hook_1 | at org.sagebionetworks.WorkflowHook.execute (WorkflowHook.java:195)
workflow-hook_1 | at org.sagebionetworks.WorkflowHook.main (WorkflowHook.java:111)
workflow-hook_1 | at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
workflow-hook_1 | at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
workflow-hook_1 | at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
workflow-hook_1 | at java.lang.reflect.Method.invoke (Method.java:566)
workflow-hook_1 | at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:297)
workflow-hook_1 | at java.lang.Thread.run (Thread.java:834)
workflow-hook_1 | [INFO] ------------------------------------------------------------------------
workflow-hook_1 | [INFO] BUILD FAILURE
workflow-hook_1 | [INFO] ------------------------------------------------------------------------
workflow-hook_1 | [INFO] Total time: 24.372 s
workflow-hook_1 | [INFO] Finished at: 2019-06-26T17:49:53Z
workflow-hook_1 | [INFO] ------------------------------------------------------------------------
workflow-hook_1 | [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java (default-cli) on project WorkflowHook: An exception occured while executing the Java class. null: InvocationTargetException: The following workflow job(s) are running but have no corresponding open Synapse submissions.
workflow-hook_1 | [ERROR] workflow_job.4046cd86-d772-47e6-91d2-9e171a30d52f
workflow-hook_1 | [ERROR] One way to recover is to delete the workflow job(s).
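
The error above names the containers the hook considers orphaned. A minimal sketch for pulling those names out of saved log output is shown here. The helper name `orphaned_jobs` is hypothetical, and the name pattern (`workflow_job`, optionally followed by `.` and a UUID) is only inferred from the names that appear in this thread. It relies on `grep -o`, which GNU and BSD grep both provide.

```shell
#!/bin/sh
# Read hook log output on stdin and print each distinct workflow job
# container name mentioned in it, one name per line.
# The pattern is an assumption based on the names seen in this issue.
orphaned_jobs() {
    grep -oE 'workflow_job(\.[0-9a-f-]+)?' | LC_ALL=C sort -u
}
```

One possible use, assuming the compose service is called `workflow-hook` (a guess based on the `workflow-hook_1` prefix in the log), is `docker-compose logs workflow-hook | orphaned_jobs`.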

thomasyu888 commented 5 years ago

Can confirm, I am also running into this same exact issue.

thomasyu888 commented 5 years ago

@brucehoff. Please look into this as it is a critical issue. The current workaround is to have multiple instances running - one for each workflow.
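
For anyone applying the one-instance-per-queue workaround, a rough sketch of deriving a single-queue env file from the shared one follows. The function name `single_queue_env` and the output file name are hypothetical. How each hook instance is pointed at its own env file depends on the compose setup, which is not shown in this thread.

```shell
#!/bin/sh
# Print a copy of a base .env file restricted to a single evaluation queue.
# Usage: single_queue_env BASE_ENV QUEUE_ID TEMPLATE_ID
single_queue_env() {
    base_env=$1
    queue_id=$2
    template_id=$3
    # Drop the original multi-queue mapping and keep every other setting.
    grep -v '^EVALUATION_TEMPLATES=' "$base_env"
    # Emit a mapping that contains only the requested queue.
    printf 'EVALUATION_TEMPLATES={"%s":"%s"}\n' "$queue_id" "$template_id"
}
```

For example, `single_queue_env .env 9614257 syn20055743 > queue_9614257.env` would produce one file per queue, each to be used by a separate instance.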

brucehoff commented 5 years ago

https://github.com/Sage-Bionetworks/SynapseWorkflowHook/commit/a3c4a194dde908e26e71d20b95dc52328cc3ded2

trberg commented 5 years ago

@brucehoff should this issue be solved now? I pulled in the new code and I'm still having the same issues.

brucehoff commented 5 years ago

Yes, the issue should be solved. Will you please share the detailed symptoms of the problem you are having?

trberg commented 5 years ago

Here is the error report. If you need, I can post the full verbose output:

workflow-hook_1  | [INFO] ------------------------------------------------------------------------
workflow-hook_1  | [INFO] BUILD FAILURE
workflow-hook_1  | [INFO] ------------------------------------------------------------------------
workflow-hook_1  | [INFO] Total time:  5.391 s
workflow-hook_1  | [INFO] Finished at: 2019-07-08T16:51:13Z
workflow-hook_1  | [INFO] ------------------------------------------------------------------------
workflow-hook_1  | [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java (default-cli) on project WorkflowHook: An exception occured while executing the Java class. null: InvocationTargetException: The following workflow job(s) are running but have no corresponding open Synapse submissions.
workflow-hook_1  | [ERROR]      workflow_job
workflow-hook_1  | [ERROR]      workflow_job.627652bb-5e22-4fb8-9034-7279163b350a
workflow-hook_1  | [ERROR] One way to recover is to delete the workflow job(s).
workflow-hook_1  | [ERROR] -> [Help 1]
workflow-hook_1  | [ERROR] 
workflow-hook_1  | [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
workflow-hook_1  | [ERROR] Re-run Maven using the -X switch to enable full debug logging.
workflow-hook_1  | [ERROR] 
workflow-hook_1  | [ERROR] For more information about the errors and possible solutions, please read the following articles:
workflow-hook_1  | [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
workflow-hook_1  | [Thread-1] INFO org.sagebionetworks.ShutdownHook - Shut down signal received.
workflow-hook_1  | [Thread-1] INFO org.sagebionetworks.ShutdownHook - Shut down complete.
compose.cli.verbose_proxy.proxy_callable: docker wait <- ('d69a39530e15c77e89094965f35cd30ff93cf45d122e9a530ebec0a2e2ebd553')
compose.cli.verbose_proxy.proxy_callable: docker inspect_container <- ('d69a39530e15c77e89094965f35cd30ff93cf45d122e9a530ebec0a2e2ebd553')
urllib3.connectionpool._make_request: http://localhost:None "POST /v1.25/containers/d69a39530e15c77e89094965f35cd30ff93cf45d122e9a530ebec0a2e2ebd553/wait HTTP/1.1" 200 17
compose.cli.verbose_proxy.proxy_callable: docker wait -> {'StatusCode': 1}
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.25/containers/d69a39530e15c77e89094965f35cd30ff93cf45d122e9a530ebec0a2e2ebd553/json HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker inspect_container -> {'AppArmorProfile': '',
 'Args': ['/bin/sh',
          '-c',
          'exec mvn exec:java -DentryPoint=org.sagebionetworks.WorkflowHook'],
 'Config': {'ArgsEscaped': True,
            'AttachStderr': False,
            'AttachStdin': False,
            'AttachStdout': False,
            'Cmd': ['/bin/sh',
                    '-c',
...
workflow_workflow-hook_1 exited with code 1

thomasyu888 commented 5 years ago

@trberg, a couple of things:

  1. Did you do a git pull on the repository? You will need to do this to start from the new docker-compose.yaml.
  2. Can you remove all running containers and images and re-pull them?

brucehoff commented 5 years ago

Would you please try the following:

docker rm workflow_job
docker rm workflow_job.627652bb-5e22-4fb8-9034-7279163b350a
docker-compose down
docker pull sagebionetworks/synapseworkflowhook
docker-compose up

and then let us know if problems continue.
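
The two `docker rm` calls above target the specific containers named in the error. A more general sketch, for cases where several leftover jobs exist, could look like the following. The function names and the `DRY_RUN` variable are hypothetical, and the name pattern is inferred only from the two container names in this thread, so it is worth checking the dry-run output before removing anything.

```shell
#!/bin/sh
# Keep only container names that look like workflow jobs
# ("workflow_job" or "workflow_job.<uuid>"); one name per line on stdin.
select_workflow_jobs() {
    grep -E '^workflow_job(\.[0-9a-f-]+)?$' || true
}

# Remove every container whose name was selected above.
# Plain "docker rm" only removes stopped containers; a running job
# would have to be stopped first.
# Set DRY_RUN=1 to print the commands instead of running them.
remove_workflow_jobs() {
    docker ps -a --format '{{.Names}}' | select_workflow_jobs |
    while IFS= read -r name; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "docker rm $name"
        else
            docker rm "$name"
        fi
    done
}
```

Running `DRY_RUN=1 remove_workflow_jobs` first shows which containers would be removed; the `docker-compose down`, `docker pull`, and `docker-compose up` steps then follow as listed above.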

trberg commented 5 years ago

Great! Thanks, that seems to have cleaned up this issue. It looked like we had some extra submitted dockers that were left from previous pipeline runs that were causing issues.