Simon-Harris-IBM / ObjectNetChallenge-Workflows

Workflows for ObjectNet Challenge

2 workflows assigned to same queue which causes one to fail #34

Closed Simon-Harris-IBM closed 4 years ago

Simon-Harris-IBM commented 4 years ago

I just tested rapidly submitting 3 submissions. Two of them appear to have been assigned to the same backend queue (q1), which caused one of them to fail. Logs are here: https://www.synapse.org/#!Synapse:syn21765266

MAX_CONCURRENT_WORKFLOWS=1 is set in the orchestrator's .env file to ensure that only one workflow executes on a server at any time.

Also, shouldn't the code in https://github.com/Simon-Harris-IBM/ObjectNetChallenge-Workflows/blob/master/get_backend_queue.cwl prevent this from happening? It checks for an 'EVALUATION_IN_PROGRESS' status before making the queue unavailable to receive submissions. I'm wondering if the status on that particular queue was "RECEIVED" when the 2nd job was assigned to it. Should we be checking for queues with a status of either "EVALUATION_IN_PROGRESS" or "RECEIVED"?
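To make the proposed check concrete, here is a rough sketch of a queue selection that treats both statuses as busy. It is not the code in get_backend_queue.cwl: the function name `pick_free_queue`, the `queue_statuses` mapping, and the sample data are all hypothetical, and the real tool would have to fetch the statuses from Synapse first.

```python
from typing import Dict, List, Optional

# Statuses that mark a queue as occupied. Counting RECEIVED as busy is the
# change proposed in this issue; the existing check is described as looking
# only at EVALUATION_IN_PROGRESS.
BUSY_STATUSES = frozenset({"RECEIVED", "EVALUATION_IN_PROGRESS"})


def pick_free_queue(queue_statuses: Dict[str, List[str]]) -> Optional[str]:
    """Return the first queue with no busy submission, or None if all are busy.

    queue_statuses is a hypothetical mapping from queue name to the statuses
    of the submissions currently sitting on that queue.
    """
    for queue_name, statuses in queue_statuses.items():
        if not any(status in BUSY_STATUSES for status in statuses):
            return queue_name
    return None


if __name__ == "__main__":
    # Illustrative snapshot: q1 holds a submission that has been received but
    # has not started evaluating yet, so it must not be handed out again.
    snapshot = {
        "q1": ["RECEIVED"],
        "q2": ["SCORED"],
    }
    print(pick_free_queue(snapshot))
```

With only EVALUATION_IN_PROGRESS in `BUSY_STATUSES`, q1 would still look free in this snapshot, which matches the suspected failure. Note that a check like this narrows the window but may not close it entirely if two submissions are evaluated between the status read and the assignment.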

See the dashboard at: https://www.synapse.org/#!Synapse:syn21445381/wiki/601587, submissions 9701935 and 9701939. Look at the workflow start times for the jobs on the backend queues: they are just a minute apart.

thomasyu888 commented 4 years ago

@Simon-Harris-IBM Ideally it wouldn't fail; you would just set MAX_CONCURRENT_WORKFLOWS to 1 for all the machines you are on. That being said, you could also check for RECEIVED submissions in the backend_queue.cwl tool.

Simon-Harris-IBM commented 4 years ago

Cannot reproduce. Closing.