broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

PAPI error code 14 #6306

Open anniepgu opened 3 years ago

anniepgu commented 3 years ago

IMPORTANT: Please file new issues over in our Jira issue tracker!

https://broadworkbench.atlassian.net/projects/BA/issues

You may need to create an account before you can view/create issues.

The backend for the workflow pipelines is https://genomics.googleapis.com/

Error message: The job was stopped before the command finished. PAPI error code 14. Execution failed: worker was terminated.

The job was running on a non-preemptible VM, with one instance of nvidia-tesla-t4 attached, nvidiaDriverVersion: 418.40.04.

What does "PAPI error code 14" mean? Can you suggest what we should do with it?

Thanks!

aednichols commented 3 years ago

According to the list of PAPI error codes, 14 is indeed preemption, so I agree that is surprising on a non-preemptible VM.

You can use Cromwell retries to re-run the job, or reach out to your GCP support channel to better understand what's going on.
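As a rough illustration of the retry suggestion, the sketch below writes a Cromwell workflow options file that sets a default `maxRetries` runtime attribute for every task. The helper names `build_workflow_options` and `write_workflow_options`, the file name `options.json`, and the retry count of 3 are all hypothetical choices made for this example. Check the `maxRetries` behaviour against the Cromwell runtime attributes documentation for your version before relying on it.

```python
import json


def build_workflow_options(max_retries: int) -> dict:
    """Build a workflow options mapping with a default maxRetries.

    Assumption: Cromwell reads task defaults from the
    "default_runtime_attributes" key of the workflow options file.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    return {"default_runtime_attributes": {"maxRetries": max_retries}}


def write_workflow_options(path: str, max_retries: int) -> None:
    """Serialize the options mapping to a JSON file at `path`."""
    with open(path, "w") as fh:
        json.dump(build_workflow_options(max_retries), fh, indent=2)
        fh.write("\n")


if __name__ == "__main__":
    # Hypothetical file name and retry count, chosen for illustration only.
    write_workflow_options("options.json", max_retries=3)
```

The resulting file could then be supplied when submitting the workflow, for example through the `--options` flag of `cromwell run`. Setting `maxRetries` in the `runtime` section of an individual WDL task is an alternative if only the GPU task needs retrying.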

pgrosu commented 3 years ago

There is a newer backend that replaces the genomics one: https://lifesciences.googleapis.com/

For more information, here are a couple of links:

  1. https://cloud.google.com/life-sciences/docs/how-tos/migration

  2. https://cromwell.readthedocs.io/en/stable/backends/Google/#migration-from-google-cloud-genomics-v2alpha1-to-google-cloud-life-sciences-v2beta

Hope it helps, Paul