Open marangiop opened 3 years ago
Deployment met_00_1806_share_8last
from blueprint met-00-test-last8
inside http://cloudify.grapevine-project.eu/
We noticed a recurrent bug when we try to execute a workflow manually from the Cloudify GUI. After we have started the run_jobs workflow on a given deployment, the jobs are queued and immediately after all of the jobs have been queued, the warning message CANCELLED BY 13431 comes up all the time. The jobs of the blueprint are actually submitted and executed correctly (as showsn by squeue) on CESGA HPCM but due to this warning it is no longer possible to monitor the status of the jobs. Eventually we need to kill the execution of run_jobs after we are sure that indeed the jobs have all been executed on HPC.
This becomes quite problematic for blueprints that have dependencies, since the orchestrator fails to detect the the jobs status is COMPLETED and as a result, it does not submit any jobs that depend on the current jobs.
Cloudify Version 20.02.23~community (Community)
Croupier Version Commit 2abb8b6 of branch permedcoe