Open marangiop opened 3 years ago
This has happened again today, for the deployment
cycle_12_part2_greece_03_06
This happened again yesterday, for the following deployments (both were run at the same time during the night, using the same reservation)
cycle_12_part2_greece_05_06 cycle_12_part2_spain_05_06
Cloudify Version 20.02.23~community (Community)
Croupier Version Commit 1eb2f32 of branch grapevine, after merging from permedcoe branch at commit 46239ec
Describe the bug As mentioned in the title, this is a rare bug: I normally run this blueprint every day without problems, but occasionally an error stops the execution of the entire blueprint at a specific job. The error is a Warning message of the form
u'job_id'
(where job_id is the ID that Croupier assigns to a specific job of the blueprint). Once the state of that job has changed to RUNNING, this warning is repeated approximately every 15 seconds in a never-ending loop. To be precise, the same warning is also always shown after the state of every job changes to PENDING, but that case is not important because it does not prevent the job from being executed.
Buried among these warnings there is an additional Warning message:
filedescriptor out of range in select()
This second warning is issued only once or twice; after that, the u'job_id' warning continues to be shown indefinitely.
If I check the logs of that specific job in the CESGA users portal, I see that the job has a COMPLETED status. This means that Croupier correctly detected that the job started running on the HPC system, but failed to detect that it finished.
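For context on the second warning: its text matches the ValueError that CPython raises when select.select() is given a descriptor number at or above FD_SETSIZE. The snippet below is a minimal sketch of that limit only. It does not use any Croupier code; the helper name select_accepts is hypothetical, and the value 1024 for FD_SETSIZE is an assumption that holds on typical Linux builds. Whether Croupier's monitor actually reaches this limit (for example through descriptors that accumulate in a long-running process) has not been confirmed.

```python
import os
import select

# Assumption: FD_SETSIZE is 1024, the usual value on Linux.
# The select module does not expose this constant.
FD_SETSIZE = 1024


def select_accepts(fd):
    """Return True if select.select() accepts this descriptor number."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # CPython rejects the number before any system call is made,
        # e.g. "filedescriptor out of range in select()".
        return False
    except OSError:
        # The number is in range but the descriptor is not open.
        return True
    return True


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    print(read_end, select_accepts(read_end))      # small descriptor number
    print(FD_SETSIZE, select_accepts(FD_SETSIZE))  # first number past the limit
    os.close(read_end)
    os.close(write_end)
```

If this limit does turn out to be involved, select.poll() and the selectors module (where the platform provides poll, epoll or kqueue) are not bound by FD_SETSIZE.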
To check the logs of the specific deployment yourself Just open the Cloudify instance deployed at http://cloudify.grapevine-project.eu/, then search for the deployment called
cycle_12_part2_greece_17_05
To Reproduce Steps to reproduce the behavior:
Expected behavior I don't mind if the warning message shows up, but I certainly don't expect a deployment to be stopped because of it.
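To illustrate the expected behavior, here is a minimal sketch of a polling step in which a failure to read one job's state is logged as a warning but cannot stall the deployment forever. Everything in it is hypothetical: poll_once, read_state, the failures counter, the max_failures threshold and the UNKNOWN state are illustration names, not Croupier's API, and the real monitor may be structured differently.

```python
import logging

logger = logging.getLogger("monitor_sketch")


def poll_once(job_ids, read_state, failures, max_failures=3):
    """Poll each job once and return {job_id: state}.

    read_state is a hypothetical callable that returns a state string
    or raises. failures counts consecutive errors per job across calls.
    """
    states = {}
    for job_id in job_ids:
        try:
            states[job_id] = read_state(job_id)
            failures[job_id] = 0
        except Exception as exc:
            # Log the problem as a warning and keep polling the other jobs.
            failures[job_id] = failures.get(job_id, 0) + 1
            logger.warning("could not read state of %s: %r", job_id, exc)
            if failures[job_id] >= max_failures:
                # Surface the problem instead of repeating the warning forever.
                states[job_id] = "UNKNOWN"
    return states
```

A caller could treat UNKNOWN as a signal to query the scheduler through another channel or to fail the job explicitly, rather than waiting indefinitely.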