GRAPLE / GWS

Graple Web Service

Communicate held jobs to the user #10

Open JaikrishnaTS opened 7 years ago

JaikrishnaTS commented 7 years ago

Jobs that condor places on hold because of problems need to be communicated to the client through the status message or by email. Two options are available: let the user cancel the remaining jobs and return with the logs (for a power user or when debug is enabled), or cancel the held jobs (from condor), include a descriptive file about them in the result, and proceed with the other jobs.
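To make the two options concrete, here is a rough sketch of the decision only. It is not taken from the GWS code: the function name `plan_for_held_jobs`, the job dictionaries, and the `debug_enabled` flag are all hypothetical, and the actual cancellation in condor and the mail/status delivery are left out.

```python
import json


def plan_for_held_jobs(jobs, debug_enabled):
    """Decide what to do when some jobs of an experiment are held.

    `jobs` is a hypothetical list of dicts with 'id', 'status' and an
    optional 'hold_reason'; this is not the real GWS data model.
    """
    held = [j for j in jobs if j["status"] == "held"]
    all_ids = [j["id"] for j in jobs]

    if not held:
        # Nothing is held: keep every job, nothing to report.
        return {"cancel": [], "continue": all_ids,
                "report": None, "return_logs": False}

    # Descriptive text about the held jobs, meant to be shipped to the user.
    report = json.dumps(
        [{"id": j["id"], "hold_reason": j.get("hold_reason", "unknown")}
         for j in held],
        indent=2,
    )

    if debug_enabled:
        # Power user / debug: cancel everything and return with the logs.
        return {"cancel": all_ids, "continue": [],
                "report": report, "return_logs": True}

    # Default: cancel only the held jobs and proceed with the others.
    held_ids = [j["id"] for j in held]
    return {"cancel": held_ids,
            "continue": [i for i in all_ids if i not in held_ids],
            "report": report, "return_logs": False}
```

The returned `report` string could be written to a file in the result archive or pasted into the status message, whichever fits the existing client flow.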

Also, jobs that are managed directly through condor (held, stopped, etc.) don't propagate their status to the DB, so EMS keeps querying them over and over, which leads to a performance issue. Unless the DB is cleared as part of maintenance, this leads to condor_history using lots of CPU. The fix for the issue above needs to avoid this problem as well. This particular issue could be fixed by modifying the process_once function (https://github.com/GRAPLE/GWS/blob/master/ems.py#L140) to also account for held jobs, for example by introducing a new experiment status such as 'held' or 'error'.
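A possible shape for that change, as a sketch only: the numeric codes are HTCondor's standard `JobStatus` values, but `experiment_status`, `should_keep_polling`, and the status names are assumptions for illustration and do not reflect what `process_once` currently does.

```python
# HTCondor JobStatus ClassAd values.
CONDOR_JOB_STATUS = {
    1: "idle",
    2: "running",
    3: "removed",
    4: "completed",
    5: "held",
    6: "transferring_output",
    7: "suspended",
}

# Hypothetical experiment statuses after which EMS should stop polling.
TERMINAL_STATUSES = {"completed", "held", "error"}


def experiment_status(job_status_codes):
    """Collapse the JobStatus codes of an experiment's jobs into one status."""
    names = [CONDOR_JOB_STATUS.get(code, "unknown") for code in job_status_codes]
    if "held" in names:
        return "held"
    if "removed" in names or "unknown" in names:
        return "error"
    if names and all(name == "completed" for name in names):
        return "completed"
    return "running"


def should_keep_polling(status):
    """True while the experiment still needs to be queried."""
    return status not in TERMINAL_STATUSES
```

Writing 'held' or 'error' to the DB once and then skipping those experiments is what would stop the repeated condor_history lookups.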