aiidateam / aiida-firecrest

AiiDA Transport/Scheduler plugins for interfacing with FirecREST (https://products.cscs.ch/firecrest/)
MIT License

Investigate the use case of slurm `sacct` #55

Open khsrali opened 3 months ago

khsrali commented 3 months ago

It seems aiida-core uses `sacct` to obtain the exit code, in a rather convoluted way:

`tasks.py` and `calcjob.py` both expect `scheduler.get_detailed_job_info()` to return a dictionary with three keys (`'retval'`, `'stdout'`, `'stderr'`). `calcjob.py` then passes this dictionary, together with two other files, to `scheduler.parse_output` to obtain the exit code.
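A minimal sketch of producing a dictionary of that shape from a `sacct` call, assuming a hypothetical `run` callable that executes a command on the remote machine (e.g. through FirecREST) and returns `(exit_code, stdout, stderr)`. The function `fetch_detailed_job_info` and the chosen `--format` fields are illustrative assumptions, not the aiida-core or aiida-firecrest API.

```python
from typing import Callable, TypedDict


class DetailedJobInfo(TypedDict):
    """The three keys that tasks.py and calcjob.py expect."""
    retval: int
    stdout: str
    stderr: str


# Hypothetical runner: executes a shell command remotely and returns
# (exit_code, stdout, stderr). Not a real aiida-firecrest function.
Runner = Callable[[str], tuple[int, str, str]]


def fetch_detailed_job_info(job_id: str, run: Runner) -> DetailedJobInfo:
    """Hypothetical helper: query sacct for one job and wrap the result."""
    # --parsable2 prints '|'-separated fields without a trailing delimiter;
    # the field list is an illustrative choice.
    command = f"sacct --format=JobID,State,ExitCode --parsable2 --jobs={job_id}"
    retval, stdout, stderr = run(command)
    return {"retval": retval, "stdout": stdout, "stderr": stderr}


if __name__ == "__main__":
    # Fake runner standing in for a real remote call.
    def fake_run(command: str) -> tuple[int, str, str]:
        return 0, "JobID|State|ExitCode\n123|OUT_OF_MEMORY|0:125\n", ""

    print(fetch_detailed_job_info("123", fake_run))
```

Passing the runner in keeps the sketch independent of how the command is actually executed, which is the part a FirecREST transport would have to provide.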

khsrali commented 3 months ago

Without this feature, monitoring cannot be done. At the moment, when a job fails, there is no way to know which of these caused it: `ERROR_SCHEDULER_OUT_OF_MEMORY`, `ERROR_SCHEDULER_OUT_OF_WALLTIME`, `ERROR_SCHEDULER_NODE_FAILURE`, etc.
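To illustrate what the missing information enables, here is a rough sketch that maps the Slurm `State` column of a `sacct --parsable2` output (with a header row) to the labels above. The mapping `OUT_OF_MEMORY`/`TIMEOUT`/`NODE_FAIL` uses real Slurm job states, but `classify_failure` is a hypothetical helper, not aiida-core's `parse_output`, and the input layout is an assumption.

```python
from typing import Optional

# Illustrative mapping from Slurm job states (sacct "State" column)
# to the CalcJob exit-code labels mentioned in this issue.
STATE_TO_EXIT_LABEL = {
    "OUT_OF_MEMORY": "ERROR_SCHEDULER_OUT_OF_MEMORY",
    "TIMEOUT": "ERROR_SCHEDULER_OUT_OF_WALLTIME",
    "NODE_FAIL": "ERROR_SCHEDULER_NODE_FAILURE",
}


def classify_failure(detailed_job_info: dict) -> Optional[str]:
    """Hypothetical: return an exit-code label, or None if nothing known is found."""
    if detailed_job_info["retval"] != 0:
        return None  # the sacct call itself failed; nothing to classify
    lines = detailed_job_info["stdout"].strip().splitlines()
    if len(lines) < 2:
        return None  # header only, or empty output
    header = lines[0].split("|")
    if "State" not in header:
        return None
    state_idx = header.index("State")
    # Check every row: the job and its steps (e.g. "123.batch") are listed
    # separately, and a failure may only show up on a step row.
    for line in lines[1:]:
        fields = line.split("|")
        if state_idx >= len(fields) or not fields[state_idx]:
            continue
        # States such as "CANCELLED by 1234" carry extra text; keep the first word.
        state = fields[state_idx].split()[0]
        label = STATE_TO_EXIT_LABEL.get(state)
        if label is not None:
            return label
    return None
```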