QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org

Nexus: Updated PBS job states for Polaris #4987

Closed kayahans closed 2 months ago

kayahans commented 2 months ago

Proposed changes

This PR updates the job states that Nexus recognizes for the PBS scheduler. The most important change from the older PBS version is that the completed state has been replaced by finished. Polaris has the May 2020 version installed, which uses the new job state definitions. Without this update, Nexus cannot be used to submit jobs on Polaris because the finished (F) state is not defined. For compatibility with other machines that still run the older PBS version, I have kept both the completed and finished state tags.

For reference, please see page RG-199 at https://2021.help.altair.com/2021.1/PBSProfessional/PBSReferenceGuide2021.1.pdf.

Here is the matching information from man qstat on Polaris (select '1B' after running man qstat):

The job's state:
    B  Array job has at least one subjob running
    E  Job is exiting after having run
    F  Job is finished
    H  Job is held
    M  Job was moved to another server
    Q  Job is queued
    R  Job is running
    S  Job is suspended
    T  Job is being moved to new location
    U  Cycle-harvesting job is suspended due to keyboard activity
    W  Job is waiting for its submitter-assigned start time to be reached
    X  Subjob has completed execution or has been deleted

The new tags in the newer (>2020) PBS version are B, F (which replaces C), M, U, and X.

Nexus only uses the 'complete' status to track jobs, so the remaining status tags are there only for bookkeeping: https://github.com/QMCPACK/qmcpack/blob/1c17c0bb1f3a68690ccd466068495ae5de4ac3ff/nexus/lib/machines.py#L1307
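
As a rough illustration of the idea, here is a minimal sketch of a state table that treats both the old C tag and the new F tag as complete. It is not the code in nexus/lib/machines.py; the names PBS_JOB_STATES and is_complete, and the descriptive labels other than 'complete', are hypothetical and chosen only for this example.

```python
# Hypothetical sketch, not the actual Nexus implementation.
# Maps single-letter PBS job state codes (as printed by qstat) to labels.
# Only the 'complete' label matters for job tracking; the other labels
# are illustrative bookkeeping names.
PBS_JOB_STATES = {
    "B": "array_running",   # array job has at least one subjob running
    "C": "complete",        # older PBS versions: completed
    "E": "exiting",
    "F": "complete",        # newer PBS versions: finished
    "H": "held",
    "M": "moved",
    "Q": "queued",
    "R": "running",
    "S": "suspended",
    "T": "transferring",
    "U": "user_suspended",  # cycle-harvesting job suspended
    "W": "waiting",
    "X": "subjob_done",     # subjob completed or deleted
}


def is_complete(state_code):
    """Return True if the given qstat state code means the job is done.

    Unknown codes are treated as not complete rather than raising.
    """
    label = PBS_JOB_STATES.get(state_code.strip().upper())
    return label == "complete"
```

With a table like this, a job reported as F on Polaris and a job reported as C on a machine running the older PBS version are both recognized as done, while every other code is treated as still in progress.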

Does this introduce a breaking change?

What systems has this change been tested on?

Polaris

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is simply a reminder of what we are going to look for before merging your code.

ye-luo commented 2 months ago

Test this please