ReactionMechanismGenerator / ARC

ARC - Automatic Rate Calculator
https://reactionmechanismgenerator.github.io/ARC/index.html
MIT License

Azure server #689

Open calvinp0 opened 11 months ago

calvinp0 commented 11 months ago

Since we have acquired an Azure subscription, ARC needs to be updated to support Azure as well as ESS like QChem. Please see the list below for all the changes. NOTE: Tests still need to be fixed or written for these new changes.

  1. Added 'rapidfuzz' to ARC environment
  2. basis set dataframe for QChem
  3. check_normal_mode_displacement - Allow for a try and except case
  4. Adapter: Submit Script Format updated to use the user-provided username {un}, to convert memory to an 'int', and to provide the option of {server_nodes} if required
  5. Adapter: Total Submit Memory adjusted to ensure that, when troubleshooting a job, it never attempts to go OVER the maximum submission memory allowed by the node/server
  6. Adapter: SLURM Submit Memory - Using #SBATCH --mem as the parameter now as it defines the TOTAL memory of the submission
  7. Adapter: SSH File Download - We do not expect to always download or upload certain files depending on the scheduler via SSH. This change allows for recognising if certain files will be uploaded or downloaded depending on the user's scheduler choice
  8. Adapter: Null Bytes can appear in files. More specifically, if QChem has an error, it can produce Null Bytes in the out.txt/err.txt files, which requires a different way of reading the file contents. This is not 100% foolproof though and may need extra work
  9. Adapters: In the #390 branch, SSH had improvements that were not merged. I have brought forth an improvement from this branch where Remote Files are removed once they are downloaded to the local computer
  10. ARC can now recognise that IRC is also supported by QChem
  11. common.py: Slight reworking of the troubleshooting of the scan resolution so that, once obj.scan_res has been defined with the new rotor scan resolution, the scan res is properly removed from obj.args['trsh']['scan_res']
  12. SSH: Updated decorator to use the correct connect function
  13. SSH: If the user provides a server that is not in servers.keys() and the server is also not None, then an error is raised to inform the user that they need to fix up the server keys
  14. SSH: When a submission to a scheduler includes an incorrect memory specification, the user is now warned that the requested memory is not supported and needs to be checked. May need to make this a ValueError instead of a logger warning
  15. SSH: Slight adjustment to checking if there is an stdout after submission attempt
  16. SSH: Some servers require private keys. Originally the code was incorrectly adding the private key to the SSH paramiko function. It has now been changed so that system keys are loaded and then if the user provides a private key, it is included in the connect function
  17. SSH: Updated default arguments to get_last_modified_time
  18. SSH: Changed the lowering of the cluster soft
  19. SSH: Added a function to remove the directory on the remote server
  20. SSH: Azure SLURM has an extra status called 'CF', which means configuring (for the node). This can take 10-15 mins or so before the node is online. We now ensure we capture this. HOWEVER, a node can get stuck in 'CF' status. We now check for this by checking how long the node has been active, splitting the time up correctly (different formats of time are possible), and then, if it is above 15 minutes, running the command scontrol show node {node_id}. If the stdout includes the phrase 'NOT_RESPONDING' then we return 'errored'
  21. XYZ to Smiles: Warning Update to Possible Valence
  22. Vectors: Reading Coords that are in string format using regex and putting into a tuple
  23. Species: The original code did not function correctly, and it was never used, which is why it made it into production. It has now been changed to properly return the actual number of heavy atoms.
  24. [WIP] Getting the Internal Coordinates for QChem - Still not operational
  25. submit.py & settings.py: Updated for SLURM and AZURE
  26. Scheduler: Now imports JobError
  27. Scheduler: Fixed adding trsh to the args
  28. Scheduler: Added JobError exception for determining job status
  29. Scheduler: Now removing remote jobs at the end of the scheduler - !!!MAY NEED TO SEE IF THIS HAS AN EFFECT ON NON SSH JOBS!!!
  30. Scheduler: Getting the recent opt job name via properly checking if the opt job was considered done (This was not done before)
  31. Scheduler: TODO - We attempt to trouble shoot a frequency we deem not okay. Yet, there is no specific troubleshoot method, so why do we do this?
  32. Scheduler: Properly troubleshoot a job
  33. Scheduler: Fix conformer troubleshoot if it was a TS conformer
  34. parser: TODO - Why do we set raise error as true for normal mode displacement parsing? It causes the function to raise a not implemented error even though it is implemented
  35. parser: Now can parse the normal mode displacement of QCHEM
  36. parser: Now can parse the 1d scan coords of QCHEM
  37. parser: Can now parse trajectory of QCHEM
  38. parser: Can now parse arguments in the scan input file QCHEM
  39. parser: NEED to fix parse_ic_info for QCHEM
  40. QChem Adapter: Import - Pandas, ARC_PATH, rapidfuzz
  41. QChem Adapter: Input Template now supports IRC and {trsh} args and ensures IQMOL_FCHK is false (This can be set to true BUT be aware this .fchk file can be rather large)
  42. QChem Adapter: write_input_file - basis set is now matched via the software_input_matching function
  43. QChem Adapter: write_input_file - QChem now supports D3 method. We should look at working with other DFT_D methods in the future. More specifically there are other types of D3 methods
  44. QChem Adapter: write_input_file - Correctly pass in troubleshooting arguments into the input file
  45. QChem Adapter: write_input_file - Capitalised TRUE/FALSE in UNRESTRICTED parameter
  46. QChem Adapter: write_input_file - Removed the scan job type and moved it to another section of the input file writing
  47. QChem Adapter: write_input_file - If scan is set, the job_type is PES_SCAN. We also set the fine to be XC_GRID 3. However, we may need to look into changing the tolerances later
  48. QChem Adapter: write_input_file - We now write correctly the torsional scans for the input file for a scan
  49. QChem Adapter: write_input_file - IRC is now supported, however this input file means we run two jobs from the one input file - A FREQ job and then IRC. This currently works but will need improvements when used more by users
  50. QChem Adapter: write_input_file - Ensuring that the SCF CONVERGENCE is 10^-8 for certain job types
  51. QChem Adapter: [NEWFUNCTION] generate_scan_angles - to support PES SCAN jobs, we have a function that takes the required angle we want to scan and the step size, and then returns a start and end angle between -180 and 180 that ensure we scan the required angle during the stepping
  52. QChem Adapter: [NEWFUNCTION] software_input_matching - Since QCHEM has different formatting for basis sets, this function will try to take the user's format of the basis set and match it against a dataframe (which should always be updated if it's missing a format). This uses the new package in the ARC environment called rapidfuzz
  53. TrshQChem: Fixed error checking in QChem output files. It would originally mistakenly think SCF failed was the error due to what errors it would look for in the lines
  54. TrshQChem: FlexNet Licensing Error - If the license server is not working this will cause ARC to stop
  55. TrshQChem: Max Optimisation Cycles is now properly checked for in the output file
  56. TrshQChem: Max Iteration Cycles is identified now if there is a failure during SCF convergence
  57. TrshMolpro: Molpro reports memory errors that need to be properly troubleshooted differently than what we did originally. Now, we will look for how much memory needs to be increased in order for molpro to run successfully. This is done through regex pattern matching. We also check for triples memory increase if required
  58. Trsh: determine_job_log_memory_issue - Sometimes the job log can contain null bytes, usually a QCHEM issue, which means we need to open the file differently to read it
  59. TrshQChem: trsh_ess_job - QCHEM's trsh has been reworked so that it now combines troubleshooting attempts if they were attempted previously. For example, if we troubleshooted the max opt cycle but now need to turn on SYM IGNORE, it will include both of these statements in the troubleshooting input file
  60. TrshMolpro: trsh_ess_job - Molpro required a change in how we troubleshoot the memory. If we get a memory error, it is because either the MWords per process is not enough, even though we have provided an adequate amount of memory to the submit script, OR the MWords per process is enough but the TOTAL MEMORY (MWords * CPUS) > Max Node Memory, in which case CORES has to be reduced.
  61. TrshSSH:trsh_job_on_server - Fixed it as a 'with' statement so the client is closed when exiting the 'with' statement
  62. Molpro Adapter: Molpro needs a different touch to troubleshooting its memory. Here in setting the input file memory we determine if the MWords was enough per process but the total memory was too high. If that's the case, we reduce the processes req. while maintaining the memory per process
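
To illustrate items 8 and 58, here is a minimal sketch of reading a log that may contain null bytes. The helper name `read_lines_without_null_bytes` is hypothetical and is not the function used in this PR; it assumes the file is small enough to read into memory and that UTF-8 with replacement characters is an acceptable decoding.

```python
def read_lines_without_null_bytes(path):
    """Return the lines of a text file, dropping NUL bytes.

    Hypothetical helper: the file is opened in binary mode so that stray
    NUL bytes (or other undecodable bytes) cannot break the read.
    """
    with open(path, "rb") as f:
        raw = f.read()
    # Strip NUL bytes first, then decode; undecodable bytes become U+FFFD.
    text = raw.replace(b"\x00", b"").decode("utf-8", errors="replace")
    return text.splitlines()
```

As noted in item 8, this kind of approach is not foolproof: it only removes the null bytes and does not repair any output that was truncated when the ESS crashed.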
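
The 'CF' check from item 20 can be sketched roughly as follows. Both function names are hypothetical, the elapsed-time formats handled are the `M:SS`, `H:MM:SS` and `D-HH:MM:SS` forms that SLURM prints, and the `scontrol show node {node_id}` output is passed in as a string rather than fetched over SSH. The non-error return value `'configuring'` is also an assumption made for this sketch.

```python
def slurm_elapsed_minutes(elapsed):
    """Convert a SLURM elapsed-time string to minutes.

    Handles 'M:SS', 'H:MM:SS' and 'D-HH:MM:SS'.
    """
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in elapsed.split(":")]
    # Pad missing leading fields (hours, then minutes) with zeros.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return days * 1440 + hours * 60 + minutes + seconds / 60


def classify_configuring_job(elapsed, scontrol_stdout, limit_minutes=15):
    """Return 'errored' if a 'CF' job looks stuck, else 'configuring'.

    `scontrol_stdout` is assumed to be the stdout of
    `scontrol show node {node_id}`, obtained elsewhere.
    """
    stuck = slurm_elapsed_minutes(elapsed) > limit_minutes
    if stuck and "NOT_RESPONDING" in scontrol_stdout:
        return "errored"
    return "configuring"
```

In practice the `scontrol` command only needs to be run once the 15 minute threshold has been exceeded, so the real check can skip the extra SSH call for freshly configuring nodes.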
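
One possible reading of the angle selection described in item 51 is sketched below. This is not the `generate_scan_angles` implementation from this PR; `scan_angle_bounds` is a hypothetical name, and the sketch assumes the scan should step outwards from the required angle in whole steps while staying inside [-180, 180].

```python
import math


def scan_angle_bounds(required_angle, step):
    """Return (start, end) within [-180, 180] such that stepping from
    start by `step` lands exactly on `required_angle`.

    Hypothetical sketch: walks whole steps down and up from the
    required angle without leaving the [-180, 180] window.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if not -180 <= required_angle <= 180:
        raise ValueError("required_angle must be within [-180, 180]")
    steps_down = math.floor((required_angle + 180) / step)
    steps_up = math.floor((180 - required_angle) / step)
    start = required_angle - steps_down * step
    end = required_angle + steps_up * step
    return start, end
```

For example, with a required angle of 45 and a step of 10, the scan would run from -175 to 175 and pass through 45 on the way.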
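
The second Molpro case from items 60 and 62 (enough MWords per process, but too much total memory) can be sketched as below. The function name `fit_cores_to_node` is hypothetical, and the conversion assumes 1 word = 8 bytes, so 1 MWord = 8 MB, with the node memory given in decimal GB.

```python
MB_PER_MWORD = 8  # assumption: 1 word = 8 bytes, so 1 MWord = 8 MB


def fit_cores_to_node(mwords_per_process, cpu_cores, max_node_memory_gb):
    """Reduce the core count until MWords * cores fits on the node,
    keeping the memory per process unchanged.

    Hypothetical sketch, not the logic used by the Molpro adapter.
    """
    mb_per_process = mwords_per_process * MB_PER_MWORD
    max_node_mb = max_node_memory_gb * 1000
    if mb_per_process > max_node_mb:
        raise ValueError("a single process does not fit on the node")
    # Largest number of processes whose combined memory fits on the node.
    max_cores = int(max_node_mb // mb_per_process)
    return min(cpu_cores, max_cores)
```

With 1000 MWords per process (8 GB) on a 32 GB node, a request for 8 cores would be cut down to 4, while a request for 2 cores would be left alone.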
codecov[bot] commented 11 months ago

Codecov Report

Attention: Patch coverage is 39.50233%, with 389 lines in your changes missing coverage. Please review.

Project coverage is 72.55%. Comparing base (be1f6c8) to head (a3ff0fb).

:exclamation: Current head a3ff0fb differs from pull request most recent head 123279e. Consider uploading reports for the commit 123279e to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| arc/parser.py | 28.46% | 89 Missing and 4 partials :warning: |
| arc/job/trsh.py | 38.93% | 69 Missing and 11 partials :warning: |
| arc/job/adapters/qchem.py | 39.04% | 53 Missing and 11 partials :warning: |
| arc/job/ssh.py | 15.68% | 39 Missing and 4 partials :warning: |
| arc/scheduler.py | 0.00% | 32 Missing :warning: |
| arc/job/adapter.py | 15.62% | 27 Missing :warning: |
| arc/job/adapters/molpro.py | 13.63% | 17 Missing and 2 partials :warning: |
| arc/species/converter.py | 13.33% | 12 Missing and 1 partial :warning: |
| arc/species/vectors.py | 14.28% | 5 Missing and 1 partial :warning: |
| arc/level.py | 42.85% | 3 Missing and 1 partial :warning: |

... and 3 more
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #689      +/-   ##
==========================================
- Coverage   73.82%   72.55%   -1.27%
==========================================
  Files          99      100       +1
  Lines       27346    26999     -347
  Branches     5717     5668      -49
==========================================
- Hits        20187    19590     -597
- Misses       5733     6042     +309
+ Partials     1426     1367      -59
```

| [Flag](https://app.codecov.io/gh/ReactionMechanismGenerator/ARC/pull/689/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ReactionMechanismGenerator) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/ReactionMechanismGenerator/ARC/pull/689/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ReactionMechanismGenerator) | `72.55% <39.50%> (-1.27%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ReactionMechanismGenerator#carryforward-flags-in-the-pull-request-comment) to find out more.
