TheoChem-VU / TCutility

Utility functions/classes for the TheoCheM programs
https://theochem-vu.github.io/TCutility/
MIT License
Simple job running API #68

Closed YHordijk closed 8 months ago

YHordijk commented 11 months ago

Adding a simple job-running API would be very helpful for automation tasks. There are of course many possible ways to implement this. API calls might look like:

from TCutility import runner, molecule

mol = molecule.load(...)
res = runner.optimize(mol, 'BLYP-D3(BJ)/TZ2P', charge=..., spinpol=..., solvent='water')  # returns a Result object
print(res.summary)
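As a rough illustration of the one-line form, the level-of-theory string could be split into a functional and a basis set before the job is built. This is only a sketch: `split_level_of_theory` is a hypothetical helper, not part of TCutility, and it assumes the string always has the form 'functional/basis'.

```python
def split_level_of_theory(level: str) -> tuple[str, str]:
    """Split 'FUNCTIONAL/BASIS' into its two parts (hypothetical helper)."""
    # partition splits on the first '/', giving (before, separator, after)
    functional, sep, basis_set = (part.strip() for part in level.partition('/'))
    if not sep or not functional or not basis_set:
        raise ValueError(f"expected 'functional/basis', got {level!r}")
    return functional, basis_set


functional, basis_set = split_level_of_theory('BLYP-D3(BJ)/TZ2P')
print(functional, basis_set)  # prints: BLYP-D3(BJ) TZ2P
```

A function such as runner.optimize could then pass the two parts on to the more verbose Job interface shown next.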

Or, a little more verbose, but more flexible:

from TCutility import runner, molecule

mol = molecule.load(...)
job = runner.Job()
job.functional = 'BLYP-D3(BJ)'
job.basisset = 'TZ2P'
job.quality = 'VeryGood'
job.charge = ...
job.spinpol = ...
job.solvent = 'water'
job.molecule = mol
job.task = 'GeometryOptimization'

job.settings.input.adf.print = 'FmatSFO'  # specify your own settings using the settings property of job

res = job.run()
print(res.summary)

YHordijk commented 9 months ago

Basic calculations

I have done some work on this in the yutility package. My approach is to set the calculation options through methods of a Job class. For example, we can run a simple geometry optimization using the ADFJob class:

from tcutility.job import ADFJob

with ADFJob() as job:
    job.molecule('./test/xyz/NH3BH3.xyz')
    job.rundir = 'tmp/NH3BH3'
    job.name = 'GeometryOpt'
    job.sbatch(p='tc', ntasks_per_node=15)
    job.optimization()
    job.functional('r2SCAN')
    job.basis_set('TZ2P')

This small script runs a geometry optimization on the molecule stored in './test/xyz/NH3BH3.xyz' at the r2SCAN/TZ2P level of theory. The calculation is run and stored in './tmp/NH3BH3/GeometryOpt'. The job is submitted using sbatch to the tc partition with 15 tasks per node.

The current approach lets us set up a calculation very quickly, in only 8 lines of code. Doing everything with PLAMS directly could easily take close to 100 lines of code, including the Slurm settings.
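To show why the `with` block is enough to launch the job, here is a minimal sketch of the context-manager pattern. `SketchJob` is a hypothetical stand-in written for this illustration; it is not the tcutility implementation, and its `run` method only prints instead of writing input files or calling sbatch.

```python
class SketchJob:
    """Hypothetical stand-in for a job class; not the tcutility implementation."""

    def __init__(self):
        self.rundir = 'tmp'
        self.name = 'job'
        self.settings = {}
        self.submitted = False

    def functional(self, name):
        self.settings['functional'] = name

    def basis_set(self, name):
        self.settings['basis_set'] = name

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Only run when the block finished without raising.
        if exc_type is None:
            self.run()
        return False  # never swallow exceptions from the block

    def run(self):
        # A real job would write input and run files here and submit them.
        self.submitted = True
        print(f"running {self.rundir}/{self.name} with {self.settings}")


with SketchJob() as job:
    job.rundir = 'tmp/NH3BH3'
    job.name = 'GeometryOpt'
    job.functional('r2SCAN')
    job.basis_set('TZ2P')
```

Because the job is only started in `__exit__`, the order of the calls inside the block does not matter, and a typo that raises an exception prevents a half-configured job from being submitted.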

Dependent jobs

One important feature of the approach is that we can define dependencies between jobs. For example, we can do an optimization at a lower level of theory and then do a single point calculation at a high level of theory.

from os.path import join as j  # `j` is used below to build the path to the optimized geometry
from tcutility.job import ADFJob

with ADFJob() as opt_job:
    opt_job.molecule('./test/xyz/SN2_TS.xyz')
    opt_job.charge(-1)

    opt_job.rundir = 'tmp/SN2'
    opt_job.name = 'TS_OPT'
    opt_job.sbatch(p='tc', ntasks_per_node=15)
    opt_job.functional('OLYP')
    opt_job.basis_set('DZP')
    opt_job.transition_state()

with ADFJob() as sp_job:
    sp_job.dependency(opt_job)  # this job will only run when opt_job finishes
    sp_job.molecule(j(opt_job.workdir, 'output.xyz'))
    sp_job.charge(-1)

    sp_job.rundir = 'tmp/SN2'
    sp_job.name = 'SP_M062X'
    sp_job.sbatch(p='tc', ntasks_per_node=15)
    sp_job.functional('M06-2X')
    sp_job.basis_set('TZ2P')

sp_job will wait for opt_job to finish before starting. We can also directly take the molecule file that opt_job will produce. This file does not exist yet when sp_job is submitted, but it will be read when opt_job finishes.
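One way such a dependency could be expressed on a Slurm cluster is through sbatch's `--dependency=afterok:<jobid>` option, combined with `--parsable` so that sbatch prints the job ID of each submission. The sketch below only builds the argument list and does not submit anything. Whether tcutility uses this mechanism internally is an assumption; `sbatch_command`, the script name, and the job ID 12345 are hypothetical and chosen for illustration.

```python
def sbatch_command(script, dependency_ids=(), **options):
    """Build an sbatch argument list (hypothetical helper, nothing is submitted)."""
    cmd = ['sbatch', '--parsable']  # --parsable makes sbatch print only the job ID
    for key, value in options.items():
        flag = key.replace('_', '-')
        if len(flag) == 1:
            # short options take their value as a separate argument: -p tc
            cmd += [f'-{flag}', str(value)]
        else:
            # long options use the --name=value form: --ntasks-per-node=15
            cmd.append(f'--{flag}={value}')
    if dependency_ids:
        # afterok: start only after all listed jobs have finished successfully
        ids = ':'.join(str(job_id) for job_id in dependency_ids)
        cmd.append(f'--dependency=afterok:{ids}')
    cmd.append(script)
    return cmd


# 12345 stands in for the job ID that submitting opt_job would have returned
print(sbatch_command('SP_M062X.run', dependency_ids=[12345], p='tc', ntasks_per_node=15))
```

With `afterok`, the single-point job would stay pending if the optimization fails, which matches the intent of reading a geometry that only exists after a successful run.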

Fragment calculations

One consequence of the dependency feature is that fragment-based calculations become easy to set up. I have written a small class for this (ADFFragmentJob). For example, we can run a fragment analysis on the transition state of a radical addition reaction.

from scm import plams  # needed for plams.Molecule below
from tcutility.job import ADFFragmentJob

mol = plams.Molecule('./test/xyz/radadd.xyz')
with ADFFragmentJob() as job:
    job.add_fragment(mol.atoms[:15], 'Substrate')
    job.add_fragment(mol.atoms[15:], 'Radical')
    job.Radical.spin_polarization(1)
    job.rundir = 'tmp/RA'
    job.sbatch(p='tc', ntasks_per_node=15)
    job.functional('BLYP-D3(BJ)')
    job.basis_set('TZ2P')

We first load the molecule and then add the fragments by selecting atoms from it. For this system, atoms 1-15 are part of the substrate and atoms 16-20 form the methyl radical. The add_fragment method takes the fragment geometry (a list of atoms in this case) and the name of the fragment. Each fragment job also becomes an attribute of the ADFFragmentJob, accessible by the fragment name, so fragment settings are easy to change. For example, setting the spin polarization of the radical fragment is as simple as calling job.Radical.spin_polarization(1).

When the context manager exits and the job is run, the total spin polarization and charge are calculated and set for the main job. The settings of the main job, such as the functional and basis set, are also propagated to all child jobs. Three jobs are then submitted: one for each fragment and one for the main calculation. The main calculation depends on both fragment jobs.
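The attribute access by fragment name can be sketched with setattr. The classes below are hypothetical stand-ins, not the ADFFragmentJob implementation, and the atoms are plain strings rather than PLAMS atoms. Summing the fragment charges and spin polarizations to get the totals for the main job is an assumption made for this sketch.

```python
class SketchFragment:
    """Hypothetical child job holding the settings of one fragment."""

    def __init__(self, name, atoms):
        self.name = name
        self.atoms = atoms
        self._charge = 0
        self._spinpol = 0

    def charge(self, value):
        self._charge = value

    def spin_polarization(self, value):
        self._spinpol = value


class SketchFragmentJob:
    """Hypothetical parent job; fragments become attributes named after them."""

    def __init__(self):
        self.children = {}

    def add_fragment(self, atoms, name):
        child = SketchFragment(name, list(atoms))
        self.children[name] = child
        # setattr makes the child reachable as job.<name>; names are case-sensitive
        setattr(self, name, child)

    def totals(self):
        # Assumption: totals for the main job are plain sums over the fragments.
        charge = sum(child._charge for child in self.children.values())
        spinpol = sum(child._spinpol for child in self.children.values())
        return charge, spinpol


atoms = [f'atom{i}' for i in range(1, 21)]  # placeholders for the 20 atoms
job = SketchFragmentJob()
job.add_fragment(atoms[:15], 'Substrate')
job.add_fragment(atoms[15:], 'Radical')
job.Radical.spin_polarization(1)
print(job.totals())  # prints: (0, 1)
```

One consequence of this design is that a fragment name must be a valid Python identifier and must not collide with an existing method name of the parent job.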

SiebeLeDe commented 9 months ago

Looks good! Actually quite a handy way of doing calculations. I like how you can specify the name of the fragment and then use that name to change settings. I assume it's case-sensitive? I also wonder what the complex job is called, so that you can access its information. Or do you plan to just use the read function on the calculation directory?

The dependent job mechanism is very interesting. If it indeed works as you would expect, it becomes very easy to chain calculations, such as for NMR, or to build entire workflows.

YHordijk commented 9 months ago

1) The name is indeed case-sensitive; we simply use setattr to set the child job on the parent job.
2) I have not yet implemented an easy way to read the results of the EDA calculations. The job is called complex by default. Perhaps it is a good idea to specify in the name whether the job is a fragment or the complex.
3) The dependency mechanism should indeed be very easy to use. To demonstrate, I created the NMRJob class, which handles that for you. It first runs a DFT job at the SAOP/TZ2P level and then creates an input and run file for an NMR job.