TheoChem-VU / TCutility

Utility functions/classes for the TheoCheM programs
https://theochem-vu.github.io/TCutility/
MIT License

Add job inheriting #183

Closed YHordijk closed 7 months ago

YHordijk commented 8 months ago

For larger workflows we can have many different jobs being set up, often with many of the same settings. For example:

from tcutility.job import ADFJob

molname = 'naked_allene'
functional = 'BP86'
basis_set = 'TZ2P'
with ADFJob() as opt_job:
    # common settings
    opt_job.slurm(p='tc', n=32)
    opt_job.rundir = f'calculations/{molname}/{functional}_{basis_set}'
    opt_job.functional(functional)
    opt_job.basis_set(basis_set)
    opt_job.quality('Good')
    opt_job.geometry_convergence('Good')
    opt_job.charge(0)
    opt_job.spin_polarization(1)

    # job-specific settings
    opt_job.name = 'optimization'
    opt_job.molecule(f'../xyz/{molname}.xyz')
    opt_job.optimization()
    ...

with ADFJob() as scan_job:
    # common settings
    scan_job.slurm(p='tc', n=32)
    scan_job.rundir = f'calculations/{molname}/{functional}_{basis_set}'
    scan_job.functional(functional)
    scan_job.basis_set(basis_set)
    scan_job.quality('Good')
    scan_job.geometry_convergence('Good')
    scan_job.charge(0)
    scan_job.spin_polarization(1)

    # job-specific settings
    scan_job.name = 'scan'
    scan_job.dependency(opt_job)
    scan_job.molecule(opt_job.output_mol_path)
    scan_job.PESScan(...)
    ...

with ADFJob() as tsopt_job:
    # common settings
    tsopt_job.slurm(p='tc', n=32)
    tsopt_job.rundir = f'calculations/{molname}/{functional}_{basis_set}'
    tsopt_job.functional(functional)
    tsopt_job.basis_set(basis_set)
    tsopt_job.quality('Good')
    tsopt_job.geometry_convergence('Good')
    tsopt_job.charge(0)
    tsopt_job.spin_polarization(1)

    # job-specific settings
    tsopt_job.name = 'tsopt'
    tsopt_job.dependency(scan_job)
    tsopt_job.molecule(scan_job.highest_energy_mol_path)
    tsopt_job.transition_state(...)
    ...

This workflow requires duplicating many settings across the jobs, which increases the likelihood of user error whenever the common settings need to change.

Instead we could do something as follows:

from tcutility.job import ADFJob

molname = 'naked_allene'
functional = 'BP86'
basis_set = 'TZ2P'

common_job = ADFJob()
common_job.slurm(p='tc', n=32)
common_job.rundir = f'calculations/{molname}/{functional}_{basis_set}'
common_job.functional(functional)
common_job.basis_set(basis_set)
common_job.quality('Good')
common_job.geometry_convergence('Good')
common_job.charge(0)
common_job.spin_polarization(1)

with ADFJob(common_job) as opt_job:
    # job-specific settings
    opt_job.name = 'optimization'
    opt_job.molecule(f'../xyz/{molname}.xyz')
    opt_job.optimization()
    ...

with ADFJob(common_job) as scan_job:
    # job-specific settings
    scan_job.name = 'scan'
    scan_job.dependency(opt_job)
    scan_job.molecule(opt_job.output_mol_path)
    scan_job.PESScan(...)
    ...

with ADFJob(common_job) as tsopt_job:
    # job-specific settings
    tsopt_job.name = 'tsopt'
    tsopt_job.dependency(scan_job)
    tsopt_job.molecule(scan_job.highest_energy_mol_path)
    tsopt_job.transition_state(...)
    ...

This allows us to define one common job from which all other jobs inherit. It reduces the possibility of errors and removes many lines of code.

Possible implementation

1) In general, changing Job class settings can have side effects, but they are limited to the object itself. It is therefore probably safe to simply copy the __dict__ attribute of the common job and use it to update the __dict__ attribute of the target job.

2) We have to add a positional argument to the Job constructor. I think it would be good to be able to inherit from multiple jobs. E.g.:

common = Job()
common.slurm(p='tc', n=32)
common.rundir = f'calculations/{molname}'
...

common_adf = ADFJob()
common_adf.functional('BP86')
common_adf.basis_set('TZ2P')
...

common_xtb = XTBJob()
common_xtb.model('GFN1-xTB')
common_xtb.solvent('chloroform')
...

with XTBJob(common, common_xtb) as preopt_job:
    preopt_job.molecule(...)
    preopt_job.optimization()
    ...
with ADFJob(common, common_adf) as opt_job:
    opt_job.dependency(preopt_job)
    opt_job.molecule(preopt_job.output_mol_path)
    opt_job.optimization()
    ...

In the constructor it would look like:

class Job:
    def __init__(self, *base_jobs, test_mode=..., ...):
        # copy the attributes of each base job, in the order given
        for base_job in base_jobs:
            self.__dict__.update(base_job.__dict__.copy())
        self.test_mode = test_mode
        ...

The jobs will be updated in order, so later base jobs overwrite settings from earlier ones. This way you can choose which base jobs take precedence.
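
To make the ordering concrete, here is a minimal, self-contained sketch of the idea. `ToyJob` and its `rundir` and `settings` attributes are hypothetical stand-ins, not the real TCutility classes. The sketch also swaps `dict.copy()` for `copy.deepcopy`: `dict.copy()` is shallow, so nested containers would otherwise be shared between the base job and the new job. Whether the real Job classes need this is an assumption.

```python
import copy


class ToyJob:
    """Hypothetical stand-in for a TCutility job; not the library's real class."""

    def __init__(self, *base_jobs, test_mode=False):
        # Defaults first, so that attributes from the base jobs can replace them.
        self.rundir = None
        self.settings = {}
        # Base jobs are applied left to right, so later ones take precedence.
        for base_job in base_jobs:
            # deepcopy (rather than dict.copy) keeps nested settings independent.
            self.__dict__.update(copy.deepcopy(base_job.__dict__))
        # Set after the loop, as in the constructor above.
        self.test_mode = test_mode


common = ToyJob()
common.rundir = 'calculations/common'
common.settings['functional'] = 'BP86'

override = ToyJob()
override.rundir = 'calculations/override'
override.settings['functional'] = 'BLYP'

job_a = ToyJob(common, override)  # override is applied last
job_b = ToyJob(override, common)  # common is applied last
print(job_a.rundir)  # calculations/override
print(job_b.rundir)  # calculations/common

# Changing the new job does not leak back into the base jobs.
job_a.settings['functional'] = 'PBE'
print(common.settings['functional'])    # BP86
print(override.settings['functional'])  # BLYP
```

Two things stand out in this toy version. `__dict__.update` replaces whole attributes, so the `settings` of a later base job replaces that of an earlier one instead of being merged key by key. And because every `ToyJob` carries all attributes in its `__dict__`, the last base job also overwrites values it left at their defaults. Whether either matters depends on how the real Job classes store their settings.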