all_geom - Githubissues

joegair commented 1 year ago

Does fr.all_geom access all geometries except the final geometry? The following ORCA job has 5 geometries and fr.all_geom accesses four of them.

! m062x 6-31+G** opt freq

* xyz 0 1
O   0.0000   0.0000   0.0000
H   1.0000   0.0000   0.0000
H   0.0000   1.0000   0.0000
*

This became confusing when parsing outputs of compound jobs, eg

* xyz 0 1
O   0.0000   0.0000   0.0000
H   1.0000   0.0000   0.0000
H   0.0000   1.0000   0.0000
*

%compound

New_Step
! m062x 6-31+G** opt freq # job1
Step_End

New_Step
! m062x 6-311+G** # job2
Step_End

New_Step
! b3lyp def2-tzvp # job3
Step_End
End

In this example:

fr = FileReader(path, get_all=True, just_geom=False)

ejob1 = fr.all_geom[-2][1]['energy']
ejob2 = fr.all_geom[-1][1]['energy']
ejob3 = fr['energy']

It would have helped me if the documentation said something like fr.all_geom returns the attributes of all geometries except the final geometry in the output.

(In the %compound job above, this is complicated by the fact that the last three single point energies in the file all correspond different calculations on the same geometry.)

ajs99778 commented 1 year ago

Yes, all_geom does not include the last iteration. I think it would be easy enough to change it to have all geometries.

We'd like to do more with the documentation. Whether we change it or keep it how it is, it should be explained better.

The file parsers are not written with compound jobs in mind. These are generally more difficult to parse and figure out what level of theory is being used to compute what. For example, in the job your provided you likely have fr["frequency"], even though the frequencies were computed for job 1. We did not want to spend time figuring out how compound job output files should be parsed for every QM software we support. Instead, we wanted to automate these types of jobs in a more software-agnostic way. For example:

from AaronTools.geometry import Geometry
from AaronTools.theory import Theory, OptimizationJob, FrequencyJob
from AaronTools.job_control import SubmitProcess

# different levels of theory to use
theory1 = Theory(
    method="M06-2X",
    basis="6-31+G**",
    job_type=[OptimizationJob(), FrequencyJob()],
    processors=4,
    memory=4,
)

theory2 = Theory(
    method="M06-2X",
    basis="6-311+G**",
    job_type="energy", # optional, as ORCA calculates energy by default
    processors=4,
    memory=4,
)

theory3 = Theory(
    method="B3LYP",
    basis="def2-TZVP",
    job_type="energy",
    processors=4,
    memory=4,
)

mol = Geometry.from_string("water", form="iupac")

# run optimization job and wait for it to finish
mol.write(outfile="opt-freq.inp", theory=theory1)
proc1 = SubmitProcess("opt-freq.inp", 2, 4, 4)
print("submitting opt+freq")
proc1.submit(wait=30)
print("opt+freq done")

# read the output structure and run single points
optimized_mol = Geometry("opt-freq.out")

print("submitting SPs")
optimized_mol.write(outfile="M06-2X_SP.inp", theory=theory2)
proc2 = SubmitProcess("M06-2X_SP.inp", 2, 4, 4)
proc2.submit()

optimized_mol.write(outfile="B3LYP_SP.inp", theory=theory3)
proc3 = SubmitProcess("B3LYP_SP.inp", 2, 4, 4)
proc3.submit()

This will run three separate jobs, one for each level of theory. The last two use the optimized structure from the first. There were plans to make this look less cumbersome, but I'm not sure where that's at.

swheele2 commented 1 year ago

I think all_geom should include the last geometry.

joegair commented 1 year ago

Thanks so much for providing the example. That helps a lot.

ajs99778 commented 1 year ago

I've added the final structure and data to fr.all_geom, but I also changed all_geom items from a list to a dictionary. So if you wanted to get all energies and structures, it could look like this:

for item in fr.all_geom:
    print(item["data"]["energy"])
    geom = Geometry(item["atoms"])

QChASM / AaronTools.py

all_geom #11