Minor improvements for generation of dataset of reaction profiles.

javialra97 commented 1 month ago

Reading the output.csv as a data frame (with pandas) can be problematic because of the first line, removing the line you will have a proper data frame with well-structured columns. Also, that information is printed in the methods.txt file.
For specific purposes, it is not necessary a final single point, add it as a boolean (set to True by default). Sort of related to pull request #330.
The convergence criterion for the optimization with Gaussian should be set to Tight.
Implemented bug #358.

Checklist

[ ] The changes include an associated explanation of how/why
[ ] Test pass
[ ] Documentation has been updated
[ ] Changelog has been updated

t-young31 commented 1 month ago

Thanks for the PR @javialra97

Reading the output.csv as a data frame (with pandas) can be problematic because of the first line, removing the line you will have a proper data frame with well-structured columns. Also, that information is printed in the methods.txt file.

This would be a breaking change, and pandas supports skipping rows natively

import pandas as pd
pd.read_csv("<filename>", skiprows=1)

so I'm not sure about that.

For specific purposes, it is not necessary a final single point, add it as a boolean

Interesting. I wouldn't be that adverse to this functionality, but can't think of a scenario where this would be useful. What did you have in mind?

The convergence criterion for the optimization with Gaussian should be set to Tight.

On this I don't agree I'm afraid. I've not come across a situation where tight optimisations are worthwhile. Maybe if you're wanting super accurate frequencies and you've got a load of compute to burn then maybe, in my view.

Implemented bug https://github.com/duartegroup/autodE/issues/358.

Thanks! 😄

javialra97 commented 1 month ago

Hi @t-young31

Interesting. I wouldn't be that adverse to this functionality, but can't think of a scenario where this would be useful. What did you have in mind?

I think the simplest case is when you have the resources to compute the full profile at a TZ/QZ basis set, in those cases a final single point is unnecessary. I have used it for benchmarking purposes, I want the geometries (with all the autodE checks) along the full profile to compute single points with N different functionals.

On this I don't agree I'm afraid. I've not come across a situation where tight optimisations are worthwhile. Maybe if you're wanting super accurate frequencies and you've got a load of compute to burn then maybe, in my view.

Well, I would prefer it to use the tight criterion, and I couldn't configure it with ade.Config, I will give it another try.

Thanks! 😄

Thanks to you for autodE!

t-young31 commented 3 weeks ago

Sorry for the slow reply – work took over last week

I think the simplest case is when you have the resources to compute the full profile at a TZ/QZ basis set, in those cases a final single point is unnecessary.

Cool – sounds good 👍🏼

Well, I would prefer it to use the tight criterion, and I couldn't configure it with ade.Config, I will give it another try.

Hopefully the second try was successful! The following I think should work

import autode as ade

ade.Config.G09.keywords.opt.remove('Opt')
ade.Config.G09.keywords.opt.append('Opt=Tight')
# likewise for ade.Config.G09.keywords.opt_ts

To get this merged would you mind

Reverting the changes to _ConfigClass and Reaction.print_output
Merging origin/v1.4.5 into your branch
Updating the contributors list and the changelog

Thanks 😄

javialra97 commented 3 weeks ago

Hi,

I made the pertinent changes. Hope everything is fine

Thanks!

t-young31 commented 3 weeks ago

Awesome – thanks @javialra97 😄

It looks like there's a conflict on your branch, but looks like an easy one to resolve

javialra97 commented 3 weeks ago

Hi @t-young31,

I hope that is solved now 🙂

t-young31 commented 2 weeks ago

Looks like some Guassian frequency calculation tests are failing due to allowing the Standard orientation coordinates to be extracted. I've not had time to dig into it but I'm guessing it's a limitation of the frequency calculation from the Hessian, making it unstable when the principal axis lies along one of the cartesian axes (I guess that's 'standard orientation'?). A quick fix might be to do

            if "Input orientation" in line or ("Standard orientation" in line and len(coords) == 0):

but we should fix this properly so that frequency calculation is okay irrespective of the orientation

javialra97 commented 2 weeks ago

Hi @t-young31,

In the Standard orientation, Gaussian reorders the molecule so the principal axis will be the z-axis and the principal plane of symmetry is located on the yz-plane. In the first test with the H2 molecule, the h2.hessian.atoms will look like as Atoms(n_atoms=2, [Atom(H, 0.0000, 0.0000, 0.3804), Atom(H, 0.0000, 0.0000, -0.3804)]) and not in the x-axis as is defined in the input.

As you said, the problem is in the hessian, the output of _tr_vecs() returns different vectors and consequently the _proj_mass_weighted has completely different values. Honestly, I can't help you any further at this point. I would suggest to rise a warning. Do the thermal corrections depend on that, because the extracted frequencies are fine?

t-young31 commented 2 weeks ago

Honestly, I can't help you any further at this point

Sorry you hit this 😞. Have created #362 to track outside of this PR

Do the thermal corrections depend on that, because the extracted frequencies are fine?

I'm not sure the frequencies are fine.. https://github.com/duartegroup/autodE/actions/runs/11589659415/job/32265741467#step:6:205

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 97.43%. Comparing base (8284a7c) to head (52aa45a). Report is 1 commits behind head on v1.4.5.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## v1.4.5 #359 +/- ## ======================================= Coverage 97.43% 97.43% ======================================= Files 204 204 Lines 23758 23759 +1 ======================================= + Hits 23148 23150 +2 + Misses 610 609 -1 ``` | [Flag](https://app.codecov.io/gh/duartegroup/autodE/pull/359/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=duartegroup) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/duartegroup/autodE/pull/359/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=duartegroup) | `97.43% <100.00%> (+<0.01%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=duartegroup#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

javialra97 commented 2 weeks ago

I'm not sure the frequencies are fine.. https://github.com/duartegroup/autodE/actions/runs/11589659415/job/32265741467#step:6:205

Yes, is true, I only checked the one for H2 but the acetylene frequencies are wrong 😔

Maybe instead of taking the geometries from the standard orientation, we can take them directly from the input instructions, after charge and multiplicity, what do you think?

The error now is also because of taking a different geometry?

t-young31 commented 2 weeks ago

Maybe instead of taking the geometries from the standard orientation, we can take them directly from the input instructions, after charge and multiplicity, what do you think?

We could, but then it'd still potentially be broken for Opt+Freq calculations that don't print the input orientation coordinates. I'll find a way to fix it properly, or at least try!

The error now is also because of taking a different geometry?

I don't think so. @shoubhikraj have you got any ideas? it's Windows+TRM so your world. Python 3.8 looks like it's just end of life, so shall we just drop support? @javialra97 very happy to merge irrespective of that failing test, if you're happy 😄

javialra97 commented 2 weeks ago

We could, but then it'd still potentially be broken for Opt+Freq calculations that don't print the input orientation coordinates. I'll find a way to fix it properly, or at least try!

It seems that Gaussian is a little messy with the outputs. But I was checking my outputs and the one that you have in test and for the Opt+Freq of the TS it prints it

@javialra97 very happy to merge irrespective of that failing test, if you're happy 😄

Great! Glad to help a little bit.

shoubhikraj commented 2 weeks ago

@t-young31 Honestly I am not sure where the error is coming from... The hessian and gradient are loaded from stored matrices, and I have previously checked that it works. It does succeed with python 3.13, so maybe some issue with numpy or scipy versions. If it comes up later, then I might check in detail. I am fine with removing python 3.8 support, maybe in the next version - it was just made eol.

duartegroup / autodE

Minor improvements for generation of dataset of reaction profiles. #359

Checklist

Codecov Report