TREX-CoE / trexio_tools

Set of tools for trexio files
BSD 3-Clause "New" or "Revised" License
18 stars 8 forks source link

Orca to trexio #22

Closed v1j4y closed 1 year ago

v1j4y commented 1 year ago

Added everything except ECPs.

Converter uses filename.json orca data repository created using orca_2json to obtain the corresponding trexio_file.h5 file. Note that the orca_2json is the long term supported data repository generator for ORCA.

v1j4y commented 1 year ago

@scemama what is the recommended way to test the converter ?

scemama commented 1 year ago

To test the produced TREXIO files, the best solution is to run:

trexio check-mos  FILE.trexio

It will compute numerically the overlap of MOs, and if your MOs are orthonormal it is easy to see that everything is OK. With the -n option, you can increase the number of grid points. If the calculation takes too much time, you can install QMCkl (make python-install in qmckl) to accelerate the calculation :-)

scemama commented 1 year ago

You should also run

trexio check-basis

to compare the overlap matrix in the trexio file with a numerically computed one. If they match, it means that all the conventions and normalization factors are OK. Because it is possible that a normalization factor is missing in the AOs and that you include it inside the MO coefficients, so that the MO check doesn't detect the problem.

q-posev commented 1 year ago

Amazing @v1j4y, thanks!

If you have some test files, you can upload them to the data folder of this repo and we can then modify the CI to test ORCA converter.

q-posev commented 1 year ago

I see a lot of comments referring to PySCF. Is it because ORCA's notations are similar to PySCF?

v1j4y commented 1 year ago

OK thanks for the comments !

Currently, there's a problem with normalization of AOs which was revealed by trexio check-mos, I'm fixing this atm.

v1j4y commented 1 year ago

OK ! Just FYI, everything was ok except that the coordinates had to be given in Bohrs ! This fixes the normalization issues with AOs.

Also, Dagmar Lenk has implemented ECPs in orca_2json so I'll also add it here. :wink:

scemama commented 1 year ago

So we wait for ECPs before we merge, or you make another PR?

v1j4y commented 1 year ago

Yes ! I'm implementing ECPs following the pyscf to trexio functions by kosuke. I'll tell you when it's done.

v1j4y commented 1 year ago

@scemama I've implemented ECPs. Is there any way to check for ECPs ?

Also, I'm testing the water molecule which was used for the pyscf test by Kosuke.

Here's the output (I changed the integration domain to [40.,40.,40.]):

trexio check-basis h2o -b TEXT --n_points 500
Integration steps: [0.037033353988351204, 0.03301131648479499, 0.035256541967839575]

  0   0        0.999923        1.000000
  0   1       -0.344015       -0.344017
  0   2       -0.157869       -0.157870
  0   3       -0.000000       -0.000000
  0   4       -0.000000        0.000000
  0   5        0.000000       -0.000000
  0   6       -0.000000        0.000000
  0   7       -0.000000       -0.000000
  0   8        0.000000        0.000000
  0   9       -0.047843       -0.047844
  0  10       -0.062517       -0.062518
  0  11       -0.047843       -0.047844
  0  12       -0.062517       -0.062518
  1   1        1.000000        1.000000
  1   2        0.788941        0.788941
  1   3       -0.000000       -0.000000
  1   4       -0.000000        0.000000
  1   5        0.000000        0.000000
  1   6        0.000000        0.000000
  1   7       -0.000000        0.000000
  1   8        0.000000        0.000000
  1   9        0.329268        0.329268
  1  10        0.391443        0.391443
  1  11        0.329268        0.329268
  1  12        0.391443        0.391443
  2   2        1.000000        1.000000
  2   3        0.000000       -0.000000
  2   4       -0.000000       -0.000000
  2   5        0.000000        0.000000
  2   6        0.000000        0.000000
  2   7       -0.000000        0.000000
  2   8       -0.000000        0.000000
  2   9        0.511019        0.511019
  2  10        0.690325        0.690325
  2  11        0.511019        0.511018
  2  12        0.690325        0.690325
  3   3        0.177129        1.000000
  3   4       -0.000000        0.000000
  3   5       -0.000000        0.000000
  3   6        0.219506        0.502141
  3   7       -0.000000        0.000000
  3   8       -0.000000        0.000000
  3   9       -0.126986       -0.025610
  3  10       -0.047951       -0.009642
  3  11        0.051811       -0.247551
  3  12        0.019564       -0.093205
  4   4        0.177129        1.000000
  4   5       -0.000000        0.000000
  4   6       -0.000000        0.000000
  4   7        0.219506        0.502141
  4   8       -0.000000        0.000000
  4   9       -0.027811       -0.273671
  4  10       -0.010502       -0.103040
  4  11       -0.034081        0.111660
  4  12       -0.012869        0.042041
  5   5        0.177129        1.000000
  5   6       -0.000000        0.000000
  5   7       -0.000000        0.000000
  5   8        0.219506        0.502141
  5   9       -0.011883       -0.059936
  5  10       -0.004487       -0.022567
  5  11       -0.114866       -0.073448
  5  12       -0.043374       -0.027654
  6   6        0.904436        1.000000
  6   7       -0.000000        0.000000
  6   8        0.000000        0.000000
  6   9       -0.575744       -0.056653
  6  10       -0.361553       -0.035577
  6  11        0.234908       -0.547616
  6  12        0.147517       -0.343890
  7   7        0.904436        1.000000
  7   8        0.000000        0.000000
  7   9       -0.126093       -0.605398
  7  10       -0.079183       -0.380175
  7  11       -0.154519        0.247007
  7  12       -0.097034        0.155115
  8   8        0.904436        1.000000
  8   9       -0.053878       -0.132588
  8  10       -0.033834       -0.083262
  8  11       -0.520793       -0.162478
  8  12       -0.327045       -0.102032
  9   9        1.000000        1.000000
  9  10        0.684800        0.684800
  9  11        0.120869        0.120869
  9  12        0.307324        0.307324
 10  10        1.000000        1.000000
 10  11        0.307324        0.307324
 10  12        0.606683        0.606683
 11  11        1.000000        1.000000
 11  12        0.684800        0.684800
 12  12        1.000000        1.000000
Norm of the error: 2.624232

Does this look OK or does it need more points ? Is the error big ?

scemama commented 1 year ago

@scemama I've implemented ECPs. Is there any way to check for ECPs ?

No. The simplest way is to import the TREXIO file in another code, and check that the energy is correct. You can do it with quantum package with qp_import_trexio , and the qp run print_energy.

Does this look OK or does it need more points ? Is the error big ?

It is hard to tell... If you install the python interface to QMCkl, trexio-tools will use it to compute the AOs and MOs, so it will be orders of magnitude faster. Then, you can increase the number of points for the integration, ans see if it converges towards an identity matrix. But you can also compute the energy with quantum package to cross-check ;-)

v1j4y commented 1 year ago

Hi !

@scemama I'm almost done, I get the same energies as orca after doing qp_import_trexio , and then qp run print_energy for files without ECPs. I've also added an orca file to test the conversion h2o.json. I'm now testing the ECPs with orca.

@q-posev Do I need to do anything else for the CI ? How do I set up the CI ?

q-posev commented 1 year ago

Bravo @v1j4y !

The CI is already in place, see test.yml file. As I wrote above, for the CI we would need you to upload one (or a few) test ORCA JSON file(s) to the data folder of this repo and then we can add the corresponding test lines (calling trexio convert-from -t orca ...) to the CI. Unless there is a licensing issue which does not allow storage of ORCA JSON files in the open source repos.

For now we only test that converters "work" (i.e. they do not raise errors while processing output of a q-chem program and they do produce the TREXIO file in the end). The validity of the TREXIO file is only checked for GAMESS for now.

@scemama this is something related to non-regression of the converters which worries me sometimes (see my comments in #12), perhaps at some point we want to make sure that produced TREXIO files are valid and have not changed drastically compared to some fixed ref. We could do h5diff on TREXIO HDF5 files or we could do check-basis and check-mos in the CI for all converters. What do you think?

@v1j4y Let me know if you need help with the CI.

scemama commented 1 year ago

@scemama this is something related to non-regression of the converters which worries me sometimes (see my comments in https://github.com/TREX-CoE/trexio_tools/issues/12), perhaps at some point we want to make sure that produced TREXIO files are valid and have not changed drastically compared to some fixed ref. We could do h5diff on TREXIO HDF5 files or we could do check-basis and check-mos in the CI for all converters. What do you think?

@q-posev I think it is a good idea to check that the files are OK. But I would use the text backend for that with diff, as text files are better handled with git.

q-posev commented 1 year ago

I'm adding the ORCA test to CI on master.

v1j4y commented 1 year ago

Thanks @q-posev !