Xiaoxun-Gong / DeepH-E3


Inquiry regarding the creation of overlap.h5 when using deeph_e3. #2

Closed choi-geunseok closed 1 year ago

choi-geunseok commented 1 year ago

I have experience using DeepH and creating an overlap matrix. For testing purposes, I would like to obtain the Hamiltonian using DeepH-E3, starting from only the positions and types of the atoms.

However, the DeepH code preprocesses the overlap_0 output generated by Overlap-only-OpenMX to create overlap.h5, whereas the DeepH-E3 code does not provide this step. Could you advise me on what I should do?

Xiaoxun-Gong commented 1 year ago

Hi, for your purpose, DeepH-pack has the functionality to combine all the overlaps_xx.h5 files into a single overlaps.h5 file using the command deeph-inference. In the inference.ini config, you should set task=[1] and interface=openmx. OLP_dir should be the output folder of Overlap-only-OpenMX containing openmx.out and output/overlaps_xx.h5. After running the command, you will find all the necessary files under work_dir.
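As an illustration only, the relevant part of inference.ini might look roughly like the fragment below. The paths are placeholders, the section name is an assumption, and a real config typically needs additional entries; please check the DeepH-pack documentation and examples.

```ini
; Illustrative fragment only -- section name and extra required keys may differ.
[basic]
; Output folder of Overlap-only-OpenMX, containing openmx.out and output/overlaps_xx.h5
OLP_dir = /path/to/overlap_only_openmx_output
; overlaps.h5 and the other processed files will be written here
work_dir = /path/to/work_dir
interface = openmx
; task 1: parse the OpenMX overlaps into a single overlaps.h5
task = [1]
```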

choi-geunseok commented 1 year ago

Thank you for your response. I have an additional question regarding the computation of the band structure. The DeepH code took about 30 seconds to calculate the band structure at the inference stage, while the DeepH-E3 code took about 10 minutes. The method I used was to run sparse_calc.jl through deeph/inference_tools/deeph_band.sh with num_tasks=3. Additionally, increasing num_tasks does not seem to improve the speed significantly. I would like to ask how to resolve this.

Xiaoxun-Gong commented 1 year ago

The sparse_calc.jl script in the DeepH-E3 repository is obsolete because it is slow; the sparse_calc.jl script in the DeepH-pack repository is the up-to-date one. You can use the deeph-inference command in DeepH-pack with task=[5] in inference.ini to calculate the band structure from DeepH-E3 output, because DeepH and DeepH-E3 use exactly the same data format.
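Assuming the same (hypothetical) config layout as in the fragment above, switching to the band-structure step would look roughly like this; again, the section name and any other required keys should be taken from the DeepH-pack examples.

```ini
; Illustrative fragment only.
[basic]
; work_dir should already contain hamiltonians_pred.h5 and the structure files
work_dir = /path/to/work_dir
interface = openmx
; task 5: diagonalize the predicted Hamiltonian via sparse_calc.jl
task = [5]
```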

choi-geunseok commented 1 year ago

Thank you for your response. It seems to be working well for now.

JTaozhang commented 1 year ago

Hi there,

I tried to use deeph-inference in DeepH-pack with task=[5] to predict the band structure, but I get the following error:

    [ Info: read h5
    [ Info: construct sparse matrix in the format of COO
    ERROR: LoadError: AssertionError: (site_norbits[atom_i], site_norbits[atom_j]) == size(hamiltonian_pred)

How can I solve this? I have already obtained hamiltonians_pred.h5 through DeepH-E3. By the way, what is the meaning of band_i and band_f in deeph_band.sh? If I use the sparse calculation, what I want is the band structure near the Fermi level, and to make sure the band structure includes the Fermi level it is hard to decide which band to take as the initial one and which as the final one.

Xiaoxun-Gong commented 1 year ago

Hi,

Regarding the error that occurred when diagonalizing the Hamiltonian: it might be caused by incorrect structure information. Please go to the folder containing hamiltonians_pred.h5, lat.dat and so on, and check the following (a quick inspection sketch is given after the list):

  1. Is "spinful" correct in info.json?
  2. Does orbital_types.dat match those of the structures in the training set?
  3. If these are correct, please check other files in the folder and make sure all the structure information is correct.
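As a rough aid (not part of DeepH-pack, and the key names inside info.json may differ between versions), something like this can be used to print the files for a manual check:

```python
import json
from pathlib import Path

# Hypothetical location of the inference working directory.
folder = Path("/path/to/work_dir")

# Print info.json so the spin setting can be checked by eye.
print(json.loads((folder / "info.json").read_text()))

# Print orbital_types.dat to compare against a structure from the training set.
print((folder / "orbital_types.dat").read_text())
```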

For the second question: deeph_band.sh is obsolete and should no longer be used. Please use the deeph-inference command in DeepH-pack instead. The sparse-matrix diagonalization step in DeepH-pack (i.e., sparse_calc.jl) reads the information in band_config.json and finds the "num_band" eigenvalues closest to "fermi_level". There exist efficient algorithms for determining the Fermi level of large sparse matrices without diagonalizing the full spectrum, but they are currently not implemented in DeepH-pack or DeepH-E3. We are working on this, and the results will be published soon.
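For orientation only, the two keys mentioned above would appear in band_config.json roughly as below. The values are placeholders, and real files contain additional entries (k-path, file locations, etc.), so please check the DeepH-pack examples for the full format.

```json
{
  "fermi_level": -2.5,
  "num_band": 50
}
```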

JTaozhang commented 1 year ago

Hi, yes, I found the reason: the info.json file was missing. After I supplied this file, task=[5] finished normally.

As you discussed with another researcher, we can use task=[5] to produce the band structure. I found a difference between this method and the previous DeepH: DeepH produces an rh_pred.h5 file at task=[3], which task=[4] then reads. But the inference step of DeepH-E3 only outputs hamiltonians_pred.h5, so I am not sure whether DeepH-E3 has done task=[4] or not. Meanwhile, I notice that DeepH-E3 does not output rh_pred.h5, the rotated Hamiltonian.

Overall, although the training process finished normally and most training parameters were kept at their defaults, when I use the predicted Hamiltonian to calculate the band structure with the same band.json, the band structure is totally different from the one calculated by DeepH. After testing, I am sure the problem lies in the predicted Hamiltonian. This is a headache for me, because training takes a long time. Could you give more explanation of the DeepH-E3 training parameters in the README file? I think this would be helpful for newcomers.

One more question: when I use DeepH-E3 to do the Hamiltonian prediction, the procedure finishes normally, but the error file reports some warnings like this:

    e3nn/o3/_spherical_harmonics.py:82: UserWarning: FALLBACK path has been taken inside: compileCudaFusionGroup. This is an indication that codegen Failed for some reason. To debug try disable codegen fallback path via setting the env variable export PYTORCH_NVFUSER_DISABLE=fallback
    To report the issue, try enable logging via setting the env variable export PYTORCH_JIT_LOG_LEVEL=manager.cpp (Triggered internally at /opt/conda/conda-bld/pytorch_1659484809662/work/torch/csrc/jit/codegen/cuda/manager.cpp:237.)
      sh = _spherical_harmonics(self._lmax, x[..., 0], x[..., 1], x[..., 2])

Do these warnings affect the accuracy of the predicted Hamiltonian? Judging from the reports, I think it is because I used a GPU for the prediction; it seems that the GPU nodes did not work.

How can I tell whether the trained model is right?

Xiaoxun-Gong commented 1 year ago

Hi,

About the first question regarding rh.h5 and rh_pred.h5: as their names suggest, they store the Hamiltonian matrix blocks rotated from the global coordinate system to the local coordinate systems (see the DeepH article). Since DeepH-E3 completely gets rid of local coordinates by using equivariant networks, there is no need for rh.h5 or rh_pred.h5; everything is expressed in the global coordinate system.

About the question regarding a trained model producing unexpected results without throwing any errors: to check whether the model is correct, the most straightforward method is to check the MSE and MAE of the model on the validation set or test set. You can look at the training output for the MSE, or go to the test_result folder under the training output directory and use deephe3-analyze.py. Usually an MAE of a few meV, or an MSE on the order of 10^-6 to 10^-7 meV^2, will produce good results. Further, you can try to use your model to predict the Hamiltonian of some relatively small material (which may or may not be in the training/test/validation set) and compare the corresponding band structure to the DFT result.
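As an illustration of the comparison idea only (this is not an official DeepH-E3 tool, and it assumes that the ground-truth hamiltonians.h5 and the predicted hamiltonians_pred.h5 store matrix blocks under identical keys, which should be verified for your version of the code):

```python
# Rough sketch: mean absolute error between predicted and reference Hamiltonian blocks.
import h5py
import numpy as np

errs = []
with h5py.File("hamiltonians.h5", "r") as ref, \
     h5py.File("hamiltonians_pred.h5", "r") as pred:
    for key in ref.keys():
        if key in pred:
            # Each entry is assumed to be a dense matrix block.
            errs.append(np.abs(ref[key][...] - pred[key][...]).ravel())

print("MAE over matched blocks (files' native units):", np.concatenate(errs).mean())
```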

About the warning: it seems this warning is related to your system environment (especially PyTorch and e3nn). Sometimes, due to environment issues, the code will only work on CPUs, not GPUs. You may try installing different versions of those packages, or run the same job on a different machine and compare the system environments.
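If you want to investigate the warning itself, the two environment variables it mentions can be set before launching the job. They are taken verbatim from the warning text and only affect PyTorch's nvFuser fallback and JIT logging behaviour; they are not DeepH-E3 options.

```bash
# Turn the codegen failure into a hard error instead of a silent fallback
# (for debugging only, as suggested by the warning itself).
export PYTORCH_NVFUSER_DISABLE=fallback

# Enable extra JIT logging for the component named in the warning.
export PYTORCH_JIT_LOG_LEVEL=manager.cpp

# ...then launch the usual DeepH-E3 prediction command.
```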

JTaozhang commented 1 year ago

Hi,

I have tried to use deephe3-analyze.py in the output directory, but the script reports "No module named 'deephe3'". Also, your README suggests that DeepH-E3 does not need to be installed via pip; maybe this is because I did not find the right way to install it. I just directly used the commands you mentioned in the usage section. Could you tell me how to solve this problem?

Secondly, I have checked the test_report.txt file: some target blocks have a validation loss around 10e-3, while other target blocks have a validation loss around 10e-7, and 10e-7 appears most often in the file.

About the warning, thank you for the suggestion; I will try another version and test it.

Do you know how to use the plot_band.py file? As the file says, it can be used to calculate the Fermi energy and plot the band structure using the *.Band file. However, when I run python plot_band.py in the output directory, which contains the openmx.Band and band.json files, I get an error at line 390 of plot_band.py: KeyError: 'hsk_coords'.

All in all, many thanks to you.

Xiaoxun-Gong commented 1 year ago

Hi,

About the "module not found" error: you should not copy deephe3-analyze.py to the training output directory. Instead, it is recommended to go to the training output folder and execute the script from its original location using its full path.
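For example (the paths are placeholders, and the exact command-line arguments of deephe3-analyze.py depend on the script version, so please check its usage first):

```bash
# Placeholder paths -- adapt to your own setup.
cd /path/to/training_output_dir
python /path/to/DeepH-E3/deephe3-analyze.py <arguments required by the script>
```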

About the test_report.txt you mentioned: do you mean 1e-3 and 1e-7? An MSE on the order of 1e-7 is good, but 1e-3 is too large. A common mistake that can cause such a problem is including f orbitals (l=3) without increasing spherical_harmonics_lmax and irreps_mid to include l=6 vectors: the maximum angular momentum of the neural-network vectors should be at least twice the largest angular momentum of the atomic orbital basis. Other possible reasons include wrong training structures or erroneous DFT calculations in the training set preparation.
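To make the rule of thumb concrete: with f orbitals (l=3) in the basis, the network must carry vectors up to l = 2 × 3 = 6. In the training config this means something along the lines of the fragment below; the option names spherical_harmonics_lmax and irreps_mid come from the comment above, but the multiplicities and parity labels are placeholders to be adapted to your own train.ini.

```ini
; Placeholder values -- only the principle matters: the irreps must
; extend up to l = 6 when f orbitals (l = 3) are present.
spherical_harmonics_lmax = 6
irreps_mid = 64x0e+32x1e+32x2e+16x3e+16x4e+8x5e+8x6e
```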

About plot_band.py: if you have an openmx.Band file generated by an OpenMX band-structure calculation or by sparse_calc.jl, you can directly use plot_band.py with from_json=False at line 23. If you have a band.json file, you can use from_json=True. The cause of your error might be an incorrect format of your band.json.

JTaozhang commented 1 year ago

Hi,

Many thanks for your explanation; I will test your suggestion later. As for test_report.txt, yes, I mean 1e-3 and 1e-7, and my training includes f orbitals. I will test the program as you suggested.

Thank you again.

JTaozhang commented 1 year ago

Hi, about the format of band.json: I have checked the format of my band.json and did not notice any difference from the DeepH-E3 example (located in the inference_tools directory). I think this error may be caused by some other problem. (Screenshot of the config file attached.)

Xiaoxun-Gong commented 1 year ago

You are using the wrong file. The band_config.json you have shown has nothing to do with plot_band.py; it is the config file for sparse_calc.jl. The band.json used for band plotting is generated by plot_band.py after reading the OpenMX output (*.Band file).

JTaozhang commented 1 year ago

Hi,

Yes, you are right. I misused the band.json file. After I set from_json=False, I got the band structure.

Thanks a lot.