ORNL / HydraGNN

Distributed PyTorch implementation of multi-headed graph convolutional neural networks
BSD 3-Clause "New" or "Revised" License

Energy linear regression #269

Closed: jychoi-hpc closed this pull request 3 months ago

jychoi-hpc commented 4 months ago

This PR adds a script for energy linear regression.
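A minimal sketch of what such a regression might look like, assuming each sample is described by a vector of per-element atom counts and a single energy. The helper name `fit_element_energies` and this input layout are hypothetical and are not the script in this PR; the sketch fits E ≈ sum_Z c_Z n_Z by ordinary least squares with `numpy.linalg.lstsq`.

```python
import numpy as np


def fit_element_energies(counts, energies):
    """Least-squares fit of E ~ sum_Z c_Z * n_Z (hypothetical helper, no intercept).

    counts:   (n_samples, n_elements) array; counts[i, j] is the number of
              atoms of the j-th element in sample i (assumed layout)
    energies: (n_samples,) array of energies
    Returns the fitted per-element coefficients and the residual energies.
    """
    counts = np.asarray(counts, dtype=float)
    energies = np.asarray(energies, dtype=float)
    # lstsq returns (solution, residual sums, rank, singular values); keep the solution
    coeffs, *_ = np.linalg.lstsq(counts, energies, rcond=None)
    residuals = energies - counts @ coeffs
    return coeffs, residuals


# Synthetic, noise-free data: the fit should recover the coefficients used to build it.
true_c = np.array([-3.5, -0.25, -3.1])
counts = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [2, 1, 3],
                   [1, 1, 1]])
energies = counts @ true_c
coeffs, residuals = fit_element_energies(counts, energies)
print(coeffs)
```

Appending a column of ones to `counts` would add an intercept E_0, giving the E = E_0 + sum_Z c_Z n_Z form discussed in this thread.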

jychoi-hpc commented 4 months ago

I am checking all the datasets. Please don't merge yet.

frobnitzem commented 4 months ago

I have an alternative implementation, which should be compared to be sure we are doing the same thing.

allaffa commented 4 months ago

> I have an alternative implementation, which should be compared to be sure we are doing the same thing.

@frobnitzem Thanks. Please update this PR with your alternative implementation, and Jong and I will compare the two implementations for consistency.

frobnitzem commented 4 months ago

I wouldn't subtract the mean energy, as this makes the energy model unphysical (E = E_0 + \sum_Z c_Z n_Z). However, the results agree with the mean/covariance-based computation:

$ python3 regression.py MPTrj.npz  | head -n5
# 1580227 samples, mean_E = -6.193764163751678, sigma_E = 1.847178354885813, sigma_fit = 0.6926824090165239
# Z c_Z m_Z sigma_Z
1 -3.4515251449541444 0.03201297273784663 0.11519684739878065
2 -0.25242553354522296 2.499640874380706e-05 0.00431143556356506
3 -3.131654896166989 0.026181752747806215 0.07941033289624111

$ python3 regression.py MPTrj-v2.npz  | head -n5
# 1580227 samples, mean_E = -5.234825150397457e-10, sigma_E = 0.6926824093226412, sigma_fit = 0.6926824093226411
# Z c_Z m_Z sigma_Z
1 -4.690768751132216e-09 0.0320129727378466 0.11519684739878065
2 -6.457415234849677e-08 2.4996408743807057e-05 0.004311435563565056
3 -3.145968856761228e-08 0.026181752747806188 0.07941033289624111

I do wonder why He (Z=2) is present in MPTrj.
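A mean/covariance-based fit of this kind can be sketched as follows. This is a hypothetical helper (`regress_mean_cov`), not the `regression.py` used above, and it assumes that m_Z and sigma_Z are the per-element feature mean and standard deviation and that sigma_fit is the standard deviation of the fit residuals; that reading of the printed columns is a guess. It solves the centered normal equations Cov(X) c = Cov(X, E) and recovers the intercept as E_0 = mean(E) - m · c.

```python
import numpy as np


def regress_mean_cov(X, E):
    """Fit E ~ E_0 + X @ c from means and covariances (hypothetical helper).

    X: (n_samples, n_features) composition features; E: (n_samples,) energies.
    Returns (E_0, c, m, sigma, sigma_fit).
    """
    X = np.asarray(X, dtype=float)
    E = np.asarray(E, dtype=float)
    n = len(E)
    m = X.mean(axis=0)        # per-feature mean (assumed meaning of m_Z)
    sigma = X.std(axis=0)     # per-feature std (assumed meaning of sigma_Z)
    Xc = X - m
    Ec = E - E.mean()
    cov_XX = Xc.T @ Xc / n    # feature covariance
    cov_XE = Xc.T @ Ec / n    # feature/energy covariance
    c = np.linalg.solve(cov_XX, cov_XE)
    E_0 = E.mean() - m @ c
    sigma_fit = np.std(E - E_0 - X @ c)   # spread of the residuals
    return E_0, c, m, sigma, sigma_fit


# Synthetic data with a known intercept and small noise.
rng = np.random.default_rng(0)
true_c = np.array([-3.5, -0.25, -3.1])
X = rng.integers(0, 6, size=(200, 3)).astype(float)
E = 1.0 + X @ true_c + 0.1 * rng.standard_normal(200)

E_0, c, m, sigma, sigma_fit = regress_mean_cov(X, E)
print(f"# {len(E)} samples, mean_E = {E.mean()}, sigma_E = {E.std()}, sigma_fit = {sigma_fit}")

# Refit on the residuals of the first fit.
res = E - E_0 - X @ c
E_0r, c_r, _, _, sigma_fit_r = regress_mean_cov(X, res)
print(f"# {len(res)} samples, mean_E = {res.mean()}, sigma_E = {res.std()}, sigma_fit = {sigma_fit_r}")
```

Refitting the residuals gives an intercept and coefficients of essentially zero, with sigma_E equal to the first sigma_fit. That is the same pattern as the MPTrj-v2.npz output, which would be consistent with that file holding already-regressed energies, although the thread does not say so. If the features sum to a constant in every sample (for example composition fractions), Cov(X) is singular or numerically near-singular and `np.linalg.solve` cannot be trusted; `np.linalg.lstsq` or dropping one element would be needed.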

allaffa commented 4 months ago

> I do wonder why He (Z=2) is present in MPTrj.

Could you find which atomistic structures contain He?
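One way to answer this, sketched under an assumed storage layout: if the dataset stores atomic numbers either as one array per structure, or as a flat concatenated array plus per-structure atom counts, the structures containing He (Z = 2) can be found as below. Both helper names and both layouts are hypothetical; the actual keys in MPTrj.npz are not given in this thread.

```python
import numpy as np


def structures_containing(atomic_numbers, z):
    """Indices of structures that contain element z.

    atomic_numbers: sequence of 1-D integer arrays, one per structure
    (hypothetical layout).
    """
    return [i for i, zs in enumerate(atomic_numbers) if np.any(np.asarray(zs) == z)]


def structures_containing_flat(z_all, n_atoms, z):
    """Same query for a flat layout (hypothetical): z_all concatenates the
    atomic numbers of all structures, and n_atoms[i] is the size of structure i.
    """
    z_all = np.asarray(z_all)
    # Structure index owning each atom, e.g. n_atoms=[3, 2] -> [0, 0, 0, 1, 1]
    owner = np.repeat(np.arange(len(n_atoms)), n_atoms)
    return np.unique(owner[z_all == z])


# Toy data: structures 1 and 3 contain He (Z = 2).
per_structure = [[1, 8, 1], [2, 2], [6, 1, 1, 1, 1], [2, 26]]
print(structures_containing(per_structure, 2))           # → [1, 3]
flat = np.concatenate(per_structure)
n_atoms = [len(s) for s in per_structure]
print(structures_containing_flat(flat, n_atoms, 2))      # → [1 3]
```

The flat version avoids a Python loop over the roughly 1.58 million samples, which matters at MPTrj scale.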

jychoi-hpc commented 3 months ago

I removed the "mean(e)" subtraction. If the histogram (shared in Teams) looks OK, this branch is ready to merge.