Open lbluque opened 1 month ago
Attention: Patch coverage is 90.07353%
with 27 lines
in your changes missing coverage. Please review.
Files | Coverage Δ | |
---|---|---|
src/fairchem/core/modules/transforms.py | 55.17% <100.00%> (ø) |
|
src/fairchem/core/datasets/ase_datasets.py | 85.84% <87.50%> (-0.21%) |
:arrow_down: |
src/fairchem/core/trainers/base_trainer.py | 86.09% <93.10%> (+0.62%) |
:arrow_up: |
src/fairchem/core/trainers/ocp_trainer.py | 67.45% <89.47%> (+1.13%) |
:arrow_up: |
src/fairchem/core/common/distutils.py | 31.89% <62.50%> (+1.89%) |
:arrow_up: |
...m/core/modules/normalization/element_references.py | 94.87% <94.87%> (ø) |
|
...fairchem/core/modules/normalization/_load_utils.py | 81.08% <81.08%> (ø) |
|
.../fairchem/core/modules/normalization/normalizer.py | 91.30% <91.30%> (ø) |
@lbluque This is a massive lift! This PR is awesome! 💯 💯 💯 I am bookmarking this and striving for this kind of clarity and test coverage! ❤️ LGTM!
Can you clarify what the procedure is to update / re-reference linear references for a pre-trained model? Run the script, load the resulting config, and update the pre-trained model's config?
Can you clarify what the procedure is to update / re-reference linear references for a pre-trained model? Run the script, load the resulting config, and update the pre-trained model's config?
If you already have linear references fit. Then you can simply pass those as an npz file as follows:
element_references:
energy:
file: /path/to/linref.npz
And make sure you remove the lin_ref
keyword in the dataset section.
If you want to use normalization values you already computed you can keep that the same as in previous configs.
Any checkpoint that you save using this branch will then have the new linref and normalization modules.
Overall looks good — going to be a nice addition. A couple high level comments:
- It might be nice to offload some of code added to the trainer to utils or other files, to keep the trainers clean/easy to read.
- Do you know roughly the largest number of samples that can be processed OTF without hitting memory issues? Would be nice to to include an estimate.
- Are you planning to include a convergence plot of the OTF numbers (lin ref and norm) on a given dataset?
Thanks for the careful review @wood-b !
Have a look at the changes when you get a chance and let me know what you think!
This PR enables (on the fly) fitting and estimation of normalization values and element references
Normalizers
andLinearReference
modules are trainer attributes.added scripts to fit linear references and/or normalizers using the train dataset in a standard config (with fitting directive as specified above), i.e.
linear references can also be passed as a file in the dataset/transforms block (for example if fit with above script, or legacy npz files):
normalization values can also be passed from a file for many targets (the script above generates a dict with targets and normalizers):
or they can be passed by individual files (an npz or state_dict.pt with "mean" and "std")
using
lin_ref
for linear references inside datasets is still enabled for backwards compatibility.TODO: