FAIR-Chem / fairchem

FAIR Chemistry's library of machine learning methods for chemistry
https://opencatalystproject.org/

Errors with loading pretrained IS2RE ODAC models #892

Open pimdh opened 1 month ago

pimdh commented 1 month ago

Hi,

I have trouble loading the EquiformerV2-IS2RE-ODAC and Gemnet-OC-IS2RE-ODAC pretrained models.

With fairchem-core==1.2.0, if I run

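from fairchem.core.common.relaxation.ase_utils import OCPCalculator
from fairchem.core.models.model_registry import model_name_to_local_file

model_name = "EquiformerV2-IS2RE-ODAC"
checkpoints = model_name_to_local_file(model_name, local_cache="/tmp/fairchem_checkpoints/")
calc = OCPCalculator(checkpoint_path=checkpoints, cpu=False)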

I get the error:

TypeError: EquiformerV2.__init__() got an unexpected keyword argument 'norm_scale_nodes'

This is similar to what was noted in the closed issue #727. I can't find any version of EquiformerV2 that supports this kwarg.

When I run

model_name = "Gemnet-OC-IS2RE-ODAC"
checkpoints = model_name_to_local_file(model_name, local_cache="/tmp/fairchem_checkpoints/")
calc = OCPCalculator(checkpoint_path=checkpoints, cpu=False)

I get the error:

ValueError: Scale file experimental/odac/configs/is2re/gemnet-oc.pt does not exist.

This is similar to what was noted in the closed issue #603. If I remove the scale file from the config in the checkpoint (similar to what was done in PR #625) and then try to run predictions, I get errors like:

Scale factor int_blocks.3.quad_interaction.scale_sbf_sum is not fitted. Please make sure that you either (1) load a checkpoint with fitted scale factors, (2) explicitly load scale factors using the `model.scale_file` attribute, or (3) fit the scale factors using the `fit.py` script.
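For reference, a minimal sketch of the "remove the scale file from the config" step described above. It assumes the path is stored under config["model_attributes"]["scale_file"] in the checkpoint; the helper names drop_scale_file and rewrite_checkpoint are hypothetical and not part of fairchem. The result is written to a new file so the downloaded checkpoint stays untouched.

```python
import copy


def drop_scale_file(config):
    """Return a deep copy of a checkpoint config without the scale file entry.

    Assumption: the path lives at config["model_attributes"]["scale_file"].
    A config without that entry is returned unchanged.
    """
    cleaned = copy.deepcopy(config)
    cleaned.get("model_attributes", {}).pop("scale_file", None)
    return cleaned


def rewrite_checkpoint(src_path, dst_path):
    """Load a checkpoint, drop the scale file entry, and save a copy (hypothetical helper)."""
    import torch  # imported lazily so drop_scale_file works without torch installed

    data = torch.load(src_path, map_location="cpu")
    data["config"] = drop_scale_file(data["config"])
    torch.save(data, dst_path)
```

As the error above shows, removing the entry only avoids the missing-file check; the scale factors themselves still have to come from somewhere.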

Thanks!

anuroopsriram commented 1 month ago

Hi, our codebase changed significantly recently, and this has been causing small issues. For Equiformer, I've updated the checkpoints in this PR: https://github.com/FAIR-Chem/fairchem/pull/893/files

For Gemnet-OC, can you try using this scale file: configs/odac/s2ef/scaling_factors/gemnet-oc.pt

pimdh commented 1 month ago

Thanks a ton, @anuroopsriram! This solves the issue for EquiformerV2.

Unfortunately, for Gemnet, that scale file doesn't work. I tried running on branch anuroopsriram-patch-1, commit 2788d9e9c4a36263. I downloaded the model with:

wget "https://dl.fbaipublicfiles.com/dac/checkpoints_20231018/Gemnet-OC_Direct.pt" -O /tmp/fairchem_checkpoints/Gemnet-OC_Direct.pt

Then run:

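import torch
from fairchem.core.common.relaxation.ase_utils import OCPCalculator

checkpoints = "/tmp/fairchem_checkpoints/Gemnet-OC_Direct.pt"
# Point the checkpoint's config at the suggested scale file, then save it back in place
data = torch.load(checkpoints)
data["config"]["model_attributes"]["scale_file"] = "/opt/fairchem/configs/odac/s2ef/scaling_factors/gemnet-oc.pt"
torch.save(data, checkpoints)
calc = OCPCalculator(checkpoint_path=checkpoints, cpu=False)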

This yields the error:

WARNING:root:Scale factor out_blocks.0.scale_rbf_F not found in model
WARNING:root:Scale factor out_blocks.1.scale_rbf_F not found in model
WARNING:root:Scale factor out_blocks.2.scale_rbf_F not found in model
WARNING:root:Scale factor out_blocks.3.scale_rbf_F not found in model
WARNING:root:Scale factor out_blocks.4.scale_rbf_F not found in model
...
ValueError: Scale factor parameter int_blocks.0.trip_interaction.scale_rbf.scale_factor is inconsistent with the loaded state dict.
Old: Parameter containing:
tensor(8.6465, device='cuda:0')
Actual: 0.0
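
To narrow this down, here is a rough diagnostic sketch that compares the scale factors stored in a checkpoint with the entries of a scale file. It assumes that fitted values appear in the state dict under keys ending in ".scale_factor" (as in the error above), that the weights sit under a "state_dict" key, and that the scale file is a plain mapping from names to numbers. All function names here are hypothetical, not fairchem API.

```python
SUFFIX = ".scale_factor"


def fitted_scale_factors(state_dict):
    """Map scale-factor name -> float value for keys ending in '.scale_factor'."""
    found = {}
    for key, value in state_dict.items():
        if key.endswith(SUFFIX):
            found[key[: -len(SUFFIX)]] = float(value)  # works for 0-d tensors and floats
    return found


def compare_scale_factors(state_dict, scale_file, tol=1e-6):
    """Report scale-file names missing from the model, differing values, and zeros."""
    in_ckpt = fitted_scale_factors(state_dict)
    report = {"not_in_model": [], "differs": [], "zero_in_checkpoint": []}
    for name, value in in_ckpt.items():
        if value == 0.0:
            report["zero_in_checkpoint"].append(name)
    for name, file_value in scale_file.items():
        # Allow a prefix such as "module." in front of the checkpoint name
        matches = [k for k in in_ckpt if k == name or k.endswith("." + name)]
        if not matches:
            report["not_in_model"].append(name)
            continue
        for k in matches:
            if abs(in_ckpt[k] - float(file_value)) > tol:
                report["differs"].append((k, in_ckpt[k], float(file_value)))
    return report


def load_and_compare(checkpoint_path, scale_file_path):
    """Load both files with torch and compare them (assumes a 'state_dict' key)."""
    import torch  # imported lazily so the pure-Python helpers run without torch

    ckpt = torch.load(checkpoint_path, map_location="cpu")
    scales = torch.load(scale_file_path, map_location="cpu")
    return compare_scale_factors(ckpt["state_dict"], scales)
```

Names listed under "not_in_model" would correspond to the warnings above, and entries under "differs" to the kind of mismatch reported in the ValueError.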

github-actions[bot] commented 2 days ago

This issue has been marked as stale because it has been open for 30 days with no activity.