IntelLabs / matsciml

Open MatSci ML Toolkit is a framework for prototyping and scaling out deep learning models for materials discovery supporting widely used materials science datasets, and built on top of PyTorch Lightning, the Deep Graph Library, and PyTorch Geometric.
MIT License

[Bug]: `Lattice` objects created are inconsistent with existing ones #304

Closed laserkelvin closed 1 month ago

laserkelvin commented 1 month ago

Expected behavior

The periodic properties calculation workflow currently always creates a new pymatgen.core.Structure object, whether or not one already exists, building its lattice either from an existing lattice matrix or from lattice parameters.

When using MaterialsProjectDataset and its child classes, @melo-gonzo pointed out that the serialized structures are not identical to the ones produced by the matsciml workflow. We would expect them to match; when they do not, the input structures and labels are mismatched.

Actual behavior

In a set of notebook tests, the image below compares a Lattice object created from parameters with one created by passing an existing Structure's lattice matrix directly. Note the differences in the resulting matrices.

Screenshot from 2024-10-09 15-06-04

If you inspect the resulting structure's coordinates, nothing immediately stands out, but if you visualize the structures, they are very different. This is a huge issue for force prediction, as the atoms are switched: in the image below, the top panel shows the structure emitted by matsciml, and the lower panel shows the serialized structure.

Screenshot from 2024-10-09 15-10-03

As it turns out, at some point pymatgen introduced a vesta argument to Lattice.from_parameters. The default value is False; when it is set to True, the generated and serialized structures are consistent.
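To illustrate why the flag matters, here is a minimal standard-library sketch of two common conventions for turning lattice parameters into a cell matrix. It rests on an assumption about what the vesta flag toggles (a along x when True, c along z when False) and is not pymatgen's actual code; the names cell_matrix and cell_parameters and the example cell are hypothetical.

```python
import math


def cell_matrix(a, b, c, alpha, beta, gamma, vesta=False):
    """Build a 3x3 cell matrix (rows are lattice vectors) from parameters.

    Assumption: vesta=True places a along x and b in the xy-plane;
    vesta=False places c along z and a in the xz-plane.
    """
    al, be, ga = (math.radians(x) for x in (alpha, beta, gamma))
    cos_a, cos_b, cos_g = math.cos(al), math.cos(be), math.cos(ga)
    sin_a, sin_b, sin_g = math.sin(al), math.sin(be), math.sin(ga)

    if vesta:
        c1 = c * cos_b
        c2 = c * (cos_a - cos_b * cos_g) / sin_g
        return [
            [float(a), 0.0, 0.0],
            [b * cos_g, b * sin_g, 0.0],
            [c1, c2, math.sqrt(c * c - c1 * c1 - c2 * c2)],
        ]

    # Clamp to [-1, 1] to guard against rounding before acos.
    val = (cos_a * cos_b - cos_g) / (sin_a * sin_b)
    gamma_star = math.acos(max(-1.0, min(1.0, val)))
    return [
        [a * sin_b, 0.0, a * cos_b],
        [-b * sin_a * math.cos(gamma_star), b * sin_a * math.sin(gamma_star), b * cos_a],
        [0.0, 0.0, float(c)],
    ]


def cell_parameters(matrix):
    """Recover (a, b, c, alpha, beta, gamma) in degrees from a cell matrix."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def angle(u, v):
        return math.degrees(math.acos(dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))))

    va, vb, vc = matrix
    lengths = [math.sqrt(dot(v, v)) for v in (va, vb, vc)]
    return (*lengths, angle(vb, vc), angle(va, vc), angle(va, vb))


# Hypothetical triclinic cell, chosen only to make the difference visible.
params = (3.0, 4.0, 5.0, 80.0, 95.0, 100.0)
default_matrix = cell_matrix(*params, vesta=False)
vesta_matrix = cell_matrix(*params, vesta=True)

for label, m in (("vesta=False", default_matrix), ("vesta=True", vesta_matrix)):
    print(label)
    for row in m:
        print("  ", [round(x, 4) for x in row])
```

Both matrices describe the same cell (identical lengths and angles) in different Cartesian orientations, so a matrix rebuilt from parameters under one convention will not match a stored matrix written under the other. In pymatgen itself, the corresponding call is Lattice.from_parameters(a, b, c, alpha, beta, gamma, vesta=True).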

Steps to reproduce the problem

See the code snippets shown above: compare a serialized structure with the one produced by the PeriodicPropertiesTransform workflow.

Specifications

Mainly relevant here is pymatgen==2023.9.25