Open MatSci ML Toolkit is a framework for prototyping and scaling out deep learning models for materials discovery supporting widely used materials science datasets, and built on top of PyTorch Lightning, the Deep Graph Library, and PyTorch Geometric.
MIT License
[Bug]: `Lattice` objects created are inconsistent with existing ones #304
Expected behavior
The periodic properties calculation workflow currently creates a new `pymatgen.core.Structure` object regardless of whether one already exists, based either on an existing lattice matrix or on lattice parameters.
When using `MaterialsProjectDataset` and its children, @melo-gonzo pointed out that the serialized structures are not identical to the ones produced by the `matsciml` workflow. This is not what we would expect, as it leads to a mismatch between input structures and labels.
Actual behavior
In a set of notebook tests, the image below shows the creation of a `Lattice` object from parameters, versus passing an existing `Structure`'s lattice matrix directly: note the differences in the resulting matrix.
If you inspect the resulting structures' coordinates, nothing immediately stands out, but if you visualize the structures, they are very different. This is a huge issue for force prediction, as the atoms are switched: the top panel below shows the structure emitted by `matsciml`, and the lower panel shows the serialized structure.
As it turns out, at some point `pymatgen` introduced a `vesta` argument to `Lattice.from_parameters`. The default value is `False`, but if it is set to `True`, the generated and serialized structures are consistent.
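To make the mismatch concrete, here is a minimal pure-Python sketch of how two orientation conventions can turn the same six lattice parameters into different matrices. The helpers `lattice_c_along_z` and `lattice_a_along_x` are hypothetical names written for illustration. They follow my understanding of the non-VESTA and VESTA-style conventions and are not pymatgen's own code. The lattice parameters and fractional coordinate are made-up values.

```python
import math


def lattice_c_along_z(a, b, c, alpha, beta, gamma):
    """Hypothetical helper: c along z, a in the x-z plane (assumed non-VESTA style)."""
    al, be, ga = (math.radians(x) for x in (alpha, beta, gamma))
    val = (math.cos(al) * math.cos(be) - math.cos(ga)) / (math.sin(al) * math.sin(be))
    gamma_star = math.acos(max(-1.0, min(1.0, val)))  # clamp against rounding error
    return [
        [a * math.sin(be), 0.0, a * math.cos(be)],
        [-b * math.sin(al) * math.cos(gamma_star),
         b * math.sin(al) * math.sin(gamma_star),
         b * math.cos(al)],
        [0.0, 0.0, float(c)],
    ]


def lattice_a_along_x(a, b, c, alpha, beta, gamma):
    """Hypothetical helper: a along x, b in the x-y plane (assumed VESTA style)."""
    al, be, ga = (math.radians(x) for x in (alpha, beta, gamma))
    c1 = c * math.cos(be)
    c2 = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    return [
        [float(a), 0.0, 0.0],
        [b * math.cos(ga), b * math.sin(ga), 0.0],
        [c1, c2, math.sqrt(c * c - c1 * c1 - c2 * c2)],
    ]


def lengths_and_angles(m):
    """Recover (a, b, c, alpha, beta, gamma) from the row vectors of a matrix."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in m]

    def angle(i, j):
        dot = sum(x * y for x, y in zip(m[i], m[j]))
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / (norms[i] * norms[j])))))

    return (*norms, angle(1, 2), angle(0, 2), angle(0, 1))


def frac_to_cart(frac, m):
    """Cartesian position of a fractional coordinate (row-vector convention)."""
    return [sum(frac[i] * m[i][j] for i in range(3)) for j in range(3)]


params = (3.0, 4.0, 5.0, 80.0, 95.0, 100.0)  # illustrative triclinic cell
m_z = lattice_c_along_z(*params)
m_x = lattice_a_along_x(*params)

# Same cell geometry, different orientation in Cartesian space.
print(lengths_and_angles(m_z))
print(lengths_and_angles(m_x))

frac = (0.25, 0.5, 0.75)  # illustrative fractional coordinate
cart_z = frac_to_cart(frac, m_z)
cart_x = frac_to_cart(frac, m_x)
print(cart_z)
print(cart_x)
```

Both matrices describe the same cell, so the fractional coordinates look unchanged, but the Cartesian positions differ. The same applies to any Cartesian vector labels, such as forces, that were computed in the other orientation. In pymatgen itself, the switch discussed above is the `vesta` argument of `Lattice.from_parameters`.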
Steps to reproduce the problem
Code snippets shown above; compare the serialized structure with the one produced by the `PeriodicPropertiesTransform` workflow.
Specifications
Mainly relevant here is `pymatgen==2023.9.25`.