Open MatSci ML Toolkit is a framework for prototyping and scaling out deep learning models for materials discovery. It supports widely used materials science datasets and is built on top of PyTorch Lightning, the Deep Graph Library, and PyTorch Geometric.
MIT License
[Feature request]: Refactor `MaterialsProjectDataset` to not serialize `pymatgen` `Structures` in LMDB #267
Currently, the workflow implemented for `MaterialsProjectDataset` saves and reloads a `pymatgen.Structure` object. The problem is that this ties the stored data tightly to the installed version of `pymatgen`: small API changes can make it difficult to reload the dataset with later versions.
Request attributes
[X] Would this be a refactor of existing code?
[ ] Does this proposal require new package dependencies?
[X] Would this change break backwards compatibility?
[ ] Does this proposal include a new model?
[ ] Does this proposal include a new dataset?
[ ] Does this proposal include a new task/workflow?
Related issues
No response
Solution description
If we refactor it so that `Structure` objects are created at load time, in line with other dataset implementations, it would remove this dependency, at the cost of a breaking change.
We would have to re-process the existing LMDBs being distributed and make sure that the data is stored only as plain coordinates, atomic species, and lattice parameters.
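A minimal sketch of what the plain storage format could look like, assuming each LMDB value holds JSON-encoded bytes of a simple dict. The helper names (`structure_to_record`, `encode_record`, `decode_record`, `record_to_structure`) and the field names are hypothetical and not part of the toolkit's API. It stores the full 3x3 lattice matrix rather than the six lattice parameters (a, b, c, α, β, γ), since the matrix also preserves the lattice orientation.

```python
import json
from typing import Any, Dict

# Hypothetical helpers illustrating the proposal; not part of matsciml's API.


def structure_to_record(structure: Any) -> Dict[str, Any]:
    """Extract plain, pymatgen-independent data from a Structure-like object."""
    return {
        # 3x3 lattice matrix, one lattice vector per row
        "lattice": [[float(x) for x in row] for row in structure.lattice.matrix],
        # one atomic number per site
        "atomic_numbers": [int(z) for z in structure.atomic_numbers],
        # one fractional [x, y, z] triple per site
        "frac_coords": [[float(x) for x in site] for site in structure.frac_coords],
    }


def encode_record(record: Dict[str, Any]) -> bytes:
    """Serialize a plain record to bytes, e.g. for use as an LMDB value."""
    return json.dumps(record).encode("utf-8")


def decode_record(data: bytes) -> Dict[str, Any]:
    """Inverse of encode_record; uses only the standard library."""
    record = json.loads(data.decode("utf-8"))
    # basic shape checks so that corrupt entries fail early
    lattice = record["lattice"]
    if len(lattice) != 3 or any(len(row) != 3 for row in lattice):
        raise ValueError("lattice must be a 3x3 matrix")
    if len(record["frac_coords"]) != len(record["atomic_numbers"]):
        raise ValueError("frac_coords and atomic_numbers have different lengths")
    return record


def record_to_structure(record: Dict[str, Any]):
    """Rebuild a pymatgen Structure at load time from a plain record."""
    # imported lazily so that reading the LMDB itself never requires pymatgen
    from pymatgen.core import Lattice, Structure

    return Structure(
        Lattice(record["lattice"]),
        record["atomic_numbers"],
        record["frac_coords"],
        coords_are_cartesian=False,
    )
```

Because `decode_record` touches only the standard library, the stored data can be read regardless of which `pymatgen` version (if any) is installed. Only `record_to_structure` imports `pymatgen`, and it relies on the public `Structure(lattice, species, coords)` constructor instead of unpickling `pymatgen` objects.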
Additional notes
No response