IntelLabs / matsciml

Open MatSci ML Toolkit is a framework for prototyping and scaling out deep learning models for materials discovery. It supports widely used materials science datasets and is built on top of PyTorch Lightning, the Deep Graph Library, and PyTorch Geometric.
MIT License

LMDB traversal cli #301

Closed laserkelvin closed 2 weeks ago

laserkelvin commented 2 weeks ago

This PR adds a quality-of-life (QoL) CLI that provides high-level functionality for inspecting LMDB datasets.

laserkelvin commented 2 weeks ago

The window size is used by the running average: as you iterate through the dataset, it computes a running average of properties over (by default) the last 10 samples. That's different from capping the number of samples to go through, because you might want to sweep through the whole dataset and look for outliers.
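To illustrate the distinction, here is a minimal sketch of a windowed running average. It is not the code from this PR: the function name `running_average`, the `window_size` parameter name, and the `energies` values are all hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def running_average(values: Iterable[float], window_size: int = 10) -> Iterator[float]:
    """Yield the mean of the most recent `window_size` values at each step."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # A bounded deque drops the oldest sample automatically once it is full.
    window = deque(maxlen=window_size)
    for value in values:
        window.append(value)
        yield sum(window) / len(window)


# Hypothetical per-sample property values with a single outlier at index 6.
energies = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 50.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# Every sample is visited; the window only limits how many recent
# samples contribute to each average.
averages = list(running_average(energies, window_size=3))
```

With a small window, the outlier shows up as a short-lived jump in the average and then falls out of the window again, while the sweep still covers every sample. Capping the sample count would simply stop iterating early and could miss the outlier altogether.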