Luthaf / rascaline

Computing representations for atomistic machine learning
https://luthaf.fr/rascaline/
BSD 3-Clause "New" or "Revised" License

Add a tutorial about "how to get fixed size output from rascaline" #310

Open Luthaf opened 3 months ago

Luthaf commented 3 months ago

Rascaline dynamically computes a lot of things depending on the systems it receives as input. When dealing with SOAP features (or anything else based on one-hot encoding of atomic species), this means that the shape of the output depends on the input.
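To make the problem concrete, here is a toy sketch in plain Python. It does not use rascaline at all: `one_hot_features` is a hypothetical stand-in for any species-based one-hot encoding, and it only illustrates how the number of columns follows the species present in the input.

```python
# Toy stand-in for a species-based one-hot encoding; this is NOT rascaline's API.
def one_hot_features(systems):
    """Encode each atom as a one-hot vector over the species found in `systems`."""
    # The set of columns is discovered from the input itself
    species = sorted({s for system in systems for s in system})
    columns = {s: i for i, s in enumerate(species)}
    rows = []
    for system in systems:
        for s in system:
            row = [0] * len(species)
            row[columns[s]] = 1
            rows.append(row)
    return species, rows


# Species are given as atomic numbers: water alone, then water plus methane
_, water = one_hot_features([[8, 1, 1]])
_, mixed = one_hot_features([[8, 1, 1], [6, 1, 1, 1, 1]])

print(len(water), len(water[0]))  # 3 atoms, 2 columns (H, O)
print(len(mixed), len(mixed[0]))  # 8 atoms, 3 columns (H, C, O)
```

Adding a single methane molecule changes the width of every row, which is the kind of input-dependent shape described above.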

This can be a problem in some cases: some ML frameworks need to know all the shapes ahead of time, and when running inference, some species might be missing from the dataset. For this reason, we have a whole mechanism for keys & property selection that allows fixing the shape of the output regardless of the input. While these are documented on their own, the documentation is missing something that explains the issue and the corresponding solutions.
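The idea behind fixing the shape can be sketched in plain Python, under the assumption that the full list of species is known ahead of time. This is only an illustration of the principle, not rascaline's keys & property selection mechanism; `ALL_SPECIES` and `fixed_one_hot` are hypothetical names.

```python
# Toy illustration only: `ALL_SPECIES` and `fixed_one_hot` are hypothetical names,
# not part of rascaline.
ALL_SPECIES = [1, 6, 7, 8]  # H, C, N, O, chosen once, ahead of time


def fixed_one_hot(system, all_species=ALL_SPECIES):
    """Encode each atom as a one-hot vector over a fixed list of species."""
    columns = {s: i for i, s in enumerate(all_species)}
    rows = []
    for s in system:
        row = [0] * len(all_species)
        row[columns[s]] = 1  # raises KeyError for a species outside the list
        rows.append(row)
    return rows


water = fixed_one_hot([8, 1, 1])
methane = fixed_one_hot([6, 1, 1, 1, 1])

# Both outputs have 4 columns, even though neither system contains nitrogen
print(len(water[0]), len(methane[0]))
```

Columns for species that are absent from a given input are simply left at zero, so the width no longer depends on what the input contains.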