[Feature request]: Homogenization of data structures and physical representations

Feature/behavior summary

To ensure consistency in modeling, every dataset in the Open MatSciML Toolkit should provide uniform (or nearly uniform) kinds of data. For example, it should be clear whether the provided coordinates are fractional or Cartesian, and every dataset should carry enough information to represent each data sample in a physically meaningful way, such as periodic boundary conditions (for use in, e.g., shift vectors).

Request attributes

[X] Would this be a refactor of existing code?
[ ] Does this proposal require new package dependencies?
[ ] Would this change break backwards compatibility?
[ ] Does this proposal include a new model?
[ ] Does this proposal include a new dataset?
[ ] Does this proposal include a new task/workflow?

Related issues

No response

Solution description

A good place to start would be to make sure that each devset, and subsequently any serialized datasets we have, conforms to the following:

Check whether the coordinates are fractional or Cartesian (values outside the range 0 to 1 suggest they are likely Cartesian)
Check that we have enough information to create a Lattice object: this can be just a cell key, or the lattice parameters, as in Materials Project
More generally, print and list the keys in each sample and construct a table of them, so that we can help contribute to #97
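
As a rough starting point for the survey, the checks listed above could be sketched as follows. This is only an illustration that assumes each sample is a plain dictionary: the key names (`pos`, `cell`, `lattice`, and the flat `a`/`b`/`c`/`alpha`/`beta`/`gamma` parameters) and the helper functions are hypothetical, not the toolkit's actual schema or API. The fractional check is a heuristic only, since unwrapped fractional coordinates can fall outside 0 to 1 and a very small Cartesian structure can fall inside it.

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Sequence

# Hypothetical key names; the real ones differ per dataset and are
# exactly what the survey is meant to establish.
CELL_KEYS = ("cell", "lattice")
LATTICE_PARAM_KEYS = ("a", "b", "c", "alpha", "beta", "gamma")


def looks_fractional(coords: Sequence[Sequence[float]], tol: float = 1e-6) -> bool:
    """Heuristic: True if every coordinate lies within [0, 1], up to `tol`."""
    return all(-tol <= x <= 1.0 + tol for row in coords for x in row)


def has_lattice_info(sample: Mapping[str, Any]) -> bool:
    """True if the sample has a cell-like key or all six lattice parameters."""
    if any(key in sample for key in CELL_KEYS):
        return True
    return all(key in sample for key in LATTICE_PARAM_KEYS)


def survey(samples: Iterable[Mapping[str, Any]], coord_key: str = "pos") -> dict:
    """Count samples, likely-fractional coordinates, lattice info, and keys."""
    key_counts: Counter = Counter()
    num_samples = likely_fractional = with_lattice = 0
    for sample in samples:
        num_samples += 1
        key_counts.update(sample.keys())
        if coord_key in sample and looks_fractional(sample[coord_key]):
            likely_fractional += 1
        if has_lattice_info(sample):
            with_lattice += 1
    return {
        "num_samples": num_samples,
        "likely_fractional": likely_fractional,
        "with_lattice": with_lattice,
        "key_counts": dict(key_counts),
    }


def format_key_table(key_counts: Mapping[str, int], num_samples: int) -> str:
    """Render a plain-text table of how many samples contain each key."""
    lines = ["key | samples with key"]
    for key in sorted(key_counts):
        lines.append(f"{key} | {key_counts[key]}/{num_samples}")
    return "\n".join(lines)


# Two made-up samples: one periodic with fractional coordinates,
# one with Cartesian coordinates and no lattice information.
samples = [
    {
        "pos": [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
        "cell": [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]],
        "atomic_numbers": [11, 17],
    },
    {"pos": [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]], "atomic_numbers": [1, 1]},
]
report = survey(samples)
print(format_key_table(report["key_counts"], report["num_samples"]))
```

Running something like this over a handful of samples per devset would give one key table per dataset, which could then be compared side by side.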

We should also check other projects, like ColabFit, to see to what extent we can conform to community standards, too.

Additional notes

Can't assign Bin yet, but it would be good for Bin to aggregate the information, and for him and @melo-gonzo to help craft PRs to address things after the survey is done.
