BorgwardtLab / proteinshake

Protein structure datasets for machine learning.
https://proteinshake.ai
BSD 3-Clause "New" or "Revised" License

think about data storage again #51

Closed timkucera closed 2 years ago

timkucera commented 2 years ago

Datasets are currently stored as json.gz, which is fine for the smaller ones, but at atom-level resolution we might need a more performant format, maybe HDF5 or Parquet.
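To make the trade-off concrete, here is a rough sketch comparing gzip-compressed JSON with a gzip-compressed float32 typed array. It uses only the standard library, so the typed array is just a stand-in for the kind of binary storage HDF5 or Parquet would give; it is not either of those formats. The data is random and all function names (`make_atoms`, `to_json_gz`, `to_binary_gz`, and so on) are hypothetical, not part of proteinshake.

```python
import gzip
import json
import random
from array import array


def make_atoms(n, seed=0):
    """Generate hypothetical coordinates for n atoms as a flat [x, y, z, ...] list."""
    rng = random.Random(seed)
    return [round(rng.uniform(-50.0, 50.0), 3) for _ in range(3 * n)]


def to_json_gz(coords):
    """Serialize coordinates as gzip-compressed JSON text (the current approach)."""
    return gzip.compress(json.dumps({"coords": coords}).encode("utf-8"))


def from_json_gz(blob):
    """Decompress and parse the JSON text back into a list of floats."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))["coords"]


def to_binary_gz(coords):
    """Serialize coordinates as a gzip-compressed float32 typed array.

    Assumption: float32 precision is acceptable for the coordinates.
    """
    return gzip.compress(array("f", coords).tobytes())


def from_binary_gz(blob):
    """Decompress and reinterpret the bytes as float32 values, no text parsing."""
    out = array("f")
    out.frombytes(gzip.decompress(blob))
    return out.tolist()


coords = make_atoms(10_000)
json_blob = to_json_gz(coords)
bin_blob = to_binary_gz(coords)

# Print the compressed sizes so the two encodings can be compared directly.
print("json.gz bytes:   ", len(json_blob))
print("float32.gz bytes:", len(bin_blob))
```

Wrapping `from_json_gz` and `from_binary_gz` in `timeit` on real atom-level data would show whether load time, rather than file size, is the actual bottleneck. A real HDF5 or Parquet backend would additionally allow reading a subset of proteins or columns without decompressing the whole file, which this sketch does not capture.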