InfluenceFunctional / MXtalTools


Speed up dataset loading. #63

Closed by InfluenceFunctional 7 months ago

InfluenceFunctional commented 1 year ago

Loading is extremely slow on the full dataset. Surely there's something we can do here.

InfluenceFunctional commented 1 year ago

Consider replacing pandas with polars.

https://blog.jetbrains.com/dataspell/2023/08/polars-vs-pandas-what-s-the-difference/#:~:text=Pandas%2C%20by%20default%2C%20uses%20eager,way%20of%20executing%20the%20code.

InfluenceFunctional commented 1 year ago

There may be some issues with less permissive data formats, e.g., pyarrow.lib.ArrowInvalid: Can only convert 1-dimensional array values
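
For reference, a minimal sketch of how this error can arise and one possible workaround. It assumes the error comes from DataFrame cells that hold multi-dimensional NumPy arrays; the column name `coords` and the helper `to_arrow_friendly` are hypothetical and not taken from the MXtalTools code.

```python
from typing import Any


def to_arrow_friendly(value: Any) -> Any:
    """Convert a multi-dimensional array-like into nested lists.

    Arrow accepts 1-D arrays and nested lists as cell values but rejects
    N-D NumPy arrays. Any value whose `ndim` is not above 1 (or that has
    no `ndim` at all) is returned unchanged.
    """
    if getattr(value, "ndim", 1) > 1:
        return value.tolist()
    return value


def demo() -> None:
    # Imports live here so the helper above has no hard dependencies.
    import numpy as np
    import pandas as pd
    import pyarrow as pa

    # Hypothetical column: one (n_atoms, 3) coordinate array per row.
    df = pd.DataFrame({"coords": [np.zeros((2, 3)), np.ones((4, 3))]})

    try:
        pa.Table.from_pandas(df)
    except pa.ArrowInvalid as err:
        # Expected: Arrow refuses the 2-D arrays stored in the cells.
        print("direct conversion failed:", err)

    # Assumed workaround: replace each 2-D array with nested lists.
    df["coords"] = df["coords"].map(to_arrow_friendly)
    table = pa.Table.from_pandas(df)
    print(table.schema)
```

Calling `demo()` requires numpy, pandas, and pyarrow. Flattening each array and storing its shape in a separate column would be another option; nested lists are used here only because they keep the example short.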

InfluenceFunctional commented 7 months ago

Duplicate of issue #81.