Cecca / role-of-dimensionality

MIT License

Export the results to sqlite #14

Closed Cecca closed 3 years ago

Cecca commented 3 years ago

The previous caching mechanism was unreliable, the parquet data file had to be fully loaded into memory, and exporting took forever.

This commit tries to address all of the above issues by:

As a side effect, this should fix #13 by making it obsolete.
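
For illustration only, here is a minimal sketch of the general idea of exporting results to sqlite in batches, so that the full result set never has to sit in memory at once. It is not the code in this commit: the table name `results`, its columns, and the helpers `export_results` and `recalls_for` are all hypothetical.

```python
import sqlite3
from itertools import islice


def export_results(db_path, rows, batch_size=1000):
    """Write (dataset, algorithm, recall) tuples to sqlite in batches.

    Hypothetical schema; `rows` can be any iterable, including a generator,
    so only one batch is held in memory at a time.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "dataset TEXT, algorithm TEXT, recall REAL)"
        )
        rows = iter(rows)
        total = 0
        while True:
            batch = list(islice(rows, batch_size))
            if not batch:
                break
            with conn:  # one transaction per batch, committed on success
                conn.executemany(
                    "INSERT INTO results VALUES (?, ?, ?)", batch
                )
            total += len(batch)
        return total
    finally:
        conn.close()


def recalls_for(db_path, dataset):
    """Read back the recalls of a single dataset without loading the rest."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "SELECT recall FROM results WHERE dataset = ? ORDER BY recall",
            (dataset,),
        )
        return [r[0] for r in cur.fetchall()]
    finally:
        conn.close()
```

With a layout like this, a per-dataset query reads only the matching rows, which is the kind of access that a monolithic parquet file loaded fully into memory does not offer.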

Cecca commented 3 years ago

Export time: 1 hour for all the journal's experiments. Database size: 5.8 GB.

Cecca commented 3 years ago

Yes! I think that we can get rid of caching now, which is just too unreliable: before moving to sqlite, the export worked correctly if I focused on just one dataset, and was wrong if I exported all the datasets, with forced recomputation in both cases!

maumueller commented 3 years ago

It would still be interesting to know why this happened. Maybe it is an internal problem with HDF5 groups. Please merge if you are happy.