Hogfeldt / ctDNAtool

Software for creating and manipulating statistics from cfDNA data

Better file format for outputting data #2

Open Hogfeldt opened 4 years ago

Hogfeldt commented 4 years ago

Right now data is stored in the .npy or .pickle format with a corresponding index file. It would be nice to integrate the index file into a small wrapper class that holds both the data and its metadata.
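As a rough illustration of the wrapper idea, here is a minimal sketch. The class name `IndexedMatrix`, its fields, and the assumption that the index maps each row to a region ID are all hypothetical and not taken from ctDNAtool. Plain lists stand in for the NumPy array to keep the sketch dependency-free.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexedMatrix:
    """Hypothetical wrapper: one row of data per indexed region."""

    data: List[List[float]]  # stand-in for a NumPy array
    index: List[str]         # assumption: index[i] is the ID of row i
    meta: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if len(self.data) != len(self.index):
            raise ValueError("data and index must have the same number of rows")
        # Reverse lookup so rows can be fetched by ID instead of position.
        self._row_of = {region_id: i for i, region_id in enumerate(self.index)}

    def row(self, region_id: str) -> List[float]:
        """Return the data row that belongs to the given ID."""
        return self.data[self._row_of[region_id]]


sample = IndexedMatrix(
    data=[[1.0, 2.0], [3.0, 4.0]],
    index=["region_a", "region_b"],
    meta={"source": "example"},
)
print(sample.row("region_b"))
```

Keeping the index inside the object means data and index cannot drift apart the way two separate files can.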

The naive way to store such an object is the pickle format. My only concern is that pickle is not secure: unpickling data from an untrusted source can execute arbitrary code. See the documentation (https://docs.python.org/3.9/library/pickle.html).
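To make the concern concrete, here is a small, harmless demonstration using only the standard library. The `Payload` class is made up for illustration; it uses `__reduce__` to make pickle store a function call instead of an object.

```python
import pickle


class Payload:
    """Stand-in for attacker-controlled content in a pickle file."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" instead of a Payload object.
        # A real attacker would choose something far less harmless.
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Payload())

# Loading runs the stored call; no Payload instance comes back.
loaded = pickle.loads(untrusted_bytes)
print(loaded)
```

The load returns the result of the evaluated expression, which shows that whoever writes the file decides what code runs when it is read.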

Maybe we should consider finding an alternative? Maybe HDF5?

Hogfeldt commented 4 years ago

We have decided to create a data abstraction that handles IO and holds the data together with metadata such as indexes. For now we just use pickle to store this abstraction, but in the long run it would be nice to add a safe mode where data is stored in an HDF5 file or a similar format.
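One possible shape for this, sketched under assumptions: the class name `DataStore`, the `safe` flag, and the use of JSON as the safe format are all hypothetical. JSON is used here only because it ships with Python and cannot run code on load; an HDF5 backend would slot into the same two branches.

```python
import json
import pickle
import tempfile
from pathlib import Path


class DataStore:
    """Hypothetical abstraction holding data plus its index, with its own IO."""

    def __init__(self, data, index):
        self.data = data    # stand-in for a NumPy array (nested lists here)
        self.index = index  # assumption: one ID per row of data

    def _as_dict(self):
        # Store plain containers rather than the instance itself.
        return {"data": self.data, "index": self.index}

    def save(self, path, safe=False):
        if safe:
            Path(path).write_text(json.dumps(self._as_dict()))
        else:
            Path(path).write_bytes(pickle.dumps(self._as_dict()))

    @classmethod
    def load(cls, path, safe=False):
        if safe:
            payload = json.loads(Path(path).read_text())
        else:
            # Only appropriate for files from a trusted source.
            payload = pickle.loads(Path(path).read_bytes())
        return cls(payload["data"], payload["index"])


store = DataStore(data=[[1, 2], [3, 4]], index=["region_a", "region_b"])

with tempfile.TemporaryDirectory() as tmp:
    store.save(Path(tmp) / "store.json", safe=True)
    restored = DataStore.load(Path(tmp) / "store.json", safe=True)

print(restored.index)
```

Because callers only see `save` and `load`, the storage format can change later without touching the code that uses the data.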