Open darrnshn opened 9 years ago
Including: binary format (HDF5?), run-length encoding
multiple files for long chains?
I think multiple files is a good idea. I'm still not sure about binary vs text, since the more complicated the file format, the harder it is to read in other languages (e.g. if I'm using an R binding I would expect to easily read the file in R). If the format is too complicated then each language binding would have to provide its own chain file reading functions.
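To make the multiple-files idea concrete, here is a rough sketch of splitting a long chain across plain-text CSV files, so that any language binding can read them with its standard CSV tools. The function names, the `chain_0000.csv` naming scheme, and the `chunk_size` parameter are all made up for illustration; nothing here is an existing API in this project.

```python
import csv
import os


def write_chain_chunks(samples, directory, chunk_size, prefix="chain"):
    """Write samples (a list of rows of floats) to numbered CSV files.

    Hypothetical helper: each file holds at most chunk_size samples.
    Returns the list of file paths in chain order.
    """
    paths = []
    for start in range(0, len(samples), chunk_size):
        name = f"{prefix}_{start // chunk_size:04d}.csv"
        path = os.path.join(directory, name)
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(samples[start:start + chunk_size])
        paths.append(path)
    return paths


def read_chain_chunks(paths):
    """Read the chunk files back, in order, into one list of samples."""
    samples = []
    for path in paths:
        with open(path, newline="") as f:
            samples.extend([float(x) for x in row] for row in csv.reader(f))
    return samples
```

Since each chunk is ordinary CSV, an R binding could read it with `read.csv` and would not need its own chain file reader.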
Embedded vs server: http://stackoverflow.com/questions/3108437/when-to-use-an-embedded-database
It's basically a toss-up between a binary format server and an embedded binary DB. The only real difference is that the server runs in a separate process, while the embedded DB runs in a separate thread inside the application's own process.
Embedded DBs:
Server DB:
Requirements:
Use run-length encoding to reduce space. Maybe switch to a binary format, or even use a compression library such as snappy (https://github.com/google/snappy).
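A minimal sketch of what run-length encoding could look like for a chain, assuming consecutive samples often repeat exactly (for example when a sampler rejects a proposal and stays put). The names `rle_encode` and `rle_decode` are hypothetical, and this says nothing about the on-disk format.

```python
from itertools import groupby


def rle_encode(samples):
    """Collapse runs of consecutive equal samples into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(samples)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the full chain."""
    return [value for value, count in pairs for _ in range(count)]


chain = [0.5, 0.5, 0.5, 1.2, 1.2, 0.7]
encoded = rle_encode(chain)   # [(0.5, 3), (1.2, 2), (0.7, 1)]
assert rle_decode(encoded) == chain
```

The saving depends entirely on how often samples repeat. A chain with no repeated values would actually grow, since every sample gains a count of 1.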