NICTA / stateline

Distributed Markov Chain Monte Carlo

Write chain outputs more efficiently #25

Open darrnshn opened 9 years ago

darrnshn commented 9 years ago

Use run-length encoding to reduce space. Maybe switch to a binary format, or even use a compression library such as snappy (https://github.com/google/snappy).
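To make the run-length idea concrete: an MCMC chain repeats its current state every time a proposal is rejected, so consecutive identical samples can be stored once with a repeat count. The following is only a minimal sketch in Python, not stateline's actual output code; `rle_encode`, `rle_decode` and the sample values are hypothetical names and data chosen for illustration.

```python
from itertools import groupby


def rle_encode(samples):
    """Collapse runs of consecutive identical samples into (count, sample) pairs."""
    return [(len(list(run)), sample) for sample, run in groupby(samples)]


def rle_decode(pairs):
    """Expand (count, sample) pairs back into the full chain."""
    chain = []
    for count, sample in pairs:
        chain.extend([sample] * count)
    return chain


# Hypothetical chain: repeated states correspond to rejected proposals.
chain = [(0.1, 2.0), (0.1, 2.0), (0.1, 2.0), (0.3, 1.5), (0.3, 1.5), (0.2, 1.7)]
encoded = rle_encode(chain)
decoded = rle_decode(encoded)
```

The saving depends on the acceptance rate: a chain that accepts nearly every proposal has few repeats and gains almost nothing from this encoding.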

lmccalman commented 8 years ago

Options include: a binary format (HDF5?), run-length encoding.

lmccalman commented 8 years ago

multiple files for long chains?

darrnshn commented 8 years ago

I think multiple files is a good idea. I'm still not sure about binary vs text, since the more complicated the file format, the harder it is to read in other languages (e.g. if I'm using an R binding I would expect to easily read the file in R). If the format is too complicated then each language binding would have to provide its own chain file reading functions.
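One way to keep the format readable from any binding while still splitting long chains is plain CSV with a leading repeat-count column, rolled over to a new file after a fixed number of rows. This is a sketch under assumptions, not an agreed design: the class name `RollingChainWriter`, the `chain_0000.csv` naming scheme and the `max_rows` parameter are all hypothetical.

```python
import csv
import os
import tempfile


class RollingChainWriter:
    """Write (count, sample) rows as CSV, starting a new file every max_rows rows."""

    def __init__(self, directory, prefix="chain", max_rows=1000):
        self.directory = directory
        self.prefix = prefix
        self.max_rows = max_rows
        self.file_index = 0
        self.rows_in_file = 0
        self._file = None
        self._writer = None

    def _open_next(self):
        # Close the current file (if any) and open the next numbered one.
        if self._file is not None:
            self._file.close()
        name = f"{self.prefix}_{self.file_index:04d}.csv"
        self._file = open(os.path.join(self.directory, name), "w", newline="")
        self._writer = csv.writer(self._file)
        self.file_index += 1
        self.rows_in_file = 0

    def write(self, count, sample):
        if self._file is None or self.rows_in_file >= self.max_rows:
            self._open_next()
        # Row layout (an assumption): repeat count, then one column per parameter.
        self._writer.writerow([count, *sample])
        self.rows_in_file += 1

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None


# Usage with a tiny max_rows so the rollover is visible.
out_dir = tempfile.mkdtemp()
writer = RollingChainWriter(out_dir, max_rows=2)
for count, sample in [(3, (0.1, 2.0)), (2, (0.3, 1.5)), (1, (0.2, 1.7))]:
    writer.write(count, sample)
writer.close()
files = sorted(os.listdir(out_dir))
```

Files written this way can be loaded in R with `read.csv(..., header = FALSE)` and no binding-specific reader; the cost is larger files and slower parsing than a binary format.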

darrnshn commented 7 years ago

Embedded vs server: http://stackoverflow.com/questions/3108437/when-to-use-an-embedded-database

It's basically a toss-up between a binary-format server and an embedded binary DB. The only real difference is that the server runs in a separate process, while the embedded DB runs inside our own process, in a separate thread.
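For a feel of the embedded option, here is a rough sketch using SQLite through Python's standard `sqlite3` module. SQLite is used only because it ships with Python; the thread does not settle on a particular database, and the `samples` table, its columns and the stored values are hypothetical.

```python
import sqlite3

# An embedded DB lives inside the calling process: no server to start or connect to.
conn = sqlite3.connect(":memory:")  # a file path would be used for real output
conn.execute(
    "CREATE TABLE samples ("
    "chain_id INTEGER, step INTEGER, repeats INTEGER, theta TEXT)"
)

# Hypothetical run-length-encoded samples for chain 0; theta is stored as text.
rows = [
    (0, 0, 3, "0.1,2.0"),
    (0, 1, 2, "0.3,1.5"),
    (0, 2, 1, "0.2,1.7"),
]
with conn:  # commits the inserts as a single transaction
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)", rows)

# Total chain length is the sum of the repeat counts.
total = conn.execute(
    "SELECT SUM(repeats) FROM samples WHERE chain_id = 0"
).fetchone()[0]
conn.close()
```

A server database would replace the `connect` call with a network connection to a separately running process, which adds deployment work but lets several readers query the chains while sampling continues.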

Embedded DBs:

Server DB:

Requirements: