CIRCL / pbtc

Passive Bitcoin Project
GNU Affero General Public License v3.0

Efficient storage of raw logs #1

Closed: adulau closed this issue 9 years ago

adulau commented 9 years ago

Raw logs could be stored more efficiently. One option is a format based on 5-minute files, like the one used by nfdump [1]. A compression method such as LZO could be used to limit the size of each 5-minute file.

[1] https://github.com/vytautas/nfdump/blob/nfdist/bin/nffile.h
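
As a rough illustration of the 5-minute file idea, the sketch below maps a timestamp to the start of its 5-minute window and derives a file name from it. The function names and the `pbtc.` prefix are hypothetical and only loosely modelled on timestamped capture files; nothing here is taken from the pbtc or nfdump code.

```python
import time

WINDOW_SECONDS = 5 * 60  # one file per 5-minute window


def window_start(timestamp: float) -> int:
    """Return the Unix time at which the 5-minute window containing timestamp begins."""
    return int(timestamp) - int(timestamp) % WINDOW_SECONDS


def window_file_name(timestamp: float, prefix: str = "pbtc") -> str:
    """Build a hypothetical file name such as pbtc.YYYYMMDDhhmm (UTC) for the window."""
    start = time.gmtime(window_start(timestamp))
    return prefix + "." + time.strftime("%Y%m%d%H%M", start)
```

Every record whose timestamp falls in the same window resolves to the same name, so a writer only needs to compare names to know when to switch files.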

awfm9 commented 9 years ago

I'm looking at the benchmark from http://cyan4973.github.io/lz4/ to decide which compression algorithm to use. In the end, we can tune the choice to the actual throughput so that we make the most of the available resources.

There will be a size limit and a time limit for the log files, so a file will be rotated as soon as either limit is reached. Of course, you can disable one or the other if you want.
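
A minimal sketch of that rotation rule, assuming a limit set to `None` means "disabled". The class and field names are hypothetical and not the actual pbtc configuration.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RotationPolicy:
    """Hypothetical rotation settings; a limit of None disables that check."""
    max_bytes: Optional[int] = None
    max_age_seconds: Optional[float] = None

    def should_rotate(self, size_bytes: int, opened_at: float,
                      now: Optional[float] = None) -> bool:
        """Return True as soon as either enabled limit has been reached."""
        if now is None:
            now = time.time()
        size_hit = self.max_bytes is not None and size_bytes >= self.max_bytes
        age_hit = (self.max_age_seconds is not None
                   and now - opened_at >= self.max_age_seconds)
        return size_hit or age_hit
```

With both limits left at `None` the file is never rotated, which may or may not be a configuration worth allowing in practice.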

awfm9 commented 9 years ago

The log format for text records is now defined in the documentation:

https://github.com/CIRCL/pbtc/wiki/Text-Format

I will add the same thing for the binary format over the next few days. Both binary and text logs will be compressed with lz4 on log rotation, which shows compression rates of 25-32% in the initial tests for both log types.
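
For illustration, compress-on-rotation could look roughly like the sketch below. Note that it uses gzip from the Python standard library as a stand-in for lz4, purely so the example is self-contained; the function name and the `.gz` suffix are assumptions, and the ratio it returns (compressed size divided by original size) is just one possible way to measure the result.

```python
import gzip
import os
import shutil


def compress_on_rotation(path: str) -> float:
    """Compress a closed log file, remove the original, and return
    compressed size / original size.

    gzip stands in for lz4 here only because it ships with Python.
    """
    original_size = os.path.getsize(path)
    compressed_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(compressed_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the data so large logs do not fill memory
    os.remove(path)
    compressed_size = os.path.getsize(compressed_path)
    return compressed_size / original_size if original_size else 0.0
```

Calling this once the writer has switched to a new file keeps compression off the hot path of logging.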