danielchalef / mrfparse

A Go parser for Transparency in Coverage MRF files.
Apache License 2.0
18 stars, 8 forks

Add compression to split output #28

Open · felix-hh opened this issue 12 months ago

felix-hh commented 12 months ago

Hi, excellent tool! It's blazingly fast at tearing these large JSON files into digestible pieces for DuckDB.

I'm working with the split output of mrfparse, and its size is one of my bottlenecks: I'm using HDDs to reduce costs (slow reads and writes, but extremely cheap). I'd like to shrink what gets written from memory to disk by compressing it.

Is there already support for this that I've missed, or would it be possible to add it? I could look into it myself; it doesn't sound too complicated, but I have never written Go.

danielchalef commented 12 months ago

Hi Felix, the Parquet files written by mrfparse are already compressed with zstd. Note that I'm no longer maintaining this project and haven't been tracking any changes to the MRF format.