adamfranco / curvature

Find roads that are the most curvy or twisty based on Open Street Map (OSM) data.
http://roadcurvature.com/
Streaming protocol #29

Fonsan closed 8 years ago

Fonsan commented 8 years ago

@adamfranco What are your thoughts on moving from the current .msgpack standard to a .msgpack.gz standard? Here are some example results:

3.9G    us-midwest-latest.msgpack
1.2G    us-midwest-latest.msgpack.gz
2.2G    us-northeast-latest.msgpack
721M    us-northeast-latest.msgpack.gz
 61M    us-pacific-latest.msgpack
 19M    us-pacific-latest.msgpack.gz
3.7G    us-west-latest.msgpack
1.2G    us-west-latest.msgpack.gz

We could easily abstract the reading and writing of gzipped MessagePack data into a single file that defines our protocol and is used throughout the project. Alternatively, we could leave it up to the user to add | gzip to their command chain and not make that design decision for them. I am leaning towards the second option: keep the current Python calls directly to MessagePack, and use | gzip and gunzip myself.
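For illustration, the first option (one shared helper) could look roughly like the sketch below. This is only a minimal sketch, not code from the project: the name open_stream and the rule of switching on a .gz suffix are assumptions made for the example.

```python
import gzip


def open_stream(path, mode="rb"):
    """Open `path` in binary mode, going through gzip when it ends in .gz.

    Hypothetical helper: the name and the suffix-based convention are
    assumptions for this sketch, not part of the existing project.
    """
    if mode not in ("rb", "wb"):
        raise ValueError("mode must be 'rb' or 'wb'")
    if path.endswith(".gz"):
        # gzip.open returns a file-like object that compresses on write
        # and decompresses on read, so callers see plain bytes either way.
        return gzip.open(path, mode)
    return open(path, mode)
```

The returned object behaves like an ordinary binary file, so it could be handed to the existing MessagePack reading and writing code in place of a plain file handle. The | gzip approach gives the same result without touching the Python code at all.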

Fonsan commented 8 years ago

Perhaps it would be enough to just add a tip in the README, since the msgpack protocol is a bit more verbose than protobuf.

adamfranco commented 8 years ago

I agree with adding documentation to the README. As you can see in my default processing chain, I have about 10 processing stages to stream the collections through. Gzipping/ungzipping at each stage isn't needed; compression is only useful before writing to disk.