adrianulbona / osm-parquetizer

A converter for the OSM PBFs to Parquet files
http://adrianulbona.github.io/2016/12/18/osm-parquetizer.html
Apache License 2.0

Ignore visible info when converted to Parquet file #10

Closed ericsun95 closed 4 years ago

ericsun95 commented 4 years ago

The parquetizer follows the v0_6 schema from osmosis, which does not seem to contain the visible field. As a result, the output files drop the visible info even when the original PBF file contains it.

Is there a quick way to include it in the output as well?

adrianulbona commented 4 years ago

Hmm, interesting question. When I initially designed the parquetizer, I had in mind converting only the latest version of each entity (assuming that everything in the PBF is visible and that there are no multiple versions of the same entity). If we can somehow get this information from osmosis while reading the files, I see no problem with extending the existing schemas with an extra field.
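
A minimal sketch of what carrying such an extra field could look like, in plain Java with no osmosis or Parquet dependency. Everything here is hypothetical: `VisibleColumnSketch`, `NodeRow`, `resolveVisible`, and `toRow` are illustrative names, not part of osm-parquetizer or osmosis. It assumes the reader can hand over the flag as a nullable `Boolean`, and that a missing flag means "visible", in line with the assumption above that everything in a regular PBF is visible. How to obtain the raw flag from osmosis is left open.

```java
// Hypothetical sketch: not part of osm-parquetizer or osmosis.
final class VisibleColumnSketch {

    /** One output row for a node, reduced to the fields needed for the illustration. */
    static final class NodeRow {
        final long id;
        final int version;
        final boolean visible; // the proposed extra field

        NodeRow(long id, int version, boolean visible) {
            this.id = id;
            this.version = version;
            this.visible = visible;
        }
    }

    /**
     * Resolves the optional flag read from the input.
     * Assumption: null means the flag was absent, which is treated as visible.
     */
    static boolean resolveVisible(Boolean rawVisible) {
        return rawVisible == null || rawVisible;
    }

    /** Builds a row, filling the extra field from the (possibly absent) raw flag. */
    static NodeRow toRow(long id, int version, Boolean rawVisible) {
        return new NodeRow(id, version, resolveVisible(rawVisible));
    }
}
```

With this default, files without history information would produce `visible = true` for every row, so existing consumers of the output would only see one additional column.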

ericsun95 commented 4 years ago

When searching for visible in osmosis, I did find this: https://github.com/openstreetmap/osmosis/blob/2219470cef1f73f5d1319c57149c84b398e767ce/osmosis-apidb/src/main/java/org/openstreetmap/osmosis/apidb/v0_6/impl/EntityHistory.java. I'm not sure it is the right spot, though, as it has a very long history. I'm also not sure whether osmosis itself ignores the flag when reading a PBF directly.

adrianulbona commented 4 years ago

I guess this needs to be tested. Let me know if you have time to dig into it.