ad-freiburg / pfaedle

Precise map-matching for public transit feeds. Generates high-quality GTFS shapes from OSM data.
GNU General Public License v3.0
208 stars 29 forks

WIP: Add osmium support for handling different kind of files #15

Open vesavlad opened 4 years ago

vesavlad commented 4 years ago

This is still a work in progress, and I would appreciate some review/feedback, since some of the naming is not so clear. Closes: https://github.com/ad-freiburg/pfaedle/issues/10

patrickbr commented 4 years ago

Thank you very much for your work! I will look over the code in the next few days. How did you change the general workflow in OsmBuilder? Have you run any tests regarding memory consumption and parsing times? Is XML parsing now faster or slower than before?

In general, I am still a bit hesitant to use libosmium here. It's a huge additional dependency. In particular, it introduces Boost as a dependency, which I would like to avoid. If the main goal is to support .pbf files, I still think it would be a better approach to just parse the .pbf files directly. But maybe I am wrong :)

vesavlad commented 4 years ago

This is still a WIP, so what is done so far is only reading the data through libosmium.

How did you change the general workflow in OsmBuilder?

Have you run any tests regarding memory consumption and parsing times?

Is XML parsing now faster or slower than before?

The idea was to keep the application "logic" as you have written it since there is still time required for me to understand in detail what is done there.

Also please don't hesitate to:

Honestly might be good to:

This is a very practical application that adds a huge benefit for processing GTFS data for agencies that do not generate shapes for their GTFS feeds. Thanks for developing it; I will try to contribute as much as I can.

vesavlad commented 4 years ago

One more note: the pull request also contains some clang code improvement suggestions.

derhuerst commented 3 years ago

@patrickbr I'm curious what's holding this PR back? Is it that you didn't have time/energy/motivation to review this yet, or is it the general direction (e.g. the Boost dependency) that you're unhappy with?

I'm currently map-matching many GTFS feeds using pfaedle (thanks for this tool btw!), and it has to re-read a 12 GB OSM XML file for every GTFS feed. I hope that reading ~700 MB of .pbf would be faster.

derhuerst commented 3 years ago

I also noticed that pfaedle seems to read this file multiple times, once per matching iteration. In my case, it reads & parses the 12 GB de-bw-buffered.osm file three times. Within Docker for macOS on my old laptop, each read takes ~15 min.

laem commented 8 months ago

I must admit that having to handle a > 15 GB bz2 file instead of a 4.5 GB pbf file makes this lib harder to try. Thanks for the work on this PR!

patrickbr commented 8 months ago

Thank you again for all your efforts here. I have been hesitant to merge this PR because it would add major dependencies (libosmium and Boost). I am not happy with that. Also, it was opened before a major refactoring and rewrite of large parts of the tool in 2021. The more sophisticated OSM formats (o5m, protobuf) are not that hard to parse, and I would still prefer a simple solution which just reads these formats directly, without going through libosmium. The main benefit that libosmium adds besides format parsing is reference resolution and the construction of ready-to-use geometrical objects. The techniques to do that are already there in the pfaedle code; all that is missing is a drop-in replacement of the XML parser with an o5m or protobuf parser.

I have been working on that for a few months now.
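To illustrate what reading .pbf directly, without libosmium or a protobuf library, might involve at the lowest level, here is a minimal sketch of the outer framing only. It assumes the published OSM PBF layout: each block is a 4-byte big-endian length, followed by a BlobHeader message (field 1 `type`, field 2 `indexdata`, field 3 `datasize`), followed by the Blob itself. The names `BlobHeader`, `readBigEndian32`, `readVarint` and `parseBlobHeader` are hypothetical and not taken from pfaedle; decompressing the Blob and decoding the nodes, ways and relations inside it are not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical struct for illustration; not part of pfaedle.
struct BlobHeader {
  std::string type;      // field 1, e.g. "OSMHeader" or "OSMData"
  int32_t datasize = 0;  // field 3, size in bytes of the Blob that follows
};

// Decode the 4-byte big-endian length that precedes each BlobHeader.
inline uint32_t readBigEndian32(const uint8_t* p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Decode a protobuf base-128 varint starting at buf[pos] and advance pos.
// Returns false if the input is truncated or the varint is too long.
inline bool readVarint(const std::vector<uint8_t>& buf, size_t& pos,
                       uint64_t& out) {
  out = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (pos >= buf.size()) return false;
    uint8_t b = buf[pos++];
    out |= uint64_t(b & 0x7F) << shift;
    if (!(b & 0x80)) return true;  // high bit clear: last byte of the varint
  }
  return false;
}

// Parse a serialized BlobHeader. Returns false on malformed input or if one
// of the two required fields (type, datasize) is missing.
inline bool parseBlobHeader(const std::vector<uint8_t>& buf, BlobHeader& h) {
  bool hasType = false, hasSize = false;
  size_t pos = 0;
  while (pos < buf.size()) {
    uint64_t key = 0;
    if (!readVarint(buf, pos, key)) return false;
    uint64_t field = key >> 3;  // field number
    uint64_t wire = key & 7;    // wire type
    if (wire == 0) {            // varint-encoded value
      uint64_t v = 0;
      if (!readVarint(buf, pos, v)) return false;
      if (field == 3) {
        h.datasize = static_cast<int32_t>(v);
        hasSize = true;
      }
    } else if (wire == 2) {     // length-delimited value (string or bytes)
      uint64_t len = 0;
      if (!readVarint(buf, pos, len)) return false;
      if (len > buf.size() - pos) return false;
      if (field == 1) {
        h.type.assign(reinterpret_cast<const char*>(buf.data() + pos),
                      static_cast<size_t>(len));
        hasType = true;
      }
      pos += static_cast<size_t>(len);  // also skips indexdata (field 2)
    } else {
      return false;  // other wire types are not expected in a BlobHeader
    }
  }
  return hasType && hasSize;
}
```

A reader built on this would loop over the file: read 4 bytes, call `readBigEndian32`, read that many bytes into a buffer, call `parseBlobHeader`, then read `datasize` bytes of Blob and hand them to the next decoding stage.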