atorger / nvdb2osm

The Unlicense

Stockholm.zip just stops processing #12

Closed: matthiasfeist closed this issue 3 years ago.

matthiasfeist commented 3 years ago

Hi!

I tried to process the quite large file for Stockholm, and the process keeps quitting without any error message. I don't know enough about Python to debug this properly, so I hope you can help out. Here is the log output of the script run with `--debug`: stockholm.log. Maybe that gives you some pointers. I tried running it on my Mac and on an EC2 Amazon Linux instance, with the same result. It sometimes gets a bit further, but it always exits seemingly in the middle of a process.

matthiasfeist commented 3 years ago

Alright, it seems to be memory related...


```
Mar  2 08:12:44 ip-172-31-2-150 kernel: Killed process 2682 (python) total-vm:1458816kB, anon-rss:831156kB, file-rss:0kB, shmem-rss:0kB
Mar  2 08:12:44 ip-172-31-2-150 kernel: oom_reaper: reaped process 2682 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
```

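
The kernel line above already records how much memory the process held when it was killed. A small, hypothetical helper (not part of nvdb2osm) that extracts those figures and converts them from kB to MiB might look like this; the regex assumes exactly the `Killed process ... total-vm:...kB, anon-rss:...kB` layout shown in the log and may need adjusting for other kernel versions.

```python
import re

# Matches a kernel OOM "Killed process" line like the one in the log above.
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_line(line):
    """Return pid, process name and memory figures (in MiB), or None if no match."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "name": m["name"],
        "total_vm_mib": int(m["total_vm"]) / 1024,
        "anon_rss_mib": int(m["anon_rss"]) / 1024,
    }


line = ("Mar  2 08:12:44 ip-172-31-2-150 kernel: Killed process 2682 (python) "
        "total-vm:1458816kB, anon-rss:831156kB, file-rss:0kB, shmem-rss:0kB")
info = parse_oom_line(line)
print(f"{info['name']} (pid {info['pid']}): "
      f"{info['anon_rss_mib']:.0f} MiB resident, {info['total_vm_mib']:.0f} MiB virtual")
```

For the log above this prints `python (pid 2682): 812 MiB resident, 1425 MiB virtual`, i.e. the process held roughly 0.8 GiB of resident memory when the OOM killer stepped in.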
atorger commented 3 years ago

Yeah, it probably uses quite a lot of memory. I haven't really measured it or thought about optimizing it; I have 64 GB of RAM on my box and Stockholm ran fine last time I tried, but Stockholm is indeed one of the larger areas...
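
Since memory use hasn't been measured, one way to get a number is to run the conversion as a child process and read its peak resident set size afterwards. This is a standard-library sketch that assumes a Unix-like system (the `resource` module is not available on Windows); the script name `measure_peak.py` is hypothetical.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run cmd to completion; return (exit code, peak RSS in MiB).

    The peak comes from RUSAGE_CHILDREN, i.e. the resident set size of the
    largest child process this script has waited for so far.
    """
    completed = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, peak / divisor


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python measure_peak.py <command> [args...]")
    else:
        code, peak_mib = run_and_report_peak_rss(sys.argv[1:])
        print(f"exit code {code}, peak RSS {peak_mib:.0f} MiB")
        sys.exit(0 if code == 0 else 1)
```

It would be invoked as `python measure_peak.py` followed by whatever command normally runs the conversion. On Linux, GNU `/usr/bin/time -v` reports the same figure as "Maximum resident set size" without any extra script.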

matthiasfeist commented 3 years ago

:) Yeah, it's a lot of data. I'm currently working on a project where I want to build on top of your amazing work. I already have a pipeline which downloads the whole NVDB catalog once per month, runs your script on all files, and then pushes the resulting .osm files to a website so that mappers can use them without having to fiddle around with Python.
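
The "run the script on all files" step of such a pipeline could be sketched as below. The converter's command line is deliberately left to a caller-supplied `build_cmd` hook, because the exact arguments nvdb2osm expects are not shown in this thread; the directory layout is likewise an assumption. Collecting return codes means a run killed by the OOM killer (a negative code such as -9 on POSIX) is reported instead of silently producing no output.

```python
import subprocess
import sys
from pathlib import Path


def convert_all(input_dir, output_dir, build_cmd):
    """Run one conversion per .zip in input_dir; return [(zip name, return code)] of failures.

    build_cmd(zip_path, osm_path) is a hypothetical hook that must return the
    argument list to execute; adapt it to how the converter is actually invoked.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for zip_path in sorted(Path(input_dir).glob("*.zip")):
        osm_path = output_dir / (zip_path.stem + ".osm")
        result = subprocess.run(build_cmd(zip_path, osm_path))
        # On POSIX a negative return code means the child was killed by a signal,
        # e.g. -9 (SIGKILL) when the Linux OOM killer steps in.
        if result.returncode != 0:
            failed.append((zip_path.name, result.returncode))
    return failed
```

For example, `convert_all("nvdb_zips", "osm_out", lambda z, o: [sys.executable, "nvdb2osm.py", str(z), str(o)])`, where the argument order is a guess that should be checked against the project's README.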

The next step in the project is to try a sort of "visual diff" between the generated .osm files and the current state of OpenStreetMap. My end goal is a map that shows where OpenStreetMap has missing geometry or wrongly tagged highways, so that mappers can focus their efforts on the right regions.

Any thoughts on that?

atorger commented 3 years ago

Great project! I think a visual diff might be "easy" to do if one focuses only on the geometry. If one simply makes a wireframe (one pixel wide) rendering of highways from NVDB in red, layers a wireframe rendering of OSM in green on top (no transparency), and keeps all roads visible even when zoomed out quite far, the human eye should be able to spot differences quite easily.

If OSM has geometry that is a bit offset, there will be a green line just beside a red line, but as one zooms out they will overlap and one will see only the green line. One could make the OSM wireframe 2-3 pixels wide so that it covers the NVDB layer even where it is slightly offset.
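
The layering described above can be illustrated with a toy, dependency-free sketch on a character grid: NVDB roads are drawn 1 px wide as `R`, then OSM roads are drawn on top as `G` with a wider square brush, so any `R` left over marks NVDB geometry with nothing in OSM nearby. It assumes road coordinates are already projected to pixel positions; a real implementation would render to an image or map tiles instead.

```python
def bresenham(x0, y0, x1, y1):
    """Yield the integer pixels on the line from (x0, y0) to (x1, y1)."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        yield x0, y0
        if x0 == x1 and y0 == y1:
            return
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def render_overlay(width, height, nvdb_lines, osm_lines, osm_width=3):
    """Draw NVDB roads 1 px wide in red ('R'), then OSM roads on top in green ('G').

    Each line is a list of (x, y) pixel coordinates. OSM lines use an
    osm_width x osm_width square brush so slightly offset geometry still covers NVDB.
    """
    canvas = [["." for _ in range(width)] for _ in range(height)]

    def plot(x, y, colour):
        if 0 <= x < width and 0 <= y < height:
            canvas[y][x] = colour

    for line in nvdb_lines:
        for (x0, y0), (x1, y1) in zip(line, line[1:]):
            for x, y in bresenham(x0, y0, x1, y1):
                plot(x, y, "R")
    r = osm_width // 2
    for line in osm_lines:
        for (x0, y0), (x1, y1) in zip(line, line[1:]):
            for x, y in bresenham(x0, y0, x1, y1):
                for ox in range(-r, r + 1):
                    for oy in range(-r, r + 1):
                        plot(x + ox, y + oy, "G")
    return ["".join(row) for row in canvas]


# Toy example: two NVDB roads; OSM has only the first one, drawn 1 px off.
rows = render_overlay(
    12, 7,
    nvdb_lines=[[(0, 1), (11, 1)], [(0, 5), (11, 5)]],
    osm_lines=[[(0, 2), (11, 2)]],
)
print("\n".join(rows))
```

In the toy example the offset first road is fully covered in green by the 3 px brush, while the second road, absent from OSM, stays as a red row.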

I think the key to making a visual diff work is to use a wireframe with small roads visible even when zoomed far out, so you can view large areas at once, plus a minimal number of colors (two will do, I think, plus a low-contrast background with lakes and borders for easier orientation). Then you really only need to spot "red areas" to see where data is missing.
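
If one later wants to find the "red areas" automatically rather than by eye, a possible approach (an extension, not something proposed in the thread) is to split the rendered canvas into tiles and rank tiles by how many uncovered red pixels they contain. This sketch assumes the canvas is a list of strings in which `R` marks a red pixel; the function name and tile size are illustrative.

```python
from collections import Counter


def red_hotspots(rows, tile=4, min_red=1):
    """Return [((tile_x, tile_y), red_pixel_count)] for tiles with >= min_red red pixels,
    sorted from most to least red."""
    counts = Counter(
        (x // tile, y // tile)
        for y, row in enumerate(rows)
        for x, ch in enumerate(row)
        if ch == "R"
    )
    return [(cell, n) for cell, n in counts.most_common() if n >= min_red]


canvas = ["....",
          "RR..",
          "....",
          "...R"]
print(red_hotspots(canvas, tile=2))  # → [((0, 0), 2), ((1, 1), 1)]
```

Raising `min_red` filters out tiles where only a few pixels peek out from under a slightly offset OSM line.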

The more advanced diff would be to check for tag mismatches: speed limits, street names, etc. That requires analyzing the files rather than just rendering them, which is a much bigger and more complicated task, but to start with, just showing the geometry difference would be very valuable.
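
Once a way in the NVDB-derived file has been matched to a way in OSM (the genuinely hard part, which this sketch assumes is already done), the tag comparison itself is straightforward. `highway`, `maxspeed` and `name` are standard OSM keys; the function name and the example tag values are hypothetical.

```python
# Tags to compare; all three are standard OSM keys.
COMPARED_KEYS = ("highway", "maxspeed", "name")


def tag_mismatches(nvdb_tags, osm_tags, keys=COMPARED_KEYS):
    """Return {key: (nvdb_value, osm_value)} for keys whose values differ.

    A key missing on one side shows up as None. Assumes both tag dicts belong
    to ways that have already been matched to each other.
    """
    diffs = {}
    for key in keys:
        a, b = nvdb_tags.get(key), osm_tags.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs


nvdb = {"highway": "residential", "maxspeed": "40", "name": "Storgatan"}
osm = {"highway": "residential", "maxspeed": "50"}
print(tag_mismatches(nvdb, osm))
# → {'maxspeed': ('40', '50'), 'name': ('Storgatan', None)}
```

A real comparison would probably need some normalization first (for example of name spelling or of how speed limits are written) to avoid flagging differences that are only cosmetic.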

I'm in the process of updating Lycksele kommun, where quite many roads were missing, and the larger roads, while present, often had bad geometry. I actually used the map skoterleder.org for scouting, as it shows small roads at lower zoom levels than osm.org does.

However, as I personally work kommun by kommun, i.e. go through all roads regardless of whether they are already there or not, I don't have a big need for a visual diff right now. I also work in northern Sweden, where the map is less dense and the existing data is often quite immature. The denser and more mature the map, the greater the need for a diff tool. For the kommuner I have already synced with NVDB, I will need a visual diff later to check them against updated NVDB data. And of course, in areas where the map is so dense and frequently updated that working through a whole kommun the way I do is not really feasible, a visual diff would be great for finding the areas with missing roads.

Note that cities will be difficult: with lots of roads with multiple lanes, sidewalks, cycleways and crossings, there are many ways to map the same things, and the style currently in OSM may not match that well with what NVDB has. For example, NVDB always(?) makes cycleways separate, which is not always the case in OSM, and how cycleways are drawn around roundabouts and crossings can differ quite a lot. Likewise, some multi-lane roads may be two ways in OSM and just one in NVDB, or the other way around. For a visual diff that will be manageable, but having software analyze whether the OSM map and the NVDB map represent the same thing, although in different styles, will be hard or impossible. I don't know how big this problem will be, though; one has to experiment...
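
A first software experiment could stay purely geometric and use a distance tolerance: count how many vertices of an NVDB way lie within some distance of any OSM segment. With a tolerance larger than the separation between carriageways, a road mapped as two ways in OSM but one in NVDB (or vice versa) still counts as covered; it does not resolve the cycleway-style differences, only proximity. This sketch assumes planar coordinates in metres (i.e. already projected), uses brute force rather than a spatial index, and the 10 m default is an arbitrary illustrative value.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def coverage(nvdb_way, osm_ways, tolerance_m=10.0):
    """Fraction of NVDB vertices within tolerance_m of any OSM segment (brute force)."""
    if not nvdb_way:
        return 0.0
    segments = [(w[i], w[i + 1]) for w in osm_ways for i in range(len(w) - 1)]
    hits = sum(
        1 for p in nvdb_way
        if any(point_segment_distance(p, a, b) <= tolerance_m for a, b in segments)
    )
    return hits / len(nvdb_way)


centreline = [(0, 0), (50, 0), (100, 0)]                  # one NVDB way
dual = [[(0, 4), (100, 4)], [(0, -4), (100, -4)]]         # two OSM ways, 8 m apart
partial = [[(0, 3), (50, 3)]]                             # OSM covers only half
print(coverage(centreline, dual))     # → 1.0
print(coverage(centreline, partial))  # about 0.67
```

Checking only vertices is coarse for long, sparsely noded ways; densifying the NVDB geometry at a fixed spacing before comparing would make the fraction more meaningful.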

matthiasfeist commented 3 years ago

Great input, thanks a lot! I'll first get my .osm file generation with your script online and ping you then :)

After that I'll start thinking about the diffing and your comments above 👍