The most important part of this PR is also the smallest: a pair of lines in import_osm.sh that clear any backslash characters from the converted.osm file before we feed it to osm2pgrouting for import into the database. It turns out osm2pgrouting really doesn't like backslash characters. The error looks like this:
That certainly looks like an error, but I had assumed it would only affect the actual segments with bad names. It turns out that the "Vertices inserted: 0 Split ways inserted 0" line after the errors indicates that it lost a lot more than the affected segments. Apparently it processes the input in chunks, and whatever other segments happen to be in the same chunk as the bad segments just get dropped on the floor.
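One way to catch this failure mode early would be to scan the importer's output for that zero-vertex message. The sketch below assumes the osm2pgrouting output has been saved to a file; the temporary log file and its single sample line are hypothetical, built only from the message quoted above, and the real output likely contains more detail.

```shell
#!/bin/sh
# Hypothetical log file holding saved osm2pgrouting output.
LOG="$(mktemp)"
printf '%s\n' 'Vertices inserted: 0 Split ways inserted 0' > "$LOG"

# Count the lines reporting that a chunk inserted no vertices.
# grep exits non-zero when nothing matches, so guard with || true.
EMPTY_CHUNKS="$(grep -c 'Vertices inserted: 0' "$LOG" || true)"

if [ "$EMPTY_CHUNKS" -gt 0 ]; then
  STATUS="possible dropped chunks"
else
  STATUS="ok"
fi

printf '%s (%s zero-vertex lines)\n' "$STATUS" "$EMPTY_CHUNKS"
rm -f "$LOG"
```

A zero count could also show up for legitimate reasons, so a match is a prompt to look at the log, not proof of data loss.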
I don't believe there's any functional purpose for which we would want backslashes in our OSM input. They're not used for encoding (the file is in UTF-8) and we don't actually use segment names for anything in the analysis anyway. So I just did a simple sed 's/\\/backslash/' to make them go away.
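A minimal sketch of that kind of cleanup, run against a throwaway sample file instead of the real converted.osm. The sample tag and street name are made up for illustration, and the trailing g flag is an addition in this sketch so that every backslash on a line is replaced, not only the first; the exact command in import_osm.sh may differ.

```shell
#!/bin/sh
# Hypothetical sample input; the real pipeline operates on converted.osm.
SAMPLE="$(mktemp)"
printf '%s\n' '<tag k="name" v="Main \ Street \ North"/>' > "$SAMPLE"

# Replace every backslash with the literal word "backslash".
# Inside a single-quoted sed script, \\ matches one literal backslash.
CLEANED="$(sed 's/\\/backslash/g' "$SAMPLE")"

printf '%s\n' "$CLEANED"
rm -f "$SAMPLE"
```

Without the g flag, sed replaces only the first match on each line, which is enough only if no line contains more than one backslash.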
Other items:
The next most important thing this does is increase the memory available to the analysis task from 32GB to 61GB. It only matters for Los Angeles, I think, but it does seem to matter for Los Angeles. And we're using i3.2xlarge instances anyway, so we might as well allocate all the memory.
I also adjusted some of the parameters used to set PostgreSQL memory and disk usage limits. We have lots of memory and lots of disk, so it makes sense to be generous, to get all the speed we can and avoid any failures in the largest jobs.
Speaking of those parameters, I made the staging ones match production. They had been set lower, presumably to make repeated testing cheaper, but we only very occasionally test on staging these days, so it makes sense to make it as similar to production as possible.
There were a few errors in the build when I tried to spin my development instance back up. I tried to find the path of least resistance for getting around them, which meant upgrading the base image in the case of the tilegarden container but just adjusting the Debian package list to be able to keep using the old image for the angularjs container.
Demo
I found the location of one of the segments named "\" and made a tiny boundary centered on it. Here are the before-and-after shots showing the effect of losing chunks from the OSM import:
Before:
After:
Notes
The GitHub Actions deployment to staging isn't working (see #937), but I pushed this branch to staging manually.
Testing Instructions
I don't think it's necessary to spin up a Batch worker on staging and do more runs there, but it probably makes sense to test this locally to the point of running a small analysis.
Spin up a dev instance per the README instructions
Run an analysis for it (click "Run Analysis" on the Jobs page then copy-paste the command that gets printed into the Docker log into a shell within the VM).
Confirm that it runs successfully and you can see the results by clicking on it in the All places list once it has finished.
Checklist
Resolves #946