azavea / pfb-network-connectivity

PFB Bicycle Network Connectivity

Strip backslash from OSM names (and fix provisioning and change analysis parameters) #947

Closed. KlaasH closed this pull request 1 year ago.

KlaasH commented 1 year ago

Overview

The most important part of this PR is also the smallest: a pair of lines in import_osm.sh that clear any backslash characters from the converted.osm file before we feed it to osm2pgrouting for import into the database. It turns out osm2pgrouting really doesn't like backslash characters. The error looks like this: [screenshot of the osm2pgrouting error output]

That certainly looks like an error, but I had assumed it would only affect the actual segments with bad names. It turns out, though, that the "Vertices inserted: 0 Split ways inserted 0" lines after the errors indicate that it lost a lot more than the affected segments. Apparently it processes the input in chunks, and any other segments that happen to be in the same chunk as a bad segment just get dropped on the floor.
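To make that failure mode concrete, here is a minimal Python sketch that imitates it with an in-memory SQLite table. It is not osm2pgrouting's code: the table name `ways`, the chunk size, and the CHECK constraint that rejects any name containing a backslash are all hypothetical stand-ins for whatever actually trips up the importer. Each chunk is inserted in its own transaction, so one bad row rolls back every row in that chunk.

```python
import sqlite3


def import_in_chunks(names, chunk_size):
    """Insert names in fixed-size chunks; a chunk containing any bad row is rolled back whole.

    Returns (rows_inserted, chunks_that_failed).
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the importer's failure: reject any name containing a backslash.
    # (SQLite does not treat backslash as an escape character, so '\' is a one-character string.)
    conn.execute("CREATE TABLE ways (name TEXT CHECK (instr(name, '\\') = 0))")

    failed_chunks = 0
    for start in range(0, len(names), chunk_size):
        chunk = names[start:start + chunk_size]
        try:
            # The connection context manager commits on success and rolls back on error,
            # so good rows that share a transaction with a bad row are discarded too.
            with conn:
                conn.executemany("INSERT INTO ways (name) VALUES (?)", [(n,) for n in chunk])
        except sqlite3.IntegrityError:
            failed_chunks += 1

    inserted = conn.execute("SELECT COUNT(*) FROM ways").fetchone()[0]
    conn.close()
    return inserted, failed_chunks


if __name__ == "__main__":
    names = ["Main St", "Oak Ave", "\\", "Elm St", "Pine St", "Cedar Rd"]
    print(import_in_chunks(names, chunk_size=3))  # (3, 1): the whole first chunk is lost
    cleaned = [n.replace("\\", "backslash") for n in names]
    print(import_in_chunks(cleaned, chunk_size=3))  # (6, 0)
```

With one bad name among six, in chunks of three, the two good names that shared a chunk with it disappear along with it; replacing the backslash first lets all six rows through.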

I don't believe there's any functional purpose for which we would want backslashes in our OSM input. They're not used for encoding (the file is in UTF-8), and we don't actually use segment names for anything in the analysis anyway. So I just used a simple sed 's/\\/backslash/' to replace them with the literal word "backslash".
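For readers who want to see the same cleanup outside the shell, this is a rough Python equivalent of the sed step, assuming the OSM file can be streamed line by line as UTF-8 text; the function name `strip_backslashes` is hypothetical and not part of the repository. Note that `str.replace` swaps every backslash on a line, which matches sed with the `g` flag; the sed command quoted above, without `g`, replaces only the first backslash on each line.

```python
import io


def strip_backslashes(src, dst, replacement="backslash"):
    """Copy text from src to dst line by line, replacing every backslash with `replacement`.

    Returns the number of backslashes replaced.
    """
    replaced = 0
    for line in src:
        count = line.count("\\")
        if count:
            # Replace all occurrences on the line (like sed's `g` flag).
            line = line.replace("\\", replacement)
            replaced += count
        dst.write(line)
    return replaced


if __name__ == "__main__":
    # A tiny in-memory sample: one tag whose value is a single backslash.
    sample = io.StringIO('<tag k="name" v="\\"/>\n<tag k="name" v="Main St"/>\n')
    cleaned = io.StringIO()
    print(strip_backslashes(sample, cleaned))  # 1
    print(cleaned.getvalue(), end="")
```

To run it on real data, open converted.osm for reading and a separate output file for writing, both with `encoding="utf-8"`, and pass the two file objects in.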

Other items:

Demo

I found the location of one of the segments named "\" and made a tiny boundary centered on it. Here are the before-and-after shots showing the effect of losing chunks from the OSM import: Before: [screenshot]

After: [screenshot]

Notes

The GitHub Actions deployment to staging isn't working (see #937), but I pushed this branch to staging manually.

Testing Instructions

I don't think it's necessary to spin up a Batch worker on staging and do more runs there, but it probably makes sense to test this locally, at least as far as running a small analysis.

Checklist

Resolves #946