TravelMapping / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project

BUS_WITH_I datacheck #252

Closed by yakra 4 years ago

yakra commented 4 years ago

https://github.com/TravelMapping/DataProcessing/blob/5c69952abdbfe8e7a0fa1bae8789868b4a5a433b/siteupdate/python-teresco/siteupdate.py#L3349-L3351 requires a full match, which means some labels (e.g. with directional suffixes, or abbrevs) could slip thru the cracks. I'll cook up a few test cases and test them out.

C++ version is not affected.

yakra commented 4 years ago

Overlooks route "numbers" such as 35W, 35E, 69W, 69C, 69E

yakra commented 4 years ago

Case sensitive = potential for false negatives. How about all regexes in siteupdate.py? Their C++ equivalents?

yakra commented 4 years ago

Only flag these when country is USA.

Potential for FPs: usai;TX;I-69;;Bus;Buster Keaton, TX;tx.i069bus; ;) Solution: check for intersecting route; abbrev
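A rough sketch of the country gate and of why the example route above would be a false positive. Everything here is an assumption for illustration: the `PATTERN` regex, the `bus_with_i` helper, the `"USA"` country code, the field order used to split the route line, and the idea that a crossing with that route would be labeled with route name plus abbrev.

```python
import re

# Hypothetical stand-in for the BUS_WITH_I pattern; not the regex in siteupdate.py.
PATTERN = re.compile(r"I-[0-9]+[NEWSC]?[Bb][Uu][Ss].*")

def bus_with_i(label, country):
    """Flag the label only when the waypoint's region is in the USA."""
    return country == "USA" and PATTERN.fullmatch(label) is not None

# The hypothetical route from the comment above; field names are assumed.
line = "usai;TX;I-69;;Bus;Buster Keaton, TX;tx.i069bus;"
system, region, route, banner, abbrev, city, root, alt = line.split(";")

# Assumed labeling: route name + abbrev for a crossing with this route.
crossing_label = route + abbrev

flagged_in_usa = bus_with_i(crossing_label, "USA")   # legitimate label, still flagged
flagged_elsewhere = bus_with_i(crossing_label, "CAN")
```

Here `crossing_label` is a legitimate reference to the I-69 route whose abbrev happens to be "Bus", yet the pattern alone cannot tell it apart from a mislabeled business loop.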

yakra commented 4 years ago

> requires a full match, which means some labels (e.g. with directional suffixes, or abbrevs) could slip thru the cracks.

Just slap a .* at the end and we're good to go.
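To make the points raised so far concrete, here is a minimal sketch comparing a strict full-match pattern with a loosened one. Both regexes are assumptions written for illustration, not the actual code at the linked lines of siteupdate.py; the `C` in the suffix class is added only to cover the 69C example mentioned earlier.

```python
import re

# Hypothetical "before": digits optional, "Bus" case sensitive, nothing may follow.
STRICT = re.compile(r"I-[0-9]*Bus")

# Hypothetical "after": at least one numeral, optional directional suffix,
# case-insensitive "Bus", and a trailing .* so fullmatch tolerates extra text.
LOOSE = re.compile(r"I-[0-9]+[NEWSC]?[Bb][Uu][Ss].*")

def strict_hit(label):
    return STRICT.fullmatch(label) is not None

def loose_hit(label):
    return LOOSE.fullmatch(label) is not None

# (strict, loose) results for a few made-up labels
results = {
    label: (strict_hit(label), loose_hit(label))
    for label in ["I-35Bus", "I-35WBus", "I-69CBUS", "I-35Bus_N", "I-Bus", "I-35BL"]
}
# I-35Bus   -> (True,  True)
# I-35WBus  -> (False, True)   suffixed route number
# I-69CBUS  -> (False, True)   suffix plus different case
# I-35Bus_N -> (False, True)   trailing text after "Bus"
# I-Bus     -> (True,  False)  no numeral
# I-35BL    -> (False, False)  correctly labeled business loop
```

Using `fullmatch` with a trailing `.*` behaves the same as `match` without it; the sketch keeps `fullmatch` only to mirror the "slap a .* at the end" fix.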

| task | Py | C++ |
| --- | --- | --- |
| require >= 1 numeral | Done | Done |
| account for [NEWS] suffixes | Done | Done |
| case insensitive Bus | Done | Done |
| don't require full match | Done | Done |

yakra commented 4 years ago

> Only flag these when country is USA.
>
> Potential for FPs: usai;TX;I-69;;Bus;Buster Keaton, TX;tx.i069bus; ;) Solution: check for intersecting route; abbrev

Nope. Because the C++ version performs this datacheck while reading .wpt files from disk, there's potential for *colocated to not be populated yet. It's not worth making another pass thru the data to do this. On the rare theoretical occasion this would occur, it can just be marked FP.
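The ordering problem can be sketched in a few lines. This is a toy model, not the C++ or Python siteupdate data structures: the `Route` and `Waypoint` classes, their fields, and the `crosses_route_abbreviated_bus` helper are all hypothetical, and the colocation list is simply assumed to be filled in after every file has been read.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(eq=False)
class Route:
    name: str    # e.g. "I-69"
    abbrev: str  # e.g. "Bus" for the hypothetical Buster Keaton, TX route

@dataclass(eq=False)
class Waypoint:
    label: str
    route: Route
    # None until colocation has been worked out across all routes
    colocated: Optional[List["Waypoint"]] = None

def crosses_route_abbreviated_bus(w):
    """True/False once colocation is known; None if it is not populated yet."""
    if w.colocated is None:
        return None
    return any(o is not w and o.route.abbrev.lower() == "bus" for o in w.colocated)

i69_bus = Route("I-69", "Bus")
us59 = Route("US59", "")

# While reading the first file, the crossing route may not have been read yet.
a = Waypoint("I-69Bus", us59)
at_read_time = crosses_route_abbreviated_bus(a)   # None: cannot rule out an FP yet

# Only after all files are read can the shared point be linked up.
b = Waypoint("US59", i69_bus)
a.colocated = b.colocated = [a, b]
after_all_read = crosses_route_abbreviated_bus(a)  # True: the flag would be an FP
```

A check that runs at read time only ever sees the `None` case for some points, so suppressing the flag reliably would need a second pass over the data, which is the cost the comment above declines to pay.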

yakra commented 4 years ago

Does not account for starred labels. I'll fix wptedit first, and get this afterwards.
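One possible way to handle starred labels, sketched under assumptions: the pattern below is a hypothetical stand-in for the datacheck regex, and the helper names are made up. The idea is just to drop a single leading `*` before matching.

```python
import re

# Hypothetical stand-in for the BUS_WITH_I pattern; not the regex in siteupdate.py.
PATTERN = re.compile(r"I-[0-9]+[NEWSC]?[Bb][Uu][Ss].*")

def unstarred(label):
    """Drop one leading '*' if present, leaving the rest of the label alone."""
    return label[1:] if label.startswith("*") else label

def flags_without_star_handling(label):
    return PATTERN.fullmatch(label) is not None

def flags_with_star_handling(label):
    return PATTERN.fullmatch(unstarred(label)) is not None

missed = flags_without_star_handling("*I-35Bus")   # False: the star blocks the match
caught = flags_with_star_handling("*I-35Bus")      # True
```

An equivalent alternative would be to prefix the pattern itself with `\*?` instead of stripping the star first.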