bcgov / ols-geocoder

Physical Address Geocoder
Apache License 2.0

Verify correctness of address ranges added from ITN and StatsCan RNF #151

Open mraross opened 3 years ago

mraross commented 3 years ago

We suspect that some ridiculously large range limits are causing the creation of many bogus block face ranges. For example, a road may have a maximum observation-derived block range of 3300-3398, while ITN or RNF has a bogus block face range of 4,000-60,000 on the same road. Filling the resulting yawning gap in the address range then creates hundreds of bogus ranges.
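To give a feel for the scale, here is a rough sketch of the arithmetic, not the geocoder's actual gap-filling logic. It assumes (purely for illustration) that the gap is filled with one range per block of 100 address numbers; `BLOCK_SIZE` and the variable names are hypothetical.

```python
# Illustrative only: how many filler block ranges a bogus upper limit could create.
# ASSUMPTION: one range per block of 100 address numbers (not confirmed by the source).
BLOCK_SIZE = 100

observed_max = 3398   # top of the observation-derived block range from the example
bogus_high = 60000    # upper limit of the suspect ITN/RNF block face range

# First block boundary above the observed maximum (3400 here).
gap_start = (observed_max // BLOCK_SIZE + 1) * BLOCK_SIZE

# One (from, to) pair per block between the observed data and the bogus limit.
gap_blocks = [
    (start, start + BLOCK_SIZE - 1)
    for start in range(gap_start, bogus_high, BLOCK_SIZE)
]

print(len(gap_blocks))   # number of filler ranges under this assumption
print(gap_blocks[0], gap_blocks[-1])
```

Under that assumption the single bogus limit produces several hundred filler ranges on one road, which is consistent with the "hundreds of bogus ranges" described above.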

mraross commented 3 years ago

-----Original Message-----
From: Graeme Leeming gleeming@refractions.net
Sent: November 17, 2020 8:54 AM
To: Ross, Michael RA CITZ:EX Michael.RA.Ross@gov.bc.ca
Subject: RNF analysis

Hi Michael,

Darrin did a quick analysis on RNF ranges last night before you posted ticket 151. See attached file with the 9 records which have at least one block face spanning over 50k address range values. Those numbers are in the L_DIFF and R_DIFF fields, with the worst offender being 31401 to 999999 for a coverage of almost a million addresses.

-Graeme
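A check of this kind could be sketched as follows. This is a minimal illustration, not Darrin's actual analysis: the `Segment` class and `oversized_faces` function are hypothetical names, the span is assumed to be the absolute difference between the two ends of a block face (matching the L_DIFF and R_DIFF values in the attachment), and the last sample record is made up to show a normal range.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One road segment with numeric address range ends for each block face."""
    name: str
    left_from: int
    left_to: int
    right_from: int
    right_to: int


def span(first: int, last: int) -> int:
    # Absolute difference, so reversed ranges (e.g. 999999 to 31401) are handled too.
    return abs(last - first)


def oversized_faces(seg: Segment, threshold: int) -> list[str]:
    """Return the block faces of seg whose range span exceeds threshold."""
    faces = []
    if span(seg.left_from, seg.left_to) > threshold:
        faces.append("left")
    if span(seg.right_from, seg.right_to) > threshold:
        faces.append("right")
    return faces


segments = [
    Segment("Larson RD", 7540, 7990, 7735, 99999),       # values from the attachment
    Segment("Old Yale RD", 26801, 79099, 26800, 26898),  # values from the attachment
    Segment("Example RD", 3300, 3398, 3301, 3399),       # hypothetical normal segment
]

THRESHOLD = 50_000  # the 50k cut-off mentioned in the email
flagged = {}
for seg in segments:
    faces = oversized_faces(seg, THRESHOLD)
    if faces:
        flagged[seg.name] = faces

print(flagged)
```

Keeping the threshold as a parameter makes it easy to rerun the same check against another source with a different cut-off.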

mraross commented 3 years ago

NGD_UID,C,10 NAME,C,50 TYPE,C,6 DIR,C,2 AFL_VAL,C,9 ATL_VAL,C,9 LEFT,N,16,0 L2,N,16,0 L_DIFF,N,16,0 AFR_VAL,C,9 ATR_VAL,C,9 RIGHT,N,16,0 R2,N,16,0 R_DIFF,N,16,0 CSDUID_L,C,7 CSDNAME_L,C,55 CSDTYPE_L,C,3 CSDUID_R,C,7 CSDNAME_R,C,55 CSDTYPE_R,C,3 PRUID_L,C,2 PRNAME_L,C,55 PRUID_R,C,2 PRNAME_R,C,55 CLASS,C,2
2695357 Larson RD 7540 7990 7540 7990 450 7735 99999 7735 99999 92264 5953042 Fraser-Fort George C RDA 5953042 Fraser-Fort George C RDA 59 British Columbia / Colombie-Britannique 59 British Columbia / Colombie-Britannique 23
760491 Old Yale RD 26801 79099 26801 79099 52298 26800 26898 26800 26898 98 5915001 Langley DM 5915001 Langley DM 59 British Columbia / Colombie-Britannique 59 British Columbia / Colombie-Britannique 23
2438159 Sunnybrae-Canoe Point RD 7431 7431 0 0 7430 66432 7430 66432 59002 5939037 Columbia-Shuswap C RDA 5939037 Columbia-Shuswap C RDA 59 British Columbia / Colombie-Britannique 59 British Columbia / Colombie-Britannique 24
3143350 Somerset DR S 19 19 0 0 18 58198 18 58198 58180 5951019 Bulkley-Nechako F RDA 5951019 Bulkley-Nechako F RDA 59 British Columbia / Colombie-Britannique 59 British Columbia / Colombie-Britannique 23
4440348 York DR 227389 709389 227389 709389 482000 202710 630790 202710 630790 428080 5951019 Bulkley-Nechako F RDA 5951019 Bulkley-Nechako F RDA 59 British Columbia / Colombie-Britannique 59 British Columbia / Colombie-Britannique 23
4006665 Yalamote CRES 15 52211 15 52211 52196 52210 0 52210 0 5909837 Cheam 1 IRI 5909837 Cheam 1 IRI 59 British Columbia / Colombie-Britannique 59 British Columbia / Colombie-Britannique 23
5056380 West Lake RD 0 0 0 999999 31401 999999 31401 968598 5953042 Fraser-Fort George C RDA 5953042 Fraser-Fort George C RDA 59 British Columbia / Colombie-Britannique 59 British Columbia / Colombie-Britannique 23
4440343 York DR 709391 999999 709391 999999 290608 630792 888888 630792 888888 258096 5951019 Bulkley-Nechako F RDA 5951019 Bulkley-Nechako F RDA 59 British Columbia / Colombie-Britannique 59 British Columbia / Colombie-Britannique 23
2695344 Blume RD 7810 78100 7810 78100 70290 7647 78101 7647 78101 70454 5953044 Fraser-Fort George D RDA 5953044 Fraser-Fort George D RDA 59 British Columbia / Colombie-Britannique 59 British Columbia / Colombie-Britannique 23

mraross commented 3 years ago

RNF address ranges have too many issues for us to keep using them. Let's drop RNF for now and focus on ITN ranges.

gleeming commented 3 years ago

There are only 18 segments in ITN with a range span of over 8000 on at least one block face. I reviewed several of them. Those that were neither on IRs nor seemingly legitimate have been ticketed for GeoBC to review and possibly update.

mraross commented 3 years ago

Great stuff. Let's keep including ITN as it is then.

gleeming commented 3 years ago

After excluding RNF and batching the full Health Ideas dataset, the main differences observed compared with the run that included RNF are: 46k fewer block matches, 18k more civic number matches, and a 0.2% reduction in cases with a score of 90+.

The 18k civic number matches were likely recovered because the BAARG logic no longer rejects or excessively shifts site cases that had inconsistencies in RNF when it was mixed with other sources. They may account for a chunk of the lost block matches. However, even in the best case, that would still leave a net loss of about 28k block matches when we exclude RNF. To put this into perspective, all data and code improvements for the bronze release have already increased scores of 90+ by over 12%, and this loss is roughly 0.2%.
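The best-case figure is simple arithmetic on the rounded counts above. It assumes that every one of the recovered civic number matches was previously counted as a block match, which is the most favourable reading and has not been verified. The variable names are illustrative.

```python
# Rounded counts reported for the run without RNF.
fewer_block_matches = 46_000
more_civic_matches = 18_000

# Best case (ASSUMPTION): every recovered civic match used to be a block match,
# so those cases moved to a better precision and are not a real loss.
best_case_net_loss = fewer_block_matches - more_civic_matches

print(best_case_net_loss)
```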

I'm reassigning this to DataBC because any further investigation could be time consuming. One area to look into would be the addresses that were formerly matched at block precision and are now matched at locality or street precision. Are they real addresses? Could they be recovered if RNF were used only on block faces with no other ranges?

mraross commented 3 years ago

Agreed. We should investigate adding RNF ranges only on blocks that have no ranges, but that's a task for a future release.
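One possible shape for that fallback rule, sketched under assumptions: block faces are keyed by a hypothetical string id, each range is a (from, to) pair, and `None` marks a face with no range from the other sources. None of these names come from the geocoder code.

```python
from typing import Optional

Range = tuple[int, int]


def fill_missing_ranges(
    primary: dict[str, Optional[Range]],
    rnf: dict[str, Range],
) -> dict[str, Optional[Range]]:
    """Use an RNF range only where no other source supplies one for that block face."""
    merged = dict(primary)  # copy so the primary ranges are never modified
    for face_id, rnf_range in rnf.items():
        if merged.get(face_id) is None:
            merged[face_id] = rnf_range
    return merged


# Hypothetical sample data.
primary = {"A-left": (3300, 3398), "B-left": None}
rnf = {"A-left": (4000, 60000), "B-left": (100, 198), "C-right": (2, 98)}

merged = fill_missing_ranges(primary, rnf)
print(merged)
```

With this rule the suspect 4000-60000 range on `A-left` is ignored because another source already covers that face, while `B-left` and `C-right` pick up their RNF ranges. A span check would still be worth applying to the RNF ranges that do get used.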