bcgov / ols-geocoder

Physical Address Geocoder
Apache License 2.0

Improve recognition of PO BOX addresses #256

Open mraross opened 3 years ago

mraross commented 3 years ago

PO BOX addresses often have additional noise that prevents the geocoder from finding a good locality-level match. Analysis of rejected PO BOX addresses reveals the following pattern:

[initialGarbage] PO BOX aNumber [localityGarbage] aLocality provinceCode

Here are some examples:

PO BOX 18603 RPO LADNER HALFMOON BAY BC
PROFESSIONAL SERVICES CORPORATION PO BOX 480 ATLIN BC
RPO DUNBAR PO BOX 45033 VANCOUVER BC
PO BOX 4126 RPO SUMAS WAY ABBOTSFORD BC
PO BOX 72052 RPO OLD ORCHARD RD BURNABY BC
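
The pattern above can be sketched as a regular expression plus a locality lookup. This is only an illustration, not the Geocoder's actual parser: the `parse_po_box` function, the `KNOWN_LOCALITIES` set, and the returned field names are all hypothetical, and the sketch assumes a two-letter province code at the end of the address. Separating localityGarbage from aLocality needs some list of valid localities, so the longest trailing run of words that matches a known locality is taken as the locality.

```python
import re

# Hypothetical gazetteer; a real geocoder would consult its own locality data.
KNOWN_LOCALITIES = {"HALFMOON BAY", "ATLIN", "VANCOUVER", "ABBOTSFORD", "BURNABY"}

# [initialGarbage] PO BOX aNumber [localityGarbage] aLocality provinceCode
PO_BOX_RE = re.compile(
    r"^(?P<initial>.*?)\bPO\s+BOX\s+(?P<number>\d+)\b"
    r"(?P<tail>.*?)\s+(?P<province>[A-Z]{2})$"
)


def parse_po_box(address, localities=KNOWN_LOCALITIES):
    """Split a noisy PO BOX address into its parts, or return None."""
    # Treat commas as whitespace and collapse runs of spaces.
    text = re.sub(r"[,\s]+", " ", address.upper()).strip()
    match = PO_BOX_RE.match(text)
    if match is None:
        return None
    words = match.group("tail").split()
    # Scan from the longest trailing word run to the shortest, so the
    # longest known locality wins; the words before it are garbage.
    for start in range(len(words)):
        candidate = " ".join(words[start:])
        if candidate in localities:
            return {
                "initial_garbage": match.group("initial").strip(),
                "box_number": match.group("number"),
                "locality_garbage": " ".join(words[:start]),
                "locality": candidate,
                "province": match.group("province"),
            }
    return None  # no known locality found before the province code
```

For example, `parse_po_box("PO BOX 18603 RPO LADNER HALFMOON BAY BC")` yields locality `HALFMOON BAY` with `RPO LADNER` set aside as locality garbage.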

mraross commented 3 years ago

Note from Brian, 2021/06/08:

Michael,

I took a quick look and found some interesting cases using Kaegan’s data: several examples of glued words, special characters, leading or trailing garbage, and postal elements. These examples remind me of our conversation the other day about proposing that, when the Geocoder would otherwise return a score of 1 (a match on BC only), it should try to accept the locality instead, if one is provided.

You may notice that the structured files perform better, but that is because they contain no parsed address fields other than locality and province.

| Example | Score | Corrected example | Corrected example score |
| --- | --- | --- | --- |
| 6/3190 TAHSIS AVE, COQUITLAM, BC | 1 | 6-3190 TAHSIS AVE, COQUITLAM, BC | 100 |
| 2617 WILLOWGROUSE CRES, NANAIMO, BC | 1 | 2617 WILLOW GROUSE CRES, NANAIMO, BC | 100 |
| 31465 WAMSILI ROAD, ABBOTSFORD, BC | 1 | 31465 Walmsley Ave, Abbotsford, BC | 100 |
| 484D`BLACK BEAR RIDGE, NANAIMO, BC | 1 | 484D BLACK BEAR RIDGE, NANAIMO, BC | 76 |
| 39555 LOGGERSLANE,BOX1966, SQUAMISH, BC | 1 | 39555 LOGGERS LANE,BOX1966, SQUAMISH, BC | 99 |
| 16KM OLD HEDLEY RD PO BOX 1943, PRINCETON, BC | 1 | OLD HEDLEY RD PO BOX 1943, PRINCETON, BC | 77 |
| RRMC 2SQN FMO VICTORIA, VICTORIA, BC | 1 | VICTORIA, BC | 68 |
| R.R.#1, CARLSBAD SPRINGS, BC | 1 | BC | BC |
| PO BOX 21009, SPO, PRINCE GEORGE, BC | 1 | PO BOX 21009 PRINCE GEORGE, BC | 67 |
| 3854 MCI, ARMSTRONG, BC | 1 | ARMSTRONG, BC | 67 |
| GENERAL DELIVERY 1471 WHI, CEDAR, BC | 1 | CEDAR, BC | 67 |
| PO BOX 19675 RPO CTRE/POINT/MALL, VANCOUVER, BC | 1 | PO BOX 19675 VANCOUVER, BC | 67 |
| BOX 100 (CEG), SALMON ARM, BC | 1 | BOX 100 SALMON ARM, BC | 67 |
| CAPERNWRAY HARBOUR BIBLE CENTR, THETIS ISLAND, BC | 1 | THETIS ISLAND,BC | 68 |

mraross commented 3 years ago

The previous examples suggest another pattern:

initialGarbage aLocalityName, aProvinceCode
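
One way to picture this second pattern is to read the address from the right and keep only the last two comma-separated parts. The sketch below is an assumption-laden illustration, not the Geocoder's implementation: `salvage_locality` is a hypothetical name, and it only works when a comma separates the garbage from the locality, as in examples such as `3854 MCI, ARMSTRONG, BC`. Input without that comma would still need a locality lookup.

```python
import re

# A province code is assumed to be exactly two letters, e.g. BC.
PROVINCE_CODE_RE = re.compile(r"^[A-Z]{2}$")


def salvage_locality(address):
    """Return 'LOCALITY, PROVINCE' from the right-hand end, or None."""
    parts = [part.strip() for part in address.upper().split(",")]
    parts = [part for part in parts if part]  # drop empty segments
    # Need at least a locality and a province, with the province last.
    if len(parts) < 2 or not PROVINCE_CODE_RE.match(parts[-1]):
        return None
    # Everything to the left of the last two parts is treated as garbage.
    return f"{parts[-2]}, {parts[-1]}"
```

The salvaged string could then be resubmitted to the geocoder in place of an address that would otherwise score 1.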

BK01 commented 1 year ago

As a next step, add functionality to recognize and remove ‘SPO’ and ‘RPO’ from an address. This will reduce the amount of locality garbage and improve results.
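
As a rough sketch of what such a cleanup step might look like (the function name `strip_postal_outlet` and the regular expression are assumptions, not the Geocoder's actual code), the snippet below drops an `RPO` or `SPO` token together with any outlet name that follows it, up to the next comma. It deliberately leaves an address untouched when no comma follows the token, because without a delimiter it cannot tell where the outlet name ends and the locality begins.

```python
import re

# An RPO/SPO token, an optional preceding comma, and whatever follows the
# token up to (but not including) the next comma.
OUTLET_RE = re.compile(r",?\s*\b[RS]PO\b[^,]*(?=,)", re.IGNORECASE)


def strip_postal_outlet(address):
    """Remove RPO/SPO segments from a comma-delimited address."""
    return OUTLET_RE.sub("", address).strip()
```

With this sketch, `PO BOX 21009, SPO, PRINCE GEORGE, BC` becomes `PO BOX 21009, PRINCE GEORGE, BC`, and `PO BOX 19675 RPO CTRE/POINT/MALL, VANCOUVER, BC` becomes `PO BOX 19675, VANCOUVER, BC`.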

Current handling of postal codes includes the recognition and removal of ‘PO Box + number’ from the address. Work on a right-to-left salvage mode will be covered in a separate issue.

BK01 commented 9 months ago

Validated in geocodertst (Geocoder 4.3). The Geocoder now recognizes and handles SPO and RPO. Applicable examples that previously returned a score of 1 and a matchPrecision of Province now return the expected matchPrecision (locality, street, etc.).