bcgov / ols-geocoder

Physical Address Geocoder
Apache License 2.0

Addresses with STN receive LOCALITY MISSING fault #170

Status: Open. bstratto opened this issue 3 years ago.

bstratto commented 3 years ago

In Geocoder 4.1, addresses containing the abbreviation STN (sometimes used to mean postal station) appear to receive a LOCALITY.missing penalty even though the locality is present.

Examples:

Address: 33224 FARRANT CRES GD STN, ABBOTSFORD, BC
4.1 result: 33224 Farrant Cres, Abbotsford, BC
Score: 89, Precision: 100
Faults:
  "/PJ" POSTAL_ADDRESS_ELEMENT.notAllowed 1
  "" LOCALITY.missing 10

Address: 2083 CLEARBROOK RD STN, ABBOTSFORD, BC
4.1 result: 2083 Clearbrook Rd, Abbotsford, BC
Score: 89, Precision: 99
Faults:
  "/PJ" POSTAL_ADDRESS_ELEMENT.notAllowed 1
  "" LOCALITY.missing 10

Address: 3191 GOLDFINCH ST STN, ABBOTSFORD, BC
4.1 result: 3191 Goldfinch St, Abbotsford, BC
Score: 89, Precision: 100
Faults:
  "/PJ" POSTAL_ADDRESS_ELEMENT.notAllowed 1
  "" LOCALITY.missing 10
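All three results have the same score of 89, which is consistent with a score that starts at 100 and deducts each fault's penalty: 100 - 1 - 10 = 89. The Python sketch below assumes that simple model (it is inferred from these examples, not taken from the geocoder source; `Fault` and `score` are hypothetical names) and shows how much of the drop comes from the spurious LOCALITY.missing fault.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical scoring model inferred from the examples above:
# score = 100 minus the sum of all fault penalties.
@dataclass
class Fault:
    value: str     # input text the fault refers to, e.g. "/PJ" or ""
    element: str   # fault name, e.g. "LOCALITY.missing"
    penalty: int   # points deducted from the score

def score(faults: List[Fault], max_score: int = 100) -> int:
    """Return the match score after subtracting every fault's penalty."""
    return max_score - sum(f.penalty for f in faults)

faults = [
    Fault("/PJ", "POSTAL_ADDRESS_ELEMENT.notAllowed", 1),
    Fault("", "LOCALITY.missing", 10),
]
without_locality_fault = [f for f in faults if f.element != "LOCALITY.missing"]

print(score(faults))                  # 89
print(score(without_locality_fault))  # 99
```

If this model holds, removing the false LOCALITY.missing fault would leave only the 1-point POSTAL_ADDRESS_ELEMENT.notAllowed penalty, for a score of 99.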

cmhodgson commented 3 years ago

According to Canada Post, an STN element requires a station name, so the geocoder eats the word that follows STN as the station name. Because this is done by a regular expression before lexing and parsing, there is no way to know that the word being eaten is actually the locality. I suggest we check how often a station name really follows STN. If it does, and we can work out what it typically looks like (perhaps just a single letter, as in "STN A"), we can match that pattern; if it rarely happens, we can change the regex so it no longer eats the next word. It might be better to handle some of this postal garbage in the parser, as a specific kind of garbage that can appear anywhere, but that would have to be part of a future plan.
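The failure mode described above can be reproduced with a small Python sketch. `GREEDY_STN` is a hypothetical stand-in for the geocoder's pre-lexing regex (the actual pattern is not shown in this thread); it assumes the match skips over the comma after STN, which is what would turn ABBOTSFORD into a station name. `STN_ONLY` shows the alternative of removing just the keyword, and `strip_postal` is an illustrative helper, not geocoder code.

```python
import re

# Hypothetical approximation of the current behaviour: "STN" plus the next
# word, skipping any commas or spaces in between, is removed as a station.
GREEDY_STN = re.compile(r"\bSTN\b[\s,]+\w+", re.IGNORECASE)

# Alternative: remove only the STN keyword and leave the next word alone.
STN_ONLY = re.compile(r"\bSTN\b", re.IGNORECASE)

def strip_postal(address: str, pattern: re.Pattern) -> str:
    """Remove postal-station text, then tidy leftover spaces before commas."""
    cleaned = pattern.sub("", address)
    cleaned = re.sub(r"\s+,", ",", cleaned)    # "RD , BC" -> "RD, BC"
    cleaned = re.sub(r"\s{2,}", " ", cleaned)  # collapse double spaces
    return cleaned.strip()

addr = "2083 CLEARBROOK RD STN, ABBOTSFORD, BC"
print(strip_postal(addr, GREEDY_STN))  # 2083 CLEARBROOK RD, BC  (locality lost)
print(strip_postal(addr, STN_ONLY))    # 2083 CLEARBROOK RD, ABBOTSFORD, BC
```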

mraross commented 3 years ago

Postal station names are often a single letter such as A, but may be a word such as MAIN. Here are some examples:

PO BOX 9404 STN PROV GOVT Victoria BC
PO BOX 1000 STN MAIN, Comox BC
PO BOX 48810, STN BENTALL, Vancouver BC
PO BOX 2083 STN TERMINAL Vancouver BC
Po Box 17000 STN FORCES, Victoria, BC
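These examples suggest a middle ground: consume the text after STN only when it looks like a station name (a single letter or a known name) and never across a comma. Note that PROV GOVT spans two words, so a rule of "exactly one following word" would not cover it either. The Python sketch below is hypothetical: `KNOWN_STATIONS` contains only the names from this thread, and `STATION` and `split_station` are illustrations, not part of the geocoder.

```python
import re
from typing import Optional, Tuple

# Hypothetical list built only from the examples in this thread; a real
# implementation would need an authoritative list of station names.
KNOWN_STATIONS = ["PROV GOVT", "MAIN", "BENTALL", "TERMINAL", "FORCES"]

# Longest names first so multi-word names are tried before shorter ones;
# the space inside a name may be any run of whitespace.
_NAMES = "|".join(
    r"\s+".join(map(re.escape, name.split()))
    for name in sorted(KNOWN_STATIONS, key=len, reverse=True)
)

# STN followed by a known name or a single letter, separated only by spaces
# or tabs, so the match can never reach across a comma into the locality.
STATION = re.compile(rf"\bSTN[ \t]+(?P<name>{_NAMES}|[A-Z])\b", re.IGNORECASE)
BARE_STN = re.compile(r"\bSTN\b", re.IGNORECASE)

def split_station(address: str) -> Tuple[str, Optional[str]]:
    """Return (address with station text removed, station name or None)."""
    match = STATION.search(address)
    if match:
        station = " ".join(match.group("name").upper().split())
        rest = address[: match.start()] + address[match.end():]
    else:
        station = None
        rest = BARE_STN.sub("", address)  # drop only the STN keyword
    rest = re.sub(r"\s*,(\s*,)*", ",", rest)  # tidy " ," and ", ," left behind
    rest = re.sub(r"\s{2,}", " ", rest).strip()
    return rest, station

for text in [
    "PO BOX 9404 STN PROV GOVT Victoria BC",
    "PO BOX 48810, STN BENTALL, Vancouver BC",
    "33224 FARRANT CRES GD STN, ABBOTSFORD, BC",
]:
    print(split_station(text))
# ('PO BOX 9404 Victoria BC', 'PROV GOVT')
# ('PO BOX 48810, Vancouver BC', 'BENTALL')
# ('33224 FARRANT CRES GD, ABBOTSFORD, BC', None)
```

The comma boundary matters here because all three failing addresses in the original report have a comma between STN and the locality. Without a comma, the known-name list still keeps a locality such as ABBOTSFORD from being taken as a station name.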

mraross commented 3 years ago

Now that we have garbage pickup, maybe we no longer need to identify postal elements. Postal code and c/o might be the only exceptions.

mraross commented 3 years ago

Will be fixed by issue #174