bcgov / api-specs

[OpenAPI Specification](https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.1.md) Repository
https://catalogue.data.gov.bc.ca/group/bc-government-api-registry

In geocoder/addresses, improve the way partial names are handled #176

Closed mraross closed 6 years ago

mraross commented 7 years ago

There are two problems with how geocoder/addresses interacts with the name autocompletion tools: single words aren't autocompleted, and single-word street name aliases differ from their official streetName, which leads to confusion.

For example, in the location services demo address tab, if you enter Black, you get:

Black Bridge, Kennedy Lake, BC
Black Creek Bridge, Black Creek, BC
Black Lane, Tofino, BC
Black Rd, Black Creek, BC

instead of

Black Bear TR 17 Bridge, BC
Black Creek Bridge, Black Creek, BC
Black Bear Lane, Tofino, BC
Black Creek Rd, Black Creek, BC

It was agreed that handling partial name matches in the parser is better than the current method of generating partial street and locality name aliases. In the past, handling partial name matches in the parser was ruled out because all the street centroids would have to be loaded which would be more trouble than it was worth. Handling partial name matches in the parser makes address prep simpler because pseudo-aliases and centroids for pseudo-street aliases don't have to be generated.

mraross commented 7 years ago

Here's what happens when you enter Black Rd:

https://delivery.apps.gov.bc.ca/pub/geocoder/addresses.kml?addressString=black%20rd&maxResults=5

The first result will be Black Rd, Black Creek, BC but the road is actually called Black Creek Rd and all sites on this road have the correct streetName.

The second result will be Black Rd, Fort Fraser, BC but the road is actually called Black Bear Loop Rd. Again, all sites on the road have the correct streetName.

The third result will be Black Rd, Powell River, BC but the road is actually called Black Point Rd and all sites on this road have the correct streetName.

The fourth result will be Black Rd, Sechelt, BC but the road is actually called Schetxwen Rd and all sites on this road have the correct streetName.

The fifth result will be Black Rd, Squamish, BC but the road is actually called Black Bear Rd and all sites on this road have the correct streetName.

Also try this https://delivery.apps.gov.bc.ca/pub/geocoder/addresses.kml?addressString=red&maxResults=5

which returns Red, Indian Arm, BC

Entering salmon returns salmon bridge, falklands, but there is no salmon bridge, falklands, just a holmes bridge, falklands, which is where the salmon bridge match is located.

mraross commented 7 years ago

Here is how the partial word/name dictionary should work (results not exhaustive):

bla => Black Creek, Black Bear, Black Bear Loop, Black Point
sea => Sea, Sea Island 3, Seascape
ced => Cedar, Cedar Hill, Cedar Hill Cross
bro => Broad, Broadway
hill => Cedar Hill, Cedar Hill Cross, Hillside, Hillcrest
cedar hi => cedar hill cross
hill c => cedar hill cross

mraross commented 7 years ago

You could add an autoComplete flag parameter to inform the parser that addressString needs autocompletion. The parser can then take extra steps it wouldn't otherwise perform in full address matching.
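As a rough illustration of the proposal, a client request with such a flag might be built as below. This is only a sketch: the parameter name `autoComplete` and its `true`/`false` values are assumptions based on the suggestion above, and `build_request_url` is a hypothetical helper, not part of any existing client.

```python
from urllib.parse import urlencode, quote

# Endpoint taken from the example URLs earlier in this thread.
BASE_URL = "https://delivery.apps.gov.bc.ca/pub/geocoder/addresses.kml"


def build_request_url(address_string, max_results=5, auto_complete=None):
    """Build a geocoder/addresses request URL (no network call is made).

    auto_complete is the *proposed* flag; it is left out of the query
    string entirely when None, so existing requests are unchanged.
    """
    params = {"addressString": address_string, "maxResults": max_results}
    if auto_complete is not None:
        # Assumed spelling and values of the proposed parameter.
        params["autoComplete"] = "true" if auto_complete else "false"
    # quote_via=quote encodes spaces as %20, matching the URLs above.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(build_request_url("black rd"))
print(build_request_url("bla", auto_complete=True))
```

Leaving the flag out by default keeps full address matching untouched unless the caller opts in.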

cmhodgson commented 7 years ago

If autocomplete == true:
bla => Black Creek, Black Bear, Black Bear Loop, Black Point
sea => Sea, Sea Island 3, Seascape
ced => Cedar, Cedar Hill, Cedar Hill Cross
bro => Broad, Broadway
hill => Hillside, Hillcrest
cedar hi => cedar hill cross

If autocomplete == false:
bla =>
black => Black Creek, Black Bear, Black Bear Loop, Black Point
sea => Sea, Sea Island 3
ced =>
cedar => Cedar, Cedar Hill, Cedar Hill Cross
bro =>
hill =>
cedar hi =>

Method: if autocomplete is on and it is the last word, allow any prefix match (possibly subject to some kind of limit, maxResults perhaps, to prevent excessive matches). If not in autocomplete, for any word, allow only word-based prefix matching (the remaining suffix must start with a space character).
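The method above can be sketched in a few lines. This is a minimal illustration, not the parser's actual code: `expand_partial` and the in-memory name list are hypothetical, the query is treated as one string so that only its last word can be partial, and an exact match is assumed to count as a match in both modes.

```python
def expand_partial(query, names, autocomplete=False, max_results=10):
    """Return the names that a partial query may expand to.

    autocomplete=True : any prefix match is allowed, so the last word
                        of the query may be incomplete ("bla").
    autocomplete=False: only word-based prefix matches are allowed, so
                        the rest of the name must be empty or start
                        with a space ("black" -> "Black Creek").
    """
    q = query.strip().lower()
    if not q:
        return []
    matches = []
    for name in names:
        n = name.lower()
        if not n.startswith(q):
            continue
        rest = n[len(q):]
        if autocomplete or rest == "" or rest.startswith(" "):
            matches.append(name)
    # Cap the list to prevent excessive matches (maxResults in the thread).
    return matches[:max_results]


NAMES = [
    "Black Creek", "Black Bear", "Black Bear Loop", "Black Point",
    "Sea", "Sea Island 3", "Seascape",
    "Cedar", "Cedar Hill", "Cedar Hill Cross",
    "Broad", "Broadway", "Hillside", "Hillcrest",
]

print(expand_partial("sea", NAMES, autocomplete=True))
print(expand_partial("sea", NAMES, autocomplete=False))
```

With this toy list, `sea` expands to Sea, Sea Island 3 and Seascape in autocomplete mode, but only to Sea and Sea Island 3 otherwise, while `bla` and `cedar hi` expand to nothing when autocomplete is off.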

mraross commented 7 years ago

Testing on dev platform at Refractions:

Odd

Black => Black Creek, BC twice in the top five choices

Good

Bla => Black Creek, BC in the top five choices
Blac => Black Creek, BC in the top five choices
Cedar Hi => Cedar Hill Cross Rd, Saanich, BC

Great

One Hun => 100 Mile House, BC
Hun => 100 Mile House, BC

cmhodgson commented 7 years ago

To answer for the oddity, there are 2 black creeks in BC... it is showing you both... but you have no way to differentiate them so I agree that it is less than ideal from a UI standpoint... There are about 70 localities with duplicate names, 4 of which are apparently in triplicate: Dog Creek, Pine Valley, Shuswap, Summit Lake. Try typing "dog" ...

cmhodgson commented 7 years ago

I just realized that I am not stripping out the extra aliases created in the pre-processing for the multi-word handling for localities (I am doing so for the street names though). So this might be making the problem even worse; I will fix this tomorrow (Tuesday).

cmhodgson commented 7 years ago

So stripping out the multi-word locality aliases is not really do-able without reprocessing using new code that doesn't create them in the first place. Upon closer inspection they should not be affecting the results in these cases anyway.

mraross commented 7 years ago

I don't think we need to change the API to handle multiple localities with the same name. We could change the demo app to always return and display the top N matches (where N is the number of choices in the address pick list) in a list in the side panel, with the chosen address listed first, selected, and zoomed to in the map. The user can select the other localities from the list to see where they are.

cmhodgson commented 7 years ago

So we definitely need some kind of penalty for the multi-word expansion because if you input "beaver point rd" then "beaver point rd" needs to come back as a better match than "beaver point camp access rd".

cmhodgson commented 7 years ago

We added STREET_NAME.partialMatch and LOCALITY.partialMatch faults at 1 point which are applied when a multi-word expansion is done. This results in better matching behaviour.
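To show why a small fault is enough to fix the ordering, here is a hedged sketch of the idea. It reads "at 1 point" as a penalty of 1 point; the base score of 100, the `score_candidates` helper, and the comparison on bare street names (without the street type) are all assumptions made for illustration, not the geocoder's actual scoring code.

```python
PARTIAL_MATCH_PENALTY = 1  # assumed reading of "at 1 point" in the comment above


def score_candidates(query_name, candidate_names, base_score=100):
    """Score street-name candidates, penalising multi-word expansions.

    An exact name match keeps the (assumed) base score; a candidate that
    is only reached by expanding the query with extra words picks up a
    STREET_NAME.partialMatch fault and loses PARTIAL_MATCH_PENALTY.
    """
    q = query_name.strip().lower()
    scored = []
    for name in candidate_names:
        n = name.lower()
        faults = []
        if n == q:
            pass  # exact match, no fault
        elif n.startswith(q + " "):
            faults.append("STREET_NAME.partialMatch")
        else:
            continue  # not a candidate for this query
        score = base_score - PARTIAL_MATCH_PENALTY * len(faults)
        scored.append((score, name, faults))
    # Highest score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda item: -item[0])
    return scored


for score, name, faults in score_candidates(
    "Beaver Point", ["Beaver Point Camp Access", "Beaver Point"]
):
    print(score, name, faults)
```

Under these assumptions the exact name "Beaver Point" ranks above "Beaver Point Camp Access" even though the longer name was listed first.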

mraross commented 7 years ago

Confirmed in delivery.