g0vhk-io / HKAddressParser

Hong Kong Address Parser
https://bit.ly/hkaddressparser
116 stars, 29 forks

Feature Request: Ability to parse PO Box #49

Open lifehome opened 5 years ago

lifehome commented 5 years ago

Problem / Current situation

Entering Fanling PO Box 69 returns a bunch of irrelevant data. Also, when using PO Box 69, Fanling Post Office, the script cannot tell that it refers to "Fanling Post Office".

Expected result

Entering Fanling PO Box 69 or PO Box 69, Fanling Post Office should return a single result: the address of "Fanling Post Office".

Why should this be added as a feature?

Since there are a great many variants of writing a PO Box address, I think this would help in telling which post office the box is located at.

cswbrian commented 5 years ago

Thank you very much for your suggestion.

Hi, the parser relies on the Address Lookup Service API from the OGCIO of the HKSAR as its major data source, so the quality of our parsed results depends on this API.

Currently, the Fanling Post Office is not included in the API, which accounts for the wrong result from the parser. If you try PO Box 69, Tung Chung Post Office instead, the parser will return the correct result, because the OGCIO API includes Tung Chung Post Office. That matches your expected result.

In the future, we may include other data sources to increase the accuracy of the parser, and we also hope OGCIO will update its address database, as it's really outdated. Feel free to suggest more features, and let's see how we can achieve them.

UnKnoWn-Consortium commented 5 years ago

The post office in Fanling is not officially called "Fanling Post Office" but "Wah Ming Post Office". The parser does give the correct result if you query with "Wah Ming Post Office". So there is also a common/official name divergence issue.

lifehome commented 5 years ago

@UnKnoWn-Consortium Thank you for pointing that out. Sadly I have to disagree: the "Fanling Post Office" (branch code FNG) is located on the ground floor of the North District Government Offices building, which sits at the intersection of San Wan Road and Pik Fung Road in Fanling.

The post office you pointed out, "Wah Ming Post Office", is actually the smallest post office in the district, located inside the Wah Ming Shopping Centre in southern Fanling.

Perhaps we could gather the branch list from the Hong Kong Post website and feed it to the parser as a new data source? Though I think it would need a routine (e.g. monthly or annual) crawling process to keep the list up to date, in case a new post office opens.
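As a rough illustration of the branch-list idea above, here is a minimal Python sketch of how such a list might be consulted once it has been gathered. It is not code from HKAddressParser: the `PostOffice` class, the `BRANCHES` list and `find_branch` are hypothetical names, and the entries contain only the details mentioned in this thread rather than a real crawl of the Hong Kong Post website.

```python
import re
from dataclasses import dataclass
from typing import Optional

SUFFIX = " post office"


@dataclass(frozen=True)
class PostOffice:
    name: str
    location: Optional[str] = None     # unknown unless stated in this thread
    branch_code: Optional[str] = None  # unknown unless stated in this thread


# Placeholder data: only what this thread mentions, NOT an official list.
BRANCHES = [
    PostOffice("Fanling Post Office",
               "G/F, North District Government Offices, Fanling", "FNG"),
    PostOffice("Wah Ming Post Office", "Wah Ming Shopping Centre, Fanling"),
    PostOffice("Tung Chung Post Office"),
]


def normalise(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def find_branch(query: str) -> Optional[PostOffice]:
    """Return the branch named in the query, or None if nothing matches."""
    text = normalise(query)
    # Pass 1: the full branch name appears in the query.
    for branch in BRANCHES:
        if normalise(branch.name) in text:
            return branch
    # Pass 2: only the leading part of the name appears, e.g. "Fanling PO Box 69".
    for branch in BRANCHES:
        stem = normalise(branch.name)
        if stem.endswith(SUFFIX):
            stem = stem[: -len(SUFFIX)]
        if re.search(r"\b" + re.escape(stem) + r"\b", text):
            return branch
    return None
```

With this placeholder data, both `find_branch("PO Box 69, Fanling Post Office")` and `find_branch("Fanling PO Box 69")` resolve to the Fanling Post Office entry. A periodic crawl would only need to regenerate `BRANCHES`.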

Apart from the issue itself, I am seeing a new angle on the need for parsing post office addresses. Some parcel forwarding services (集運) allow customers to enter the name of a post office so that they can pick up their parcel and pay shipping fees at the office counter. However, some of these companies cannot identify branch names correctly, resulting in both human errors such as delivery failures and communication mistakes such as confusing branches with similar/symbolic names.

UnKnoWn-Consortium commented 5 years ago

@lifehome Sorry, my mistake; the search somehow gave me the Wah Ming Post Office result... After further digging, it appears the main culprit is the Lands Department data source. It does not handle the keywords PO Box <NUMBER> or P.O. Box <NUMBER> well, and the current parser logic seems to have exacerbated the situation.
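One possible refinement, sketched in Python under assumptions, is to strip the PO Box <NUMBER> / P.O. Box <NUMBER> keyword from the query before it reaches the upstream search, and keep the box number separately. The pattern and the `split_po_box` helper below are hypothetical and are not taken from the parser's code; real-world variants (for example "GPO Box") would need more cases.

```python
import re

# Matches "PO Box 69", "P.O. Box 69", "p o box 69" and "PO Box No. 69".
PO_BOX_RE = re.compile(
    r"\bP\.?\s*O\.?\s*Box\s*(?:No\.?\s*)?(\d+)\b",
    re.IGNORECASE,
)


def split_po_box(query: str):
    """Return (box_number, remainder); box_number is None if no PO Box is found."""
    match = PO_BOX_RE.search(query)
    if match is None:
        return None, query.strip()
    # Cut the PO Box phrase out of the query and tidy the leftover separators.
    remainder = query[: match.start()] + " " + query[match.end():]
    remainder = re.sub(r"\s*,\s*", ", ", remainder)
    remainder = re.sub(r"\s+", " ", remainder).strip(" ,")
    return match.group(1), remainder
```

For the two inputs from the original report, `split_po_box("Fanling PO Box 69")` gives `("69", "Fanling")` and `split_po_box("PO Box 69, Fanling Post Office")` gives `("69", "Fanling Post Office")`, so only the remainder would be sent to the address lookup.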

As for a list of post offices, the Lands Department actually maintains a set of geo-referenced public facility data that includes post offices at https://data.gov.hk/en-data/dataset/hk-landsd-openmap-geo-referenced-public-facility-data. I believe it also forms part of their Location Search API. So maybe what is needed is a refinement to the parser logic.