OpenAddressesUK / roadmap

Open Addresses UK's roadmap. Learn more about Open Addresses at http://openaddressesuk.org/

Welsh language sorting office - research #107

Open: peterkwells opened this issue 9 years ago

peterkwells commented 9 years ago

We have built a free-format text parsing API, Sorting Office (https://sorting-office.openaddressesuk.org), that takes free-format addresses and turns them into well-structured addresses. It can do this because we know how UK addresses are structured and which building blocks (towns, postcodes, etc.) they are made from.
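To make the "building blocks" idea concrete, here is a toy sketch of the general approach. It is not Sorting Office's actual algorithm. It assumes a comma-separated input, a simplified postcode pattern and a tiny hypothetical list of known towns (`KNOWN_TOWNS`, `POSTCODE_RE` and `parse_address` are illustrative names). It labels each part it recognises and leaves the rest unclassified.

```python
import re

# Hypothetical, hand-picked building blocks for illustration only; a real
# service would draw on much larger reference data.
KNOWN_TOWNS = {"SWANSEA", "CARDIFF", "LONDON"}

# Simplified UK postcode shape (outward code, optional space, inward code).
# It accepts most well-formed postcodes but is not a full validator.
POSTCODE_RE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}")


def parse_address(free_text: str) -> dict:
    """Split a comma-separated address and label the parts we recognise."""
    parts = [p.strip() for p in free_text.split(",") if p.strip()]
    result = {"postcode": None, "town": None, "other": []}
    for part in parts:
        upper = part.upper()
        if POSTCODE_RE.fullmatch(upper):
            result["postcode"] = upper
        elif upper in KNOWN_TOWNS:
            result["town"] = part
        else:
            # Anything unrecognised is kept, not discarded.
            result["other"].append(part)
    return result


# A made-up address, used only to exercise the sketch.
print(parse_address("1 High Street, Swansea, SA1 1AA"))
# → {'postcode': 'SA1 1AA', 'town': 'Swansea', 'other': ['1 High Street']}
```

The point of the sketch is that parsing quality depends on the reference data: an address whose town appears only in Welsh would fall through to `other` unless the list of known towns also holds Welsh names.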

We know that the platform needs to support Welsh language addresses, and the data model already supports them.

But when we extend this thinking to services such as Sorting Office, we need to understand whether, and how, Welsh language addresses differ.

What would be the algorithm for turning a free-format Welsh address into a structured address?

peterkwells commented 9 years ago

Owen Blacker (@owenblacker) has stated on Twitter that there is a 1-1 match between the Welsh and English forms of an address: https://twitter.com/peterkwells/status/583544915524771840

Looking in OS Open Names turns up examples of the 1-1 match (this sample is from SS68.csv):

[Screenshot, 2015-04-02: sample rows from OS Open Names SS68.csv]
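If the 1-1 match holds, one possible algorithm for a Welsh address is to translate each Welsh building block to its English pair and then parse as usual. Below is a minimal sketch of building that lookup. It assumes rows shaped like an OS Open Names CSV with `NAME1`, `NAME1_LANG`, `NAME2` and `NAME2_LANG` columns and `eng`/`cym` language codes (check the product documentation before relying on this). The sample rows are made up for illustration, not copied from SS68.csv, and `build_lookup` and `to_english` are hypothetical helpers. Rows without a Welsh/English pair are collected as gaps.

```python
import csv
import io

# Illustrative rows shaped like an OS Open Names CSV (column names assumed).
# These rows are made up for the example; they are not from SS68.csv.
SAMPLE = """NAME1,NAME1_LANG,NAME2,NAME2_LANG
Swansea,eng,Abertawe,cym
Cardiff,eng,Caerdydd,cym
Example Road,,,
"""


def build_lookup(csv_text):
    """Return (welsh_to_english, gaps) built from the paired name columns."""
    welsh_to_english = {}
    gaps = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        pair = {}
        for name_col, lang_col in (("NAME1", "NAME1_LANG"), ("NAME2", "NAME2_LANG")):
            if row[name_col] and row[lang_col]:
                pair[row[lang_col]] = row[name_col]
        if "cym" in pair and "eng" in pair:
            # Key on lower case so lookups ignore capitalisation.
            welsh_to_english[pair["cym"].lower()] = pair["eng"]
        else:
            # No Welsh/English pair recorded: one of the gaps.
            gaps.append(row["NAME1"])
    return welsh_to_english, gaps


def to_english(block, lookup):
    """Translate a building block if a pair is known; otherwise keep it."""
    return lookup.get(block.lower(), block)


lookup, gaps = build_lookup(SAMPLE)
print(to_english("Abertawe", lookup))  # → Swansea
print(gaps)  # → ['Example Road']
```

Falling back to the original text, rather than failing, means an address with a gap still reaches the parser unchanged.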

The gaps in the sample are interesting. Once we implement the capability to learn new building blocks, people using the services could help us fill in this missing information.

We are waiting to hear from others to confirm whether the structure is the same, and hence whether 1-1 matching always works. For example, we need to be careful of nuances such as an address where building block one (the road) is in English but building block two (the town) is in Welsh.
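One way to at least detect the mixed-language nuance is to tag each building block with its language and flag addresses where more than one language appears. A minimal sketch follows. `NAME_LANG`, `address_languages` and `is_mixed` are hypothetical. The small table is illustrative and could in principle be derived from paired name columns like those in OS Open Names. Blocks not in the table are ignored rather than guessed.

```python
# Hypothetical name-to-language table, for illustration only.
NAME_LANG = {
    "queen street": "eng",
    "heol y frenhines": "cym",
    "cardiff": "eng",
    "caerdydd": "cym",
}


def address_languages(blocks: list[str]) -> set[str]:
    """Return the languages detected across an address's building blocks."""
    return {NAME_LANG[b.lower()] for b in blocks if b.lower() in NAME_LANG}


def is_mixed(blocks: list[str]) -> bool:
    """True if the recognised blocks are in more than one language."""
    return len(address_languages(blocks)) > 1


print(is_mixed(["Queen Street", "Caerdydd"]))      # road English, town Welsh → True
print(is_mixed(["Heol y Frenhines", "Caerdydd"]))  # all Welsh → False
```

A check like this would let a service flag such addresses for special handling instead of translating them block by block.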

peterkwells commented 9 years ago

Update on mixing languages across the building blocks of a single address: https://twitter.com/peterkwells/status/583591684975554561

TLDR: that's an even edgier edge case. Not something to support immediately, if ever.