datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License
1.52k stars, 304 forks

Add training data for Canadian Addresses #254

Open IlyaSukhanov opened 5 years ago

IlyaSukhanov commented 5 years ago

usaddress mostly works out of the box with (English-language) Canadian addresses; however, it often misparses postal codes. This change adds training data for a dozen or so Canadian addresses, chosen by looking up random locations on OpenStreetMap (OSM). There is one address per province to ensure that provinces are also recognized properly.
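One way to spot the postal-code misparses described above is to scan a parse result for tokens shaped like the halves of a Canadian postal code ("A1A 1A1") that did not receive the `ZipCode` label. The sketch below works on a list of `(token, label)` pairs, which is the shape `usaddress.parse` returns. The helper name `misparsed_postal_tokens` and the sample pairs are hypothetical: the pairs are written by hand for illustration and are not real parser output.

```python
import re

# A Canadian postal code has the form "A1A 1A1". Split on whitespace it becomes
# a forward sortation area ("K1A") and a local delivery unit ("0B1").
FSA = re.compile(r"^[A-Za-z]\d[A-Za-z]$")
LDU = re.compile(r"^\d[A-Za-z]\d$")


def misparsed_postal_tokens(pairs):
    """Return (token, label) pairs that look like a postal-code half
    but carry a label other than ZipCode."""
    bad = []
    for token, label in pairs:
        cleaned = token.strip(",.")  # ignore trailing punctuation
        looks_postal = FSA.match(cleaned) or LDU.match(cleaned)
        if looks_postal and label != "ZipCode":
            bad.append((token, label))
    return bad


# Hypothetical parse result, hand-written for illustration only.
sample = [
    ("123", "AddressNumber"),
    ("Main", "StreetName"),
    ("St", "StreetNamePostType"),
    ("Ottawa", "PlaceName"),
    ("ON", "StateName"),
    ("K1A", "PlaceName"),  # hypothetical mislabel
    ("0B1", "ZipCode"),
]

print(misparsed_postal_tokens(sample))
```

This is only a heuristic: a unit number such as "1A1" would match the same pattern, so flagged tokens still need a manual look.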

The training and test data reside in canada.xml in their respective folders.

It is worth noting that usaddress uses the labels ZipCode and StateName; in Canadian parlance these correspond to the postal code and the province.
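To illustrate the label mapping, here is a minimal sketch of how a Canadian address might look in labeled XML and how to read it back as `(token, label)` pairs. It assumes canada.xml follows the `AddressCollection` / `AddressString` layout used for labeled training data. The sample address and the helper `labeled_tokens` are hypothetical and not taken from this PR, and splitting the postal code on whitespace is likewise an assumption about tokenization.

```python
import xml.etree.ElementTree as ET

# Hypothetical labeled address. The province goes under StateName and the
# postal code under ZipCode, as described above.
SAMPLE_XML = """<AddressCollection>
  <AddressString><AddressNumber>123</AddressNumber> <StreetName>Main</StreetName> <StreetNamePostType>St</StreetNamePostType> <PlaceName>Ottawa</PlaceName> <StateName>ON</StateName> <ZipCode>K1A 0B1</ZipCode></AddressString>
</AddressCollection>"""


def labeled_tokens(xml_text):
    """Return one list of (token, label) pairs per AddressString element."""
    root = ET.fromstring(xml_text)
    addresses = []
    for address in root.iter("AddressString"):
        pairs = []
        for component in address:
            # Assumption: whitespace-separated tokens share the component's label.
            for token in (component.text or "").split():
                pairs.append((token, component.tag))
        addresses.append(pairs)
    return addresses


for pairs in labeled_tokens(SAMPLE_XML):
    for token, label in pairs:
        print(f"{token}\t{label}")
```

Under these assumptions, both halves of the postal code ("K1A" and "0B1") carry the `ZipCode` label, and "ON" carries `StateName`.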

IlyaSukhanov commented 5 years ago

This change also addresses https://github.com/datamade/usaddress/issues/115