MD5AI closed this issue 6 years ago
These are some interesting formats @DaniyarDS! Unfortunately, usaddress is limited in scope to addresses in the United States. If you've got a lot of international addresses to parse you might have more success with pypostal.
Hello. I tested usaddress locally and found some interesting cases. For example, here is a test case I generated: Front Street North 695 Gilbert Z1 35904
Output: ('Front', 'StreetName'), ('Street', 'StreetNamePostType'), ('North', 'StreetNamePostDirectional'), ('695', 'OccupancyIdentifier'), ('Gilbert', 'PlaceName'), ('Z1', 'StateName'), ('35904', 'ZipCode')
Expected output: ('Front', 'StreetName'), ('Street', 'StreetNamePostType'), ('North', 'StreetNamePostDirectional'), ('695', 'OccupancyIdentifier'), ('Gilbert', 'PlaceName'), ('Z1', any label other than 'StateName'), ('35904', 'ZipCode' or any other label)
An example with real-world data, an address in Italy (Sardinia, to be exact): CMR 467 Box 7000 APO, AE 09096
Output: ('CMR', 'USPSBoxType'), ('467', 'USPSBoxID'), ('Box', 'USPSBoxType'), ('7000', 'USPSBoxID'), ('APO,', 'PlaceName'), ('AE', 'StateName'), ('09096', 'ZipCode')
By its format this address may be valid, but it is not a US address.
To address this problem, I generated many test cases on which the parser fails. Should I retrain the model on them?
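One way to catch cases like these without retraining is a check on the parser's output. The following is only a minimal sketch, not part of usaddress: the helper `suspicious_states` and its `allow_military` flag are hypothetical names, and the tagged lists are copied from the outputs above. It flags any token labeled `StateName` that is not a two-letter USPS state abbreviation. Note that AE is the USPS code for Armed Forces Europe, so the APO address is only flagged if military codes are excluded.

```python
# Two-letter USPS abbreviations for the 50 states and DC
# (territories are left out to keep the sketch short).
US_STATES = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA",
    "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY",
    "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX",
    "UT", "VT", "VA", "WA", "WV", "WI", "WY",
}
# USPS "state" codes for military mail (Americas, Europe, Pacific).
MILITARY = {"AA", "AE", "AP"}


def suspicious_states(tagged, allow_military=True):
    """Return tokens labeled StateName that are not known abbreviations.

    `tagged` is a list of (token, label) pairs. Hypothetical helper:
    spelled-out state names such as "Alabama" would also be flagged.
    """
    allowed = (US_STATES | MILITARY) if allow_military else US_STATES
    return [
        token
        for token, label in tagged
        if label == "StateName" and token.strip(",.").upper() not in allowed
    ]


# The two tagged outputs quoted in this thread.
generated = [
    ("Front", "StreetName"), ("Street", "StreetNamePostType"),
    ("North", "StreetNamePostDirectional"), ("695", "OccupancyIdentifier"),
    ("Gilbert", "PlaceName"), ("Z1", "StateName"), ("35904", "ZipCode"),
]
apo = [
    ("CMR", "USPSBoxType"), ("467", "USPSBoxID"), ("Box", "USPSBoxType"),
    ("7000", "USPSBoxID"), ("APO,", "PlaceName"), ("AE", "StateName"),
    ("09096", "ZipCode"),
]

print(suspicious_states(generated))
print(suspicious_states(apo))
print(suspicious_states(apo, allow_military=False))
```

A check like this only rejects implausible state tokens; it does not tell you what the correct label should be.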