Closed vamsiemani closed 7 years ago
ORIGINAL STRING: 5875 Castle Creek Parkway North Dr Ste 285
ORIGINAL STRING: 3565 Piedmont Rd Driveway A Bldg 3 Ste 415
These look like they are parsing correctly to me. The only one that may be wrong is this one:
ORIGINAL STRING: 300 Frank W. Burr Boulevard Teaneck, New Jersey 07666
Here the W. should be part of the street name, since the street is named after a person.
This one also looks wrong:
ORIGINAL STRING: 1329 N Illinois Route 3, Ste 3, Waterloo, IL 62298
The 3 in Route 3 shouldn't be labeled OccupancyIdentifier.
Happy birthday to this issue! I agree with @REWDevinMcBeth: in the first two cases, `parse` succeeds where `tag` fails. This is because `tag` automatically tries to concatenate tokens that correspond to the same grouping, but those two addresses are complex enough to have two sets of one grouping (e.g. `Driveway` and `Bldg` are both SubaddressTypes, but they correspond to distinct subaddresses that cannot be concatenated). In cases like these, you should either use `parse` or build some logic into your code to handle the exception (see the docs for more). Hope that makes sense!
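To illustrate the kind of logic meant here, the following is a minimal sketch that works directly on a list of (token, label) pairs in the shape shown in this issue. It assumes nothing about the library itself: `merge_consecutive` and `repeated_labels` are hypothetical helper names, not part of any existing API. The idea is to join only adjacent tokens that share a label, so that two distinct subaddresses stay separate, and to report which labels occur in more than one group.

```python
from collections import Counter
from itertools import groupby


def merge_consecutive(tokens):
    """Join adjacent tokens that share a label; repeated groups stay separate."""
    return [
        (label, " ".join(tok for tok, _ in group))
        for label, group in groupby(tokens, key=lambda pair: pair[1])
    ]


def repeated_labels(components):
    """Return the labels that appear in more than one merged group."""
    counts = Counter(label for label, _ in components)
    return sorted(label for label, n in counts.items() if n > 1)


# Token/label pairs copied from the Piedmont Rd example in this issue.
tokens = [
    ("3565", "AddressNumber"),
    ("Piedmont", "StreetName"),
    ("Rd", "StreetNamePostType"),
    ("Driveway", "SubaddressType"),
    ("A", "SubaddressIdentifier"),
    ("Bldg", "SubaddressType"),
    ("3", "SubaddressIdentifier"),
    ("Ste", "OccupancyType"),
    ("415", "OccupancyIdentifier"),
]

components = merge_consecutive(tokens)
for label, text in components:
    print(label, "->", text)

# Labels listed here are the ones a single-value-per-label mapping cannot hold.
print("repeated:", repeated_labels(components))
```

Keeping the result as an ordered list of (label, text) pairs rather than a dictionary is a deliberate choice in this sketch: a dictionary keyed by label would have to either overwrite or concatenate the second `SubaddressType`, which is exactly the situation described above.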
Adding the failed addresses to our training data and closing this.
ORIGINAL STRING: 5875 Castle Creek Parkway North Dr Ste 285
PARSED TOKENS: [(u'5875', 'AddressNumber'), (u'Castle', 'StreetName'), (u'Creek', 'StreetName'), (u'Parkway', 'StreetNamePostType'), (u'North', 'StreetNamePostDirectional'), (u'Dr', 'StreetNamePostType'), (u'Ste', 'OccupancyType'), (u'285', 'OccupancyIdentifier')]
UNCERTAIN LABEL: StreetNamePostType
http://whitefinder.com/indianapolis-in/maps-financial-services-llc-3175775180.html
Another example with Occupancy Identifier:
ORIGINAL STRING: 1329 N Illinois Route 3, Ste 3, Waterloo, IL 62298
PARSED TOKENS: [(u'1329', 'AddressNumber'), (u'N', 'StreetNamePreDirectional'), (u'Illinois', 'StreetName'), (u'Route', 'StreetNamePostType'), (u'3,', 'OccupancyIdentifier'), (u'Ste', 'OccupancyType'), (u'3,', 'OccupancyIdentifier'), (u'Waterloo,', 'PlaceName'), (u'IL', 'StateName'), (u'62298', 'ZipCode')]
UNCERTAIN LABEL: OccupancyIdentifier
https://local.yahoo.com/info-85444392-sidebarr-technologies-waterloo?csz=Frohna%2C+MO&stx=Computer+Repair
Confused with Person Name vs Directional keyword:
ORIGINAL STRING: 300 Frank W. Burr Boulevard Teaneck, New Jersey 07666
PARSED TOKENS: [(u'300', 'AddressNumber'), (u'Frank', 'StreetName'), (u'W.', 'StreetNamePostDirectional'), (u'Burr', 'StreetName'), (u'Boulevard', 'StreetNamePostType'), (u'Teaneck,', 'PlaceName'), (u'New', 'StateName'), (u'Jersey', 'StateName'), (u'07666', 'ZipCode')]
UNCERTAIN LABEL: StreetName
http://www.vision-institute.com/new-jersey/patient-information/directions.htm
Confused by multiple SubaddressTypes:
ORIGINAL STRING: 3565 Piedmont Rd Driveway A Bldg 3 Ste 415
PARSED TOKENS: [(u'3565', 'AddressNumber'), (u'Piedmont', 'StreetName'), (u'Rd', 'StreetNamePostType'), (u'Driveway', 'SubaddressType'), (u'A', 'SubaddressIdentifier'), (u'Bldg', 'SubaddressType'), (u'3', 'SubaddressIdentifier'), (u'Ste', 'OccupancyType'), (u'415', 'OccupancyIdentifier')]
UNCERTAIN LABEL: SubaddressType