datamade / usaddress

a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

Mistook saint for street #335

Open stevetb777 opened 2 years ago

stevetb777 commented 2 years ago

Input: 2400 ST ANDREW ENNIS TX 75119

The parser decided the street name is "2400" and the place is "ANDREW ENNIS". On closer inspection, this is supposed to be 2400 ST ANDREW DR, ENNIS TX 75119. I think the logic could be modified to identify cases like this one. Thanks!
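One way to picture the kind of logic being asked for is a position-based guess: an "ST" token that directly follows the house number and precedes another word is probably "Saint", while an "ST" that follows a name is probably the street suffix. The sketch below is only an illustration of that idea. The function name `guess_st_meaning` is hypothetical, it is not part of usaddress, and it is not how usaddress itself assigns labels.

```python
# Hypothetical heuristic, not part of usaddress: guess whether each "ST"
# token means "Saint" or "Street" from its position in the address.
ST_TOKENS = {"ST", "ST."}


def guess_st_meaning(address):
    """Return (token_index, guess) pairs for every ST token in the address."""
    tokens = address.replace(",", " ").split()
    guesses = []
    for i, token in enumerate(tokens):
        if token.upper() not in ST_TOKENS:
            continue
        # "ST" right after the house number, followed by a word: likely "Saint".
        prev_is_number = i > 0 and tokens[i - 1].isdigit()
        next_is_word = i + 1 < len(tokens) and tokens[i + 1].isalpha()
        if prev_is_number and next_is_word:
            guesses.append((i, "saint"))
        else:
            # Otherwise assume the usual street suffix.
            guesses.append((i, "street"))
    return guesses


print(guess_st_meaning("2400 ST ANDREW DR, ENNIS TX 75119"))
print(guess_st_meaning("2400 MAIN ST ENNIS TX 75119"))
```

A rule like this would misfire on some inputs (for example, a street literally named by a number), so it is better read as a description of the ambiguity than as a fix.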

hc-goat commented 1 year ago

Ran into a similar issue: there are a number of small towns in FL and TX with variations of "Port Saint XXX", and usaddress doesn't cope well with them. I have extensive sample/training addresses I can provide. I see in this commit from a while back that there appears to be some training data. Happy to contribute if someone tells me what you need.

Having worked with spatial data and lived near there, I can say Port Saint Lucie is always a problem, as it appears as Port St Lucie, Pt St Lucie, Pt Saint Lucie, Port St. Lucie, Port Saint Lucy (wrong), etc. Port Saint Joe is another.
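Until the model handles these names, one possible workaround is to normalize the known spellings to a single canonical form before the string reaches the parser. The sketch below assumes only the variants listed above. The helper `normalize_place_names` and its pattern table are hypothetical and not part of usaddress.

```python
import re

# Hypothetical pattern table built from the spellings mentioned above.
_PORT = r"(?:PORT|PT)\.?"
_SAINT = r"(?:SAINT|ST)\.?"
_PLACE_PATTERNS = [
    # "Lucy" is a misspelling, mapped to the canonical "Lucie".
    (re.compile(rf"\b{_PORT}\s+{_SAINT}\s+LUC(?:IE|Y)\b", re.IGNORECASE),
     "PORT SAINT LUCIE"),
    (re.compile(rf"\b{_PORT}\s+{_SAINT}\s+JOE\b", re.IGNORECASE),
     "PORT SAINT JOE"),
]


def normalize_place_names(address):
    """Rewrite known place-name variants to one canonical spelling."""
    for pattern, canonical in _PLACE_PATTERNS:
        address = pattern.sub(canonical, address)
    return address


for variant in ["Port St Lucie", "Pt St Lucie", "Pt Saint Lucie",
                "Port St. Lucie", "Port Saint Lucy"]:
    print(variant, "->", normalize_place_names(variant))
```

Whether the parser then labels the canonical spelling correctly is a separate question. This only removes the spelling variation from the input.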