datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

train model with new data #184

Closed keliu0530 closed 7 years ago

keliu0530 commented 7 years ago

I have a couple of addresses like "254 CR 2311, AZ". In this example, "CR 2311" should be the street name, but "CR" is tagged as a street name pretype. So I trained the model with "254 CR 2311, AZ", "256 CR 2311, AZ", "251 CR 2311, AZ", "256 CR 2311, AZ", "257 CR 8570, AZ", and "251 CR 5020, AZ". The new model has the same problem with "208 CR 2300, AZ" but works fine with "208 CR 5020, CA". What kind of training data do you think would solve this problem? Should I provide more addresses?

fgregg commented 7 years ago

Doesn't CR stand for county road?

keliu0530 commented 7 years ago

@fgregg It does. I think we just have different tagging standards. I'll fix this on my side by joining the pretype and the street name. Thank you!
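For anyone landing here with the same need, the joining step could look roughly like the sketch below. It is only an illustration, not code from this thread: `merge_pretype` is a hypothetical helper, and the sample input is hand-written to mirror the tagging described above (it is not captured parser output). It assumes the input is a list of `(token, label)` pairs using the `StreetNamePreType` and `StreetName` labels.

```python
from typing import List, Tuple

# Label names as used by usaddress; folding one into the other is this
# sketch's own choice, not something the library does.
PRETYPE = "StreetNamePreType"
STREET = "StreetName"


def merge_pretype(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Fold each pretype token into the street name token that follows it."""
    merged: List[Tuple[str, str]] = []
    pending: List[str] = []  # pretype tokens waiting for a street name
    for token, label in pairs:
        if label == PRETYPE:
            pending.append(token)
        elif label == STREET and pending:
            # Join e.g. "CR" + "2311" into a single StreetName entry.
            merged.append((" ".join(pending + [token]), STREET))
            pending = []
        else:
            # A pretype not followed by a street name is kept unchanged.
            merged.extend((t, PRETYPE) for t in pending)
            pending = []
            merged.append((token, label))
    merged.extend((t, PRETYPE) for t in pending)
    return merged


# Hand-written sample mirroring the tagging described in this issue.
sample = [
    ("254", "AddressNumber"),
    ("CR", "StreetNamePreType"),
    ("2311", "StreetName"),
    ("AZ", "StateName"),
]
print(merge_pretype(sample))
```

With usaddress installed, the same helper should accept the output of `usaddress.parse(...)` directly, since that call returns a list of `(token, label)` tuples. Post-processing like this avoids retraining the model just to change the tagging convention.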

jeancochrane commented 7 years ago

Sounds like this is cleared up. Let us know if you have any more questions @keliu0530!