datamade / usaddress

:us: a Python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License
1.53k stars 303 forks

Would you welcome a "higher level" PR RE USPS addressing? #226

Open tommyjcarpenter opened 6 years ago

tommyjcarpenter commented 6 years ago

I wrote a function that, given an address string (possibly messy):

  1. Uses your library to tag it.
  2. Tries to reassemble the various tags into the address parts "number, street, city, state, zip". For example, an ordered, tedious concatenation of many tags forms what the USPS calls the "street".
  3. Runs a USPS address abbreviator on it; see https://pe.usps.com/text/pub28/28apc_002.htm
  4. Dumps the result as a USPS-compliant address.
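
For readers following along, the steps above could be sketched roughly as follows. This is not the function described in this issue, which was never posted in the thread; it is a minimal illustration under stated assumptions. The helper names (`assemble_address`, `format_address`), the choice of which labels make up the "street", and the tiny abbreviation tables are all hypothetical. The label names are usaddress component labels, and the input is assumed to be the tag mapping from step 1 (the first element of the tuple returned by `usaddress.tag`).

```python
from typing import Dict, Mapping

# Hand-picked subset of USPS Publication 28 abbreviations, for illustration
# only; a real abbreviator would need the full tables.
SUFFIX_ABBREVIATIONS = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "BOULEVARD": "BLVD",
    "DRIVE": "DR",
    "ROAD": "RD",
    "LANE": "LN",
}
DIRECTIONAL_ABBREVIATIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}

# Assumption: these labels, joined in this order, form the "street" part.
STREET_LABELS = (
    "StreetNamePreDirectional",
    "StreetNamePreType",
    "StreetName",
    "StreetNamePostType",
    "StreetNamePostDirectional",
)


def _clean(token: str) -> str:
    """Drop surrounding spaces, commas and periods, then uppercase."""
    return token.strip(" ,.").upper()


def assemble_address(tagged: Mapping[str, str]) -> Dict[str, str]:
    """Steps 2 and 3: collapse fine-grained tags into five abbreviated parts."""
    street_parts = []
    for label in STREET_LABELS:
        value = _clean(tagged.get(label, ""))
        if not value:
            continue
        if label.endswith("Directional"):
            value = DIRECTIONAL_ABBREVIATIONS.get(value, value)
        elif label == "StreetNamePostType":
            value = SUFFIX_ABBREVIATIONS.get(value, value)
        street_parts.append(value)
    return {
        "number": _clean(tagged.get("AddressNumber", "")),
        "street": " ".join(street_parts),
        "city": _clean(tagged.get("PlaceName", "")),
        "state": _clean(tagged.get("StateName", "")),
        "zip": _clean(tagged.get("ZipCode", "")),
    }


def format_address(parts: Mapping[str, str]) -> str:
    """Step 4: render the parts as two uppercase, punctuation-free lines."""
    line1 = " ".join(p for p in (parts["number"], parts["street"]) if p)
    line2 = " ".join(p for p in (parts["city"], parts["state"], parts["zip"]) if p)
    return f"{line1}\n{line2}"


# Hand-written stand-in for tagger output (not captured from the library).
example_tags = {
    "AddressNumber": "123",
    "StreetNamePreDirectional": "North",
    "StreetName": "Main",
    "StreetNamePostType": "Street,",
    "PlaceName": "Chicago,",
    "StateName": "il",
    "ZipCode": "60601",
}
print(format_address(assemble_address(example_tags)))
```

With the real library, step 1 would supply the mapping, e.g. `tagged, address_type = usaddress.tag(raw_string)`. The sketch ignores occupancy (apartment or suite) tags, PO boxes, and intersections, which a complete version would have to handle.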

This seems "higher level" than what this library is intended to do, so I'm OK if this is not desired to be merged in here. However, if you would like this to be part of this library, let me know.

Californian commented 6 years ago

@tommyjcarpenter Do you have this published anywhere? I'm interested in using it.

tommyjcarpenter commented 6 years ago

@Californian Not currently. I work for a company, and its open source rules only permit me to contribute the code back to the original repo (I can't just push it wherever), so I raised this issue to ask whether it would be wanted.

fgregg commented 6 years ago

Hi @tommyjcarpenter, that seems valuable. I've typically wanted to keep normalization distinct from parsing, but I could be persuaded. If you've already written the code, would you be willing to make a PR? I'm not quite ready to commit.

tommyjcarpenter commented 6 years ago

@fgregg I was unable to push a feature branch:

git push origin address_assembly                                                                                                            Fri May 11 12:09:35 2018
ERROR: Permission to datamade/usaddress.git denied to tommyjcarpenter.