SwoopSearch / pyaddress

pyaddress is an address parsing library, taking the guesswork out of using addresses in your applications. We use it as part of our apartment search and apartment spider applications.
BSD 3-Clause "New" or "Revised" License

Added 9-digit ZIP support + cities DB updates #4

Status: Open. hlorofos opened this 11 years ago.

hlorofos commented 11 years ago

Added 9-digit ZIP support in the format xxxxx-xxxx. The cities database has been extended by merging it with National Weather Service data.
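For readers following along, here is a minimal sketch of what accepting the xxxxx-xxxx format could look like, assuming a plain regular expression. It is not the code from this pull request, and the names `ZIP_RE` and `parse_zip` are hypothetical, not part of pyaddress.

```python
import re

# Hypothetical pattern: a 5-digit ZIP, optionally followed by a hyphen and
# a 4-digit add-on (ZIP+4). [0-9] is used instead of \d so that non-ASCII
# Unicode digits are rejected.
ZIP_RE = re.compile(r"(?P<zip5>[0-9]{5})(?:-(?P<plus4>[0-9]{4}))?")


def parse_zip(token):
    """Return (zip5, plus4) for a 5- or 9-digit ZIP token, else None.

    plus4 is None when the token has no 4-digit add-on.
    """
    # fullmatch requires the whole (stripped) token to match the pattern.
    match = ZIP_RE.fullmatch(token.strip())
    if match is None:
        return None
    return match.group("zip5"), match.group("plus4")
```

For example, `parse_zip("53703-1234")` gives `("53703", "1234")`, `parse_zip("53703")` gives `("53703", None)`, and a misplaced hyphen such as `"5370-31234"` is rejected. Nine digits without the hyphen are also rejected here, since the PR describes only the hyphenated form.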

thetylerhayes commented 11 years ago

Is this going to get reviewed? We'd also love 9-digit ZIP support.

hollanddd commented 11 years ago

Not sure if this is morally correct: I copied this library, minus the fluff, because I needed it for a project. In doing so I included your 9-digit ZIP logic, although I think it is untested. The work is far from done. Feel free to lend a hand at https://github.com/hollanddd/gladdress

joshgachnang commented 10 years ago

Totally morally fine. The code is under the New BSD License, so you can basically use it any way you want. Yay open source!

thetylerhayes commented 10 years ago

Thanks for responding, @pcsforeducation.

@hollanddd, @hlorofos: does one of you want to merge the other's commits into your forked repo?

I ask because @pcsforeducation is one of the original maintainers of this repo but is no longer with @SwoopSearch, so no longer has admin rights on it. It also sounds like @SwoopSearch itself is no more: https://twitter.com/servercobra/status/404104202273583104. So we can't merge any pull requests in this repo, and it'd be nice to have someone else offer their repo up as the new active version.

thetylerhayes commented 10 years ago

Actually, it looks like https://github.com/hollanddd/gladdress (mentioned earlier in the thread) already has 9-digit ZIP support, which was the only other thing @hlorofos's fork had committed. So gladdress should work.

hollanddd commented 10 years ago

The other issues are addressed, with some tests. gladdress is currently broken because I was in the middle of extracting pre- and post-directionals from street names. I put it on pause until I heard from someone. I would love to have a conversation and get some more eyeballs on this. Thanks @thetylerhayes @pcsforeducation

joshgachnang commented 10 years ago

I'll take a look at it over my Thanksgiving break. I like the additional tests you added. Also, I'll likely pull all the merge requests/issues to my fork of it and go from there.

I do want to rework how this works. I'm thinking of something along the lines of breaking the address into useful tokens, assigning each token a probability for each category ("Wisconsin" is x% likely to be a state, y% part of the street name, z% part of the city name), and guessing from there. As it stands, the project involves a lot of guesswork and really only works for US addresses. Dropping each token into a bucket (state, city, etc.) based on its position and on what has already been filled in only works for well-formed addresses in a rigid order, and it is prone to failure if anything is missing.
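To make the proposed idea concrete, here is a rough sketch, assuming hand-picked per-token scores and a simple greedy pass. Everything in it is illustrative: the `PRIORS` table and its numbers, the category names, and the functions `tokenize`, `token_probs`, and `classify` are hypothetical, not part of pyaddress or gladdress. A real version would need probabilities derived from data rather than hard-coded guesses.

```python
import re

# Illustrative, hand-picked scores for a few known tokens. A real version would
# estimate these from data (gazetteers, labeled addresses), not hard-code them.
PRIORS = {
    "wisconsin": {"state": 0.7, "street": 0.2, "city": 0.1},
    "wi": {"state": 0.9, "street": 0.05, "city": 0.05},
    "madison": {"city": 0.6, "street": 0.4},
    "ave": {"street": 1.0},
    "st": {"street": 1.0},
}

# Categories that can hold at most one token in a single address.
SINGLE_VALUED = {"house_number", "state", "zip"}


def tokenize(address):
    """Split on commas/whitespace and drop trailing periods ("St." -> "St")."""
    return [t.strip(".") for t in re.split(r"[,\s]+", address) if t.strip(".")]


def token_probs(token, index):
    """Return {category: probability} for one token (placeholder heuristics)."""
    if token.lower() in PRIORS:
        return PRIORS[token.lower()]
    if re.fullmatch(r"[0-9]{5}(?:-[0-9]{4})?", token):
        return {"zip": 0.9, "house_number": 0.1}
    if token.isdigit():
        # Position is still a signal, just no longer the only one.
        if index == 0:
            return {"house_number": 0.9, "street": 0.1}
        return {"street": 0.6, "house_number": 0.4}
    # Unknown words: lean slightly toward street over city.
    return {"street": 0.6, "city": 0.4}


def classify(address):
    """Label tokens greedily, taking the most confident (token, category) guesses first."""
    tokens = tokenize(address)
    candidates = sorted(
        ((p, i, cat) for i, tok in enumerate(tokens)
         for cat, p in token_probs(tok, i).items()),
        key=lambda c: (-c[0], c[1]),  # highest probability first, ties by position
    )
    labels, filled = {}, set()
    for _p, i, cat in candidates:
        # Skip tokens already labeled and single-valued categories already filled.
        if i in labels or (cat in SINGLE_VALUED and cat in filled):
            continue
        labels[i] = cat
        if cat in SINGLE_VALUED:
            filled.add(cat)
    # Group tokens by label in their original order; unlabeled tokens go to "unknown".
    result = {}
    for i, tok in enumerate(tokens):
        result.setdefault(labels.get(i, "unknown"), []).append(tok)
    return {cat: " ".join(parts) for cat, parts in result.items()}
```

With these made-up scores, `classify("123 Wisconsin Ave, Madison, WI 53703")` returns `{"house_number": "123", "street": "Wisconsin Ave", "city": "Madison", "state": "WI", "zip": "53703"}`. "Wisconsin" scores highest as a state, but the state slot has already been taken by the more confident "WI", so it falls back to street. Dropping parts of the address, as in `"123 Main St Madison"`, still yields a partial result instead of failing. The greedy pass is only the simplest way to combine per-token scores and is not necessarily what the comment above had in mind.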