OpenData-NC / columbus-county-nc

OpenRural installation for Columbus County, NC
http://columbusco.openrural.org/

incorrectly parsed street names #99

Open rtburg opened 12 years ago

rtburg commented 12 years ago

Folks in Whiteville have noticed that some addresses on "North JK Powell Blvd" are displaying as "Njk Powell Blvd"

Example: http://columbusco-staging.openrural.org/restaurants/detail/1136/

But in others, it appears (almost) correctly, such as: http://columbusco-staging.openrural.org/restaurants/detail/1150/

Similar problems on South JK Powell Blvd: http://columbusco-staging.openrural.org/restaurants/detail/1170/

What's the easiest way for us to fix this?

kmtracey commented 12 years ago

Easiest would be to leave text as we get it rather than trying to fix up case. But that means just about everything WOULD BE IN UPPERCASE, ugh.

rtburg commented 12 years ago

Is this similar to the issue with the uppercase "S" after the apostrophe here? http://columbusco-staging.openrural.org/restaurants/detail/1178/

And this: http://columbusco-staging.openrural.org/restaurants/detail/1346/

which should be "John L. Riegel Rd."

Could we use the street aliases to deal with these?

kmtracey commented 12 years ago

We can easily fix the 'S problem by using string.capwords or Django's title filter instead of string_instance.title(). I noticed that issue for restaurant names and found that, oddly enough, the Python string title() is lacking here, and apparently it's not going to change: the ticket in the Python tracker reporting the issue (http://bugs.python.org/issue7008) was closed wontfix. (There is discussion there that capwords should be deprecated/removed as well, but that apparently hasn't happened.) I used capwords for restaurant names but did not go back and correct all the places where we had used title() in the past... we probably need to do that to fix the 'S problem.
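For anyone following along, here is a minimal sketch of the difference between the two calls. The restaurant name is made up for illustration and is not taken from the data.

```python
import string

# Made-up example name; incoming data is mostly uppercase.
name = "JOE'S DINER"

# str.title() treats the apostrophe as a word boundary,
# so the letter after it gets capitalized too.
titled = name.title()

# string.capwords() splits on whitespace only, so the apostrophe is left alone.
capworded = string.capwords(name)

print(titled)     # Joe'S Diner
print(capworded)  # Joe's Diner
```

Note that neither call helps with "NJK POWELL": both produce "Njk Powell", since there is no way to tell from case alone that "NJK" is a direction plus initials.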

It seems a street misspelling mapping (I think that is what you mean when you ask about aliases) can help with mapping incoming addresses like "Jl Riegel" road, but out of the box it does not help with correcting the display of that address. When I put a mapping in place to say that the misspelling "JL RIEGEL" means "JOHN L RIEGEL", and re-run the restaurant scraper locally (after deleting all items, which seems to be required to force it to try re-geocoding), that June 12 inspection of International Paper goes from "This location couldn't be mapped." to one that has been successfully geocoded (overall restaurant failures went from 126 to 122, so there are apparently 4 addresses on that badly-spelled street in the restaurant data). However, the NewsItem's location isn't updated with the "correct" name, so it's still displaying as "Jl Riegel Rd".
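One possible way to get the corrected name onto the display side is to run the same mapping when building the display string. This is only a rough sketch: the dict below is a hypothetical stand-in for the misspelling table, and correct_street and display_street are made-up helper names, not existing functions in the codebase.

```python
import string

# Hypothetical stand-in for the street misspelling table (misspelling -> correct).
STREET_MISSPELLINGS = {"JL RIEGEL": "JOHN L RIEGEL"}


def correct_street(street):
    """Return the mapped spelling if one exists, else the normalized input."""
    key = " ".join(street.upper().split())  # uppercase and collapse whitespace
    return STREET_MISSPELLINGS.get(key, key)


def display_street(street, suffix=""):
    """Build a display name from the corrected spelling rather than the raw one."""
    corrected = correct_street(street)
    return string.capwords(f"{corrected} {suffix}".strip())


print(display_street("Jl Riegel", "Rd"))  # John L Riegel Rd
```

This would not add the period after the "L" that Ryan asked for; that would need either a richer "display name" column or a manual override.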

The J K Powell problems look to be a little harder to fix, even just for getting the geocoding to work. Those arrive in the restaurant data as "sjk powell" and "njk powell", presumably meaning South and North J K Powell. However, the North/South bit is applied at the block level only, while the street misspelling is at the street level. I am able to set up a mapping of sjk powell -> J K Powell and njk powell -> J K Powell (which actually confuses me, since I thought the "correct" spelling column had a unique constraint), but the effect of that mapping on the restaurant data is mostly to turn DoesNotExist failures into AmbiguousBlock failures, since we lose the North/South designator when trying the misspelling, and apparently we need it to distinguish blocks covering the same address ranges. The overall failure count on the restaurant data does go down from 122 to 119, so there are 3 addresses where those mappings help, but I can see at least as many failures still in the list that are now "ambiguous" rather than non-existent.
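To illustrate why losing the designator causes the ambiguity, here is a toy sketch. Everything in it is hypothetical: the block tuples, the address ranges, and the split_direction and matching_blocks helpers are invented for illustration and do not reflect the real block data or geocoder code.

```python
# Hypothetical fixes that keep the direction instead of dropping it:
# raw street -> (direction prefix, street name).
DIRECTIONAL_FIXES = {
    "NJK POWELL": ("N", "J K POWELL"),
    "SJK POWELL": ("S", "J K POWELL"),
}

# Invented blocks: (direction prefix, street, low number, high number).
BLOCKS = [
    ("N", "J K POWELL", 100, 199),
    ("S", "J K POWELL", 100, 199),
]


def split_direction(street):
    """Return (direction, street); direction is '' when no fix is known."""
    key = " ".join(street.upper().split())
    return DIRECTIONAL_FIXES.get(key, ("", key))


def matching_blocks(number, street, predir=None):
    """Find blocks containing the number, optionally filtered by direction."""
    return [
        b for b in BLOCKS
        if b[1] == street
        and b[2] <= number <= b[3]
        and (predir is None or b[0] == predir)
    ]


predir, street = split_direction("njk powell")
print(len(matching_blocks(150, street)))          # 2 (ambiguous without direction)
print(len(matching_blocks(150, street, predir)))  # 1 (direction disambiguates)
```

So a fix along these lines would need the misspelling lookup to hand back a direction as well as a street name, which the current street-level mapping has no place for.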