furious-luke / django-address

A Django address model and field. Addresses may be specified by address components or by performing an automatic Google Maps lookup.
BSD 3-Clause "New" or "Revised" License
431 stars 180 forks source link

Lookup sometimes doesn't parse/save zipcodes #34

Closed jplehmann closed 4 years ago

jplehmann commented 8 years ago

When I have users enter an address, I'm asking for city/state/zip/country. Currently I receive these in separate fields but my plan was to only have 1 field moving forward for the address. But sometimes, the lookup appears to discard the zipcode:

This works:

s.address = "Richardson, TX 75081, USA"
<Address: Richardson, TX 75081, USA>

But this doesn't:

s.address = "Wausau, WI 54401, USA"
<Address: Wausau, WI, USA>

I've tried to speculate on why this could be, such as maybe this zipcode is not actually in Wausau (but it appears to be). However in any case, my concern is that I'd be throwing away data to only use the address field it appears. How do you recommend this be handled?

It appears that this is only a problem if the user doesn't enter a street address. I am thinking I may tell users that if the zipcode doesn't "stick" they need to provide a street address in their zipcode, and I will just not display the street part.

s.address = "2908 Christian Wausau, WI"
<Address: 2908 Christian Ave, Wausau, WI 54401, USA>
furious-luke commented 8 years ago

Well, I've done some investigation and this does indeed seem to be quite a dilemma. The issue is that Google is seemingly incorrectly geolocating this location. If I lookup this location via a webform it returns the full location details, including zip code. If I do it from a string via Python, as you've done here, it does not return the zip code.

I'll need to do some deeper investigation as to why Google doesn't want to return the zip code in some instances. I'll try and get this sorted as soon as possible, sorry for the inconvenience.

furious-luke commented 8 years ago

Okay, it looks like the issue is a result of two things:

  1. There are multiple zip codes that Wausau, WI covers.
  2. The Google geocoder seems to either ignore the request for a specific zip code, or is getting confused and just returning a blanket result for Wausau that does not include any of the possible zip codes.

This is a bit disappointing to say the least, I'd hoped that sourcing Google's geocode data would eliminate these kinds of issues, but sadly that doesn't appear to be the case.

So, I can think of a couple of possible solutions.

  1. Use a different geocoder. The issue with this is obviously that there's no guarantee other geocoders are any better.
  2. Store the raw address given to the geocoder as the formatted address. This will keep the zip code in the text, but we still won't have the zip code as a logical component of the address hierarchy, which is still not great. You could manually add it in, but it's not an ideal solution that's for sure.
furious-luke commented 8 years ago

Okay, I've implemented some changes to try and help correct this problem. I'll explain what I've done.

So, the problem really boils down to the Python version of the geocoder not doing the same thing as the JavaScript one; when we do a lookup via JS it works properly, but not from Python. What I've implemented can be broken down into two parts:

  1. I've added a raw field to the address model. This stores the unmodified value that gets passed in to the lookup routine in django-address. This way we have the unmodified value that was requested to be stored as an address, and also the value that Google thinks the address should be.
  2. I've augmented the admin view to make it easier to identify these bogus lookups and correct them using the JavaScript geocoder. In the list display in the admin you'll be able to see a new column called consistent. This just shows if the raw column matches the formatted column. This is included so you can now sort the entries by whether or not they have the value you expected them to be. Then, when you want to fix an inconsistent address, you can use the inline lookup column to quickly find the correct address and save it.

Please feel free to update the code, but please bare in mind that the migrations have changed, so you'll need to start from scratch with your original data.

And, as always, please let me know if (when) you encounter more problems. :)

furious-luke commented 8 years ago

Yep, you'll need to begin from scratch, migrating from your original data. I've made modifications to some of the earlier migrations so trying to rollback migrations won't work.

Cheers, Luke

On Tue, Jan 12, 2016 at 12:01 AM John Lehmann notifications@github.com wrote:

Having some problems here. I faked it back to the migration before my data migration (since it has no undo), then I roll back the initial django-address migration. I think this latter one shouldn't be necessary but even then, when I make migrations it picks up no extra field like I expect (raw).

What's worse, when I try my data migration now, I get:

File
"/Users/john/.venv/mm/lib/python2.7/site-packages/address/operations.py",
line 13, in geolocate

component_model=self.component_model)

File
"/Users/john/.venv/mm/lib/python2.7/site-packages/address/models.py", line
336, in to_python

return to_python(*args, **kwargs)

File
"/Users/john/.venv/mm/lib/python2.7/site-packages/address/models.py", line
179, in to_python

return lookup(value, instance, address_model, component_model,
geolocator)

File
"/Users/john/.venv/mm/lib/python2.7/site-packages/address/models.py", line
215, in lookup

addr = to_python(location.raw, instance, address_model, component_model)

File
"/Users/john/.venv/mm/lib/python2.7/site-packages/address/models.py", line
186, in to_python

return _to_python(value, instance, address_model, component_model)

File
"/Users/john/.venv/mm/lib/python2.7/site-packages/address/models.py", line
117, in _to_python

roots[ii], h = _save(roots[ii], 0)

File
"/Users/john/.venv/mm/lib/python2.7/site-packages/address/models.py", line
109, in _save

obj.parent, h = _save(obj.parent, h + 1)

File
"/Users/john/.venv/mm/lib/python2.7/site-packages/address/models.py", line
109, in _save

obj.parent, h = _save(obj.parent, h + 1)

File
"/Users/john/.venv/mm/lib/python2.7/site-packages/address/models.py", line
109, in _save

obj.parent, h = _save(obj.parent, h + 1)

File
"/Users/john/.venv/mm/lib/python2.7/site-packages/address/models.py", line
109, in _save

obj.parent, h = _save(obj.parent, h + 1)

File
"/Users/john/.venv/mm/lib/python2.7/site-packages/address/models.py", line
113, in _save

parent=obj.parent

File

"/Users/john/.venv/mm/lib/python2.7/site-packages/django/db/models/manager.py",
line 127, in manager_method

return getattr(self.get_queryset(), name)(*args, **kwargs)

File

"/Users/john/.venv/mm/lib/python2.7/site-packages/django/db/models/query.py",
line 405, in get_or_create

return self.get(**lookup), False

File

"/Users/john/.venv/mm/lib/python2.7/site-packages/django/db/models/query.py",
line 328, in get

num = len(clone)

File

"/Users/john/.venv/mm/lib/python2.7/site-packages/django/db/models/query.py",
line 144, in __len__

self._fetch_all()

File

"/Users/john/.venv/mm/lib/python2.7/site-packages/django/db/models/query.py",
line 965, in _fetch_all

self._result_cache = list(self.iterator())

File

"/Users/john/.venv/mm/lib/python2.7/site-packages/django/db/models/query.py",
line 238, in iterator

results = compiler.execute_sql()

File

"/Users/john/.venv/mm/lib/python2.7/site-packages/django/db/models/sql/compiler.py",
line 840, in execute_sql

cursor.execute(sql, params)

File

"/Users/john/.venv/mm/lib/python2.7/site-packages/django/db/backends/utils.py",
line 79, in execute

return super(CursorDebugWrapper, self).execute(sql, params)

File

"/Users/john/.venv/mm/lib/python2.7/site-packages/django/db/backends/utils.py",
line 59, in execute

self.db.validate_no_broken_transaction()

File

"/Users/john/.venv/mm/lib/python2.7/site-packages/django/db/backends/base/base.py",
line 327, in validate_no_broken_transaction

"An error occurred in the current transaction. You can't "

TransactionManagementError: An error occurred in the current transaction.
You can't execute queries until the end of the 'atomic' block.

On Sun, Jan 10, 2016 at 7:39 PM, Luke Hodkinson notifications@github.com wrote:

Okay, I've implemented some changes to try and help correct this problem. I'll explain what I've done.

So, the problem really boils down to the Python version of the geocoder not doing the same thing as the JavaScript one; when we do a lookup via JS it works properly, but not from Python. What I've implemented can be broken down into two parts:

1.

I've added a raw field to the address model. This stores the unmodified value that gets passed in to the lookup routine in django-address. This way we have the unmodified value that was requested to be stored as an address, and also the value that Google thinks the address should be. 2.

I've augmented the admin view to make it easier to identify these bogus lookups and correct them using the JavaScript geocoder. In the list display in the admin you'll be able to see a new column called consistent. This just shows if the raw column matches the formatted column. This is included so you can now sort the entries by whether or not they have the value you expected them to be. Then, when you want to fix an inconsistent address, you can use the inline lookup column to quickly find the correct address and save it.

Please feel free to update the code, but please bare in mind that the migrations have changed, so you'll need to start from scratch with your original data.

And, as always, please let me know if (when) you encounter more problems. :)

— Reply to this email directly or view it on GitHub < https://github.com/furious-luke/django-address/issues/34#issuecomment-170414225

.

— Reply to this email directly or view it on GitHub https://github.com/furious-luke/django-address/issues/34#issuecomment-170541737 .

jplehmann commented 8 years ago

Okay, I understand reapplying my data migration, but I need to do much more than that right?

I am trying to roll back my address all the way but it gets stuck at version 3. However, it looks like the raw column gets added in address migration 1, and I cannot move forward without applying it.

Do I need to manual drop the address table?

jplehmann commented 8 years ago

So I manually dropped all address tables and faked back to 0 to get started afresh. I then used the new code version, and migrated. To test I then tried to go back to zero again. I still got stuck at version 3 which is irreversible, but this time after faking it to 2 I was able to then migrate to zero without having to drop anything. Somehow my schema had gotten out of sync with the code (and I tried different code versions).

Moving forward, do we anticipate many more schema changes that will be backward incompatible -- can we not apply them as additional migrations as opposed to modifying the earlier ones already applied? This was all on my development instance, and I don't want to be doing a lot of this in production.

Thanks

jplehmann commented 8 years ago
  1. Question from above, about how future migrations will work.
  2. Issue: This may have already been a problem, but just noticed that in my form, if the user edits an object which already had an address set, so the form is pre-populated with that a good value, when the user clicks Save it acts as though they did not enter a value at all and fails validation.
  3. Feedback: Consistent flag seems to be of low value just because there are so many things that can set it off, such as a missing comma or "United States" versus "USA". However it might be helpful in more quickly identifying the objects that I am converting now versus the ones in the future which will only ever store a "consistent" value. What I do find useful is being able to see where formatted is empty.
  4. Minor: I notice that every time an unresolved address is entered, it creates a new row, even if the text is duplicate. This could result in a lot of garbage over time, but I guess it can be cleaned up by deleting any addresses not referenced by the application.
furious-luke commented 8 years ago

1.

That sort of out-of-sync behavior will only happen if I should need to make a change to a past migration. This is pretty rare, the only reason I did this time was because without it we would have lost all the raw values after the conversion from your data to django-address. Now that we have the raw values included I shouldn’t need to make any past migration alterations. 2.

Ah yes, I know what’s up there. 3.

Yeah, I think you’re right about that. It’s usefulness is pretty much limited to future additions. I have an idea I’d like to try to make it more useful, though. I’ll implement a better matching scheme so that small deviations from Google’s calculated value don’t register as inconsistent. This would hopefully reduce the set of inconsistent addresses to just those that have actually failed to be looked up correctly. 4.

That sounds like a bug to me. I’ll get on to it today.

On Tue, Jan 12, 2016 at 1:56 AM John Lehmann notifications@github.com http://mailto:notifications@github.com wrote:

1.

Question from above, about how future migrations will work. 2.

Issue: This may have already been a problem, but just noticed that in my form, if the user edits an object which already had an address set, so the form is pre-populated with that a good value, when the user clicks Save it acts as though they did not enter a value at all and fails validation. 3.

Feedback: Consistent flag seems to be of low value just because there are so many things that can set it off, such as a missing comma or "United States" versus "USA". However it might be helpful in more quickly identifying the objects that I am converting now versus the ones in the future which will only ever store a "consistent" value. What I do find useful is being able to see where formatted is empty. 4.

Minor: I notice that every time an unresolved address is entered, it creates a new row, even if the text is duplicate. This could result in a lot of garbage over time, but I guess it can be cleaned up by deleting any addresses not referenced by the application.

— Reply to this email directly or view it on GitHub https://github.com/furious-luke/django-address/issues/34#issuecomment-170578695 .

furious-luke commented 8 years ago

Hi John,

Okay, I've made some changes to the way the consistency value is calculated. Hopefully now it's a bit more robust (although to be honest I'm not sure how effective it's going to be without a real dataset to test on).

In addition, the issue from 2. should be fixed now.

Cheers, Luke

On Tue, Jan 12, 2016 at 9:06 AM Luke Hodkinson furious.luke@gmail.com wrote:

1.

That sort of out-of-sync behavior will only happen if I should need to make a change to a past migration. This is pretty rare, the only reason I did this time was because without it we would have lost all the raw values after the conversion from your data to django-address. Now that we have the raw values included I shouldn’t need to make any past migration alterations. 2.

Ah yes, I know what’s up there. 3.

Yeah, I think you’re right about that. It’s usefulness is pretty much limited to future additions. I have an idea I’d like to try to make it more useful, though. I’ll implement a better matching scheme so that small deviations from Google’s calculated value don’t register as inconsistent. This would hopefully reduce the set of inconsistent addresses to just those that have actually failed to be looked up correctly. 4.

That sounds like a bug to me. I’ll get on to it today.

On Tue, Jan 12, 2016 at 1:56 AM John Lehmann notifications@github.com http://mailto:notifications@github.com wrote:

1.

Question from above, about how future migrations will work. 2.

Issue: This may have already been a problem, but just noticed that in my form, if the user edits an object which already had an address set, so the form is pre-populated with that a good value, when the user clicks Save it acts as though they did not enter a value at all and fails validation. 3.

Feedback: Consistent flag seems to be of low value just because there are so many things that can set it off, such as a missing comma or "United States" versus "USA". However it might be helpful in more quickly identifying the objects that I am converting now versus the ones in the future which will only ever store a "consistent" value. What I do find useful is being able to see where formatted is empty. 4.

Minor: I notice that every time an unresolved address is entered, it creates a new row, even if the text is duplicate. This could result in a lot of garbage over time, but I guess it can be cleaned up by deleting any addresses not referenced by the application.

— Reply to this email directly or view it on GitHub https://github.com/furious-luke/django-address/issues/34#issuecomment-170578695 .

jplehmann commented 8 years ago

Luke,

Using version 8b1419ea3ed69c5a60cba127cbbbded8914791ec:

  1. Make migrations crashes with:
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/john/.venv/mm/lib/python2.7/site-packages/django/core/management/__init__.py", line 338, in execute_from_command_line
    utility.execute()
  File "/Users/john/.venv/mm/lib/python2.7/site-packages/django/core/management/__init__.py", line 312, in execute
    django.setup()
  File "/Users/john/.venv/mm/lib/python2.7/site-packages/django/__init__.py", line 18, in setup
    apps.populate(settings.INSTALLED_APPS)
  File "/Users/john/.venv/mm/lib/python2.7/site-packages/django/apps/registry.py", line 108, in populate
    app_config.import_models(all_models)
  File "/Users/john/.venv/mm/lib/python2.7/site-packages/django/apps/config.py", line 198, in import_models
    self.models_module = import_module(models_module_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/Users/john/.venv/mm/lib/python2.7/site-packages/address/models.py", line 19, in <module>
    from .consistency import get_consistency_from_parts
  File "/Users/john/.venv/mm/lib/python2.7/site-packages/address/consistency.py", line 4, in <module>
    PUNC_TABLE = str.maketrans(dict([(c, None) for c in string.punctuation]))
AttributeError: type object 'str' has no attribute 'maketrans'
  1. Still having the problem with form mentioned above. If I never change the value from the loaded value, it acts as though that field were blank. I also noticed these attempts are adding duplicate entries into the address table (raw only).

thanks, John

furious-luke commented 8 years ago

I’m guessing you’re using Python 2? I’ve just pushed some changes to make the code more backwards compatible, if you still have trouble migrating I’ll install Python 2 and make sure it’s working locally.

By the way, I forgot to mention that after successfully migrating you’ll need to run ./manage.py checkconsistency to recalculate the consistency values. Then take a look in the admin and see if the number of spurious results has decreased to a reasonable number.

As for the form issue, I’ve just pushed some more fixes that should hopefully correct the duplicates.

Cheers, Luke

On Tue, Jan 12, 2016 at 11:41 PM John Lehmann notifications@github.com http://mailto:notifications@github.com wrote:

Luke,

Using version 8b1419ea3ed69c5a60cba127cbbbded8914791ec:

  1. Make migrations crashes with:

    File "./manage.py", line 10, in execute_from_command_line(sys.argv) File "/Users/john/.venv/mm/lib/python2.7/site-packages/django/core/management/init.py", line 338, in execute_from_command_line utility.execute() File "/Users/john/.venv/mm/lib/python2.7/site-packages/django/core/management/init.py", line 312, in execute django.setup() File "/Users/john/.venv/mm/lib/python2.7/site-packages/django/init.py", line 18, in setup apps.populate(settings.INSTALLED_APPS) File "/Users/john/.venv/mm/lib/python2.7/site-packages/django/apps/registry.py", line 108, in populate app_config.import_models(all_models) File "/Users/john/.venv/mm/lib/python2.7/site-packages/django/apps/config.py", line 198, in import_models self.models_module = import_module(models_module_name) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/init.py", line 37, in import_module import(name) File "/Users/john/.venv/mm/lib/python2.7/site-packages/address/models.py", line 19, in from .consistency import get_consistency_from_parts File "/Users/john/.venv/mm/lib/python2.7/site-packages/address/consistency.py", line 4, in PUNC_TABLE = str.maketrans(dict([(c, None) for c in string.punctuation])) AttributeError: type object 'str' has no attribute 'maketrans'

  2. Still having the problem with form mentioned above. If I never change the value from the loaded value, it acts as though that field were blank. I also noticed these attempts are adding duplicate entries into the address table (raw only).

thanks, John

— Reply to this email directly or view it on GitHub https://github.com/furious-luke/django-address/issues/34#issuecomment-170899813 .

jplehmann commented 8 years ago
furious-luke commented 8 years ago

Hmm, that's interesting about the consistency check. I may have accidentally broken something.

Do you think it would be okay for you to send me a few addresses that are representative of the consistency problem? I.e. have the same value for raw and formatted, or are almost the same. It would be very helpful in diagnosing what's going wrong.

Cheers, Luke

On Wed, 13 Jan 2016 12:38 am John Lehmann notifications@github.com wrote:

  • Make migrations is now fixed. Yes using Python 2.7.
  • Form is now fixed -- yae! This was my blocker.
  • Although on the form when I enter an address without autocompleting it (like "usa"), it still adds duplicate rows in the address table. This is not a blocker.
  • I cannot tell that consistency has gotten any better, maybe even worse. Of about 30 addresses only 1 of them is marked consistent, and some are even verbatim the same text in the two columns in admin. And I did run the command.

— Reply to this email directly or view it on GitHub https://github.com/furious-luke/django-address/issues/34#issuecomment-170913830 .

furious-luke commented 8 years ago

So I found a pretty big problem with the consistency calculation and have since fixed it and pushed the changes. If you get an opportunity to test the consistency stuff again then please let me know how it goes.

banagale commented 4 years ago

I'm closing this issue as the main concerns originally raised seem to have been addressed .