coderholic / django-cities

Countries and cities of the world for Django projects
MIT License
920 stars 374 forks

import all: 'ascii' codec can't encode character #146

Closed ghost closed 7 years ago

ghost commented 7 years ago

Hello,

I am trying to integrate django-cities into my Django==1.10.3 app. I am working on OSX 10.12.1 with Homebrew. The database is PostgreSQL.

My problem is that when I run manage.py cities --import=all, I am presented with the following error:

Traceback (most recent call last):
  File "manage.py", line 12, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/christophe/.virtualenvs/awoopa-app/lib/python2.7/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
    utility.execute()
  File "/Users/christophe/.virtualenvs/awoopa-app/lib/python2.7/site-packages/django/core/management/__init__.py", line 359, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/christophe/.virtualenvs/awoopa-app/lib/python2.7/site-packages/django/core/management/base.py", line 294, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/christophe/.virtualenvs/awoopa-app/lib/python2.7/site-packages/django/core/management/base.py", line 345, in execute
    output = self.handle(*args, **options)
  File "/Users/christophe/.virtualenvs/awoopa-app/lib/python2.7/site-packages/django/utils/decorators.py", line 185, in inner
    return func(*args, **kwargs)
  File "/Users/christophe/.virtualenvs/awoopa-app/lib/python2.7/site-packages/cities/management/commands/cities.py", line 146, in handle
    func()
  File "/Users/christophe/.virtualenvs/awoopa-app/lib/python2.7/site-packages/cities/management/commands/cities.py", line 368, in import_region
    region.save()
  File "/Users/christophe/.virtualenvs/awoopa-app/lib/python2.7/site-packages/cities/models.py", line 71, in save
    super(Place, self).save(*args, **kwargs)
  File "/Users/christophe/.virtualenvs/awoopa-app/lib/python2.7/site-packages/cities/models.py", line 41, in save
    self.slug = slugify_func(self, self.slugify())
  File "/Users/christophe/.virtualenvs/awoopa-app/lib/python2.7/site-packages/cities/models.py", line 150, in slugify
    unicode(self.full_code())))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 9: ordinal not in range(128)

My settings.py looks like:

LANGUAGES = [
    ('cat', _('Catalan')),
    ('en', _('English')),
    ('es', _('Spanish')),
]

CITIES_LOCALES = ['LANGUAGES']
CITIES_DATA_DIR = os.path.join(BASE_DIR, 'cities')
CITIES_POSTAL_CODES = ['ES', ]
CITIES_PLUGINS = [
    'cities.plugin.reset_queries.Plugin',  # plugin that helps to reduce memory usage when importing large datasets (e.g. "allCountries.zip")
]
CITIES_IGNORE_EMPTY_REGIONS = True  # default False
CITIES_PLUGINS_RESET_QUERIES_CHANCE = 1.0 / 1000000
SPATIALITE_LIBRARY_PATH = '/usr/local/lib/mod_spatialite.dylib'

I tried removing Catalan and Spanish from LANGUAGES at first. Then I also got an error that postal codes could not be validated.

Thanks

blag commented 7 years ago

Removing Catalan and Spanish from the imported locales does not guarantee that objects with unicode in their names will not be imported.

Can you figure out which objects it's choking on so I can add them as a test?


The postal codes not being validated should not be an error; it should simply be a warning. This is intentional on my part, because the formats of the postal codes from Geonames do not necessarily match the postal code format indicated by their country info. Since their data is inconsistent on that point, I didn't want users to trust that data too much.

All that really means for you is that you should probably review any postal codes you import that don't validate (they are still imported), fix them in your database, and, ideally, propose your corrections to Geonames directly. Doing so will help all consumers of Geonames data, including all users of django-cities.
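
The warn-but-still-import behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual django-cities code: the function name `check_postal_code`, the logger name, and the five-digit pattern are all assumptions made for the example.

```python
import logging
import re

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("postal_code_check")


def check_postal_code(code, pattern):
    """Return True if `code` fully matches `pattern`; log a warning otherwise.

    The caller keeps the code either way, so a mismatch never aborts an import.
    """
    if re.fullmatch(pattern, code):
        return True
    logger.warning("Postal code %r does not match pattern %r", code, pattern)
    return False


# Illustrative pattern for five-digit codes; not copied from Geonames.
FIVE_DIGITS = r"\d{5}"

imported = []
needs_review = []
for code in ["08001", "8001", "28013"]:
    if not check_postal_code(code, FIVE_DIGITS):
        needs_review.append(code)  # flag for manual review
    imported.append(code)          # non-matching codes are still kept
```

After the loop, `needs_review` holds the codes worth checking by hand, while `imported` still contains every code.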

ghost commented 7 years ago

Hi,

Thanks for your quick answer!

I set up logging and can see that the error occurs in admin1CodesASCII.txt, when building regions.

2016-11-03 20:39:10,554 [INFO] cities: Downloading: admin1CodesASCII.txt
2016-11-03 20:39:10,891 [INFO] cities: Building country index
2016-11-03 20:39:10,927 [INFO] cities: Importing region data

The logging level is set to DEBUG, but I'm not sure how to get more details about the error.

Thanks

blag commented 7 years ago

Hmmm, apparently I need to put in more debugging hooks... 😆

Do you have anything like [DEBUG] cities: Added region: <region_code>, <region_name> in your logs?
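
Whether such DEBUG lines show up depends on how the `cities` logger is configured. A minimal sketch of a configuration that would surface them, assuming the logger is named `cities` (as the log lines in this thread suggest) and using a format string chosen to resemble those lines; the handler and formatter names are arbitrary:

```python
import logging
import logging.config

# In a Django project this dict would normally live in settings.py as LOGGING.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "verbose": {
            "format": "%(asctime)s [%(levelname)s] %(name)s: %(message)s",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "verbose",
            "level": "DEBUG",
        },
    },
    "loggers": {
        # Both the handler and the logger must allow DEBUG for messages to appear.
        "cities": {"handlers": ["console"], "level": "DEBUG", "propagate": False},
    },
}

# Django applies settings.LOGGING through dictConfig by default; calling it
# directly makes this sketch runnable outside a project.
logging.config.dictConfig(LOGGING)
logging.getLogger("cities").debug("debug output is enabled")
```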

ghost commented 7 years ago

Unfortunately not. Those are the last lines of the log file...

Maybe that means it is already choking on the first region?

zlebnik commented 7 years ago

Looks like the recent changes to slugs broke unicode support for Python 2.7 and also cause trouble when importing postal codes. The postal code import fails with: duplicate key value violates unique constraint "cities_postalcode_slug_0af4e984_uniq" DETAIL: Key (slug)=(none) already exists.

This happens because no slug is assigned to the postal code during import. More importantly, it can't be assigned: the slug is the stringified ID of the postal code, and (as the "none" in the error suggests) there is no ID yet at that point.

About unicode support for Python 2.7: it looks like the slugify method should look like this:

    def slugify(self):
        return slugify_func(
            self,
            '{}_({})'.format(
                self.name.encode('utf-8'),
                self.full_code().encode('utf-8')
            )
        )

ghost commented 7 years ago

Thanks @zlebnik !

When you talk about the slugify method, which one are you referring to exactly? I will try updating it with your suggested replacement and get back to you.

zlebnik commented 7 years ago

@chris-y-meyers There are slugify methods in cities/models.py (in several models). You should replace the unicode(self.foo) calls (where foo is not id) with something like self.foo.encode('utf-8').

That should be enough for importing locations, but you can still get stuck on postal codes. I'm currently investigating this and looking for a good solution.
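
The underlying failure can be reproduced outside Django. The sketch below uses only the standard library and a made-up sample string. The explicit `.encode("ascii")` stands in for the implicit ASCII conversion that Python 2 performs when a unicode value is passed to `str.format`; the helper name `encode_or_error` is an assumption for this example.

```python
# Made-up sample text; u"\xe0" (the letter "à") sits at index 9.
sample = u"Example: \xe0"


def encode_or_error(text, codec):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return str(exc)


# ASCII only covers code points 0..127, so 0xE0 cannot be represented.
ascii_result = encode_or_error(sample, "ascii")

# UTF-8 can represent every code point; "à" becomes the two bytes C3 A0.
utf8_result = encode_or_error(sample, "utf-8")

print(ascii_result)
print(utf8_result)
```

Encoding explicitly to UTF-8 before formatting, as suggested above, sidesteps the implicit ASCII step on Python 2.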

blag commented 7 years ago

"Maybe that means it is already choking on the first region?"

I think it is. I got it working for me (using Python 3) and pushed it, and there aren't any tests for Python 2.7 that contain unicode characters.

So...I added one. Tracking it in PR #147. Does this error look familiar?

ghost commented 7 years ago

Yes, the error seems to be the same (build job 80.4). As zlebnik mentioned, the error at the end refers to slugify. I will integrate his suggestion tonight and get back to you.

ghost commented 7 years ago

Sorry for taking some time to get back.

The fix, as @zlebnik suggested, is simply to replace the slugify definitions with:

    def slugify(self):
        # Python 2 only: convert to unicode first, then encode to UTF-8 bytes
        return slugify_func(self, '{}_({})'.format(
            unicode(self.name).encode('utf-8'),
            unicode(self.full_code()).encode('utf-8')))

for Region and SubRegion models.

However, this leads to the next issue, as brought up in #148.

zlebnik commented 7 years ago

@chris-y-meyers welcome back!

Yep, but you can just import this variable from conf here: https://github.com/coderholic/django-cities/blob/master/cities/management/commands/cities.py#L50 - just append it at the end.

But it will still lead to the slugify error with postal codes. I think it can be fixed with one minor change here: https://github.com/coderholic/django-cities/blob/master/cities/models.py#L279 - replace self.id with self.code (it should be unique anyway, I hope). Without this change you'll get an error like 'unique constraint violation: key (none) already exists'.
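
To illustrate why an ID-based slug collides before the first save, here is a small sketch. Everything in it is hypothetical: `simple_slugify` is a simplified stand-in for a real slugify function, and `UnsavedPostalCode` only mimics a model instance whose `id` is still `None`.

```python
import re


def simple_slugify(value):
    """Lowercase the text, drop punctuation, and join the parts with hyphens."""
    text = str(value).lower()
    text = re.sub(r"[^\w\s-]", "", text)
    return re.sub(r"[\s_-]+", "-", text).strip("-")


class UnsavedPostalCode:
    """Hypothetical stand-in for a model instance that has not been saved yet."""

    def __init__(self, code):
        self.id = None  # a database id only exists after the first save
        self.code = code


first = UnsavedPostalCode("08001")
second = UnsavedPostalCode("28013")

# Every unsaved object stringifies its id to "None", so all slugs are identical.
slugs_from_id = {simple_slugify(p.id) for p in (first, second)}

# Slugs built from the code differ as long as the codes themselves differ.
slugs_from_code = {simple_slugify(p.code) for p in (first, second)}
```

Note that slugs built from `code` are only as unique as the codes themselves, so two rows sharing a code would still collide.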

ghost commented 7 years ago

Thanks @zlebnik! Yes, I applied your suggestions.

But I still get the same error that I later brought up in #148. self.code does not seem to be unique. Looking at the log file, though, some postal codes do seem to be imported. Looking into it further.

ghost commented 7 years ago

This issue, as well as #148, is fixed in https://github.com/chris-y-meyers/django-cities.

blag commented 7 years ago

@chris-y-meyers See my comments in #148.