coderholic / django-cities

Countries and cities of the world for Django projects
MIT License
Import - Character Encoding #150

Closed stephenmullens closed 7 years ago

stephenmullens commented 7 years ago

Hi, I am trying to install django-cities in a virtual environment on Windows 7 with Django 1.10, but the import command fails. I get the following error during the import:

python manage.py cities --import=all

UnicodeEncodeError: 'charmap' codec can't encode character '\u0107 .....'

The problem is the \u0107 character (c-acute) in cities5000.zip. I have tried several builds from GitHub as well as the standard pip install, and all of them have this problem. Could anyone suggest a workaround? I would like to get this app running.

I manually edited cities5000.zip to remove the offending characters, and now it complains about \u2019 characters. I fear that manually editing the data in the zip files is a losing battle. Note that I want to import all countries into my database, not just 'MY', which I was testing with below.

Thanks, Stephen

# Django-Cities
CITIES_POSTAL_CODES = ['MY']
CITIES_LOCALES = ['MY']

CITIES_PLUGINS = [
    'cities.plugin.postal_code_ca.Plugin',  # Canada postal codes need region codes remapped to match geonames
    'cities.plugin.reset_queries.Plugin',  # plugin that helps to reduce memory usage when importing large datasets (e.g. "allCountries.zip")
]
(myproject) C:\Users\Win7\Desktop\Django\province>python manage.py cities --import=all
Importing countries...:  99%|##############8| 250/252 [00:00<00:00, 600.93it/s]
Building country index: 100%|#############| 250/250 [00:00<00:00, 27776.11it/s]
Importing regions: 100%|##################| 3904/3904 [00:04<00:00, 856.65it/s]
Building region index: 100%|#############| 3904/3904 [00:03<00:00, 1040.73it/s]
Importing subregions:  21%|##9           | 8589/41116 [00:10<00:38, 838.05it/s]
    Subregion: Ville de Kisangani: Cannot find region: 09
    Subregion: Kolwezi City: Cannot find region: 05
    Subregion: Sous-Région du Haut-Shaba: Cannot find region: 05
    Subregion: Haut Lomami: Cannot find region: 05
    Subregion: Haut Katanga: Cannot find region: 05
    Subregion: Plateaux: Cannot find region: 01
    Subregion: Tumba: Cannot find region: 09
    Subregion: Kolwezi: Cannot find region: 05
    Subregion: Kananga City: Cannot find region: 03
    Subregion: Tshikapa City: Cannot find region: 03
    Subregion: Lubumbashi (city): Cannot find region: 05
    Subregion: Kalemie Ville: Cannot find region: 05
    Subregion: Kipushi Ville: Cannot find region: 05
    Subregion: Kongolo Ville: Cannot find region: 05
    Subregion: Kambove Ville: Cannot find region: 05
    Subregion: Likasi: Cannot find region: 05
    Subregion: Mutumba District: Cannot find region: 05
    Subregion: Mutumba District: Cannot find region: 03
    Subregion: Kasaji: Cannot find region: 05
    Subregion: Ville de Kananga: Cannot find region: 03
    Subregion: Manono: Cannot find region: 05
Importing subregions:  32%|####1        | 13063/41116 [00:15<00:33, 837.89it/s]
    Subregion: Gldanskiy Rayon: Cannot find region: 00
Importing subregions:  83%|##########7  | 33947/41116 [00:40<00:08, 812.42it/s]
    Subregion: Leninskiy Rayon: Cannot find region: 00
    Subregion: Komsomolabadskiy Rayon: Cannot find region: 00
    Subregion: Gissarskiy Rayon: Cannot find region: 00
Importing subregions: 100%|#############| 41116/41116 [00:48<00:00, 839.81it/s]
Building region index: 100%|############| 44995/44995 [01:14<00:00, 603.65it/s]
Importing cities:   3%|4                 | 1304/47491 [00:02<01:23, 551.64it/s]
    {}: {}: Cannot find region: {} -- skipping AW Oranjestad 00
    AW: Oranjestad: Cannot find region: 00 -- skipping
    {}: {}: Cannot find region: {} -- skipping AW Arasji 00
    AW: Arasji: Cannot find region: 00 -- skipping
    Importing cities:   4%|7                 | 1881/47491 [00:02<01:05, 700.83it/s]
    {}: {}: Cannot find region: {} -- skipping BA Traceback (most recent call last):

  File "C:\Users\Win7\Envs\myproject\lib\site-packages\cities\management\commands\cities.py", line 437, in import_city
    region = self.region_index[country_code + "." + region_code]
KeyError: 'BA.00'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "manage.py", line 22, in <module>
    execute_from_command_line(sys.argv)
  File "C:\Users\Win7\Envs\myproject\lib\site-packages\django\core\management\__init__.py", line 367, in execute_from_command_line
    utility.execute()
  File "C:\Users\Win7\Envs\myproject\lib\site-packages\django\core\management\__init__.py", line 359, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "C:\Users\Win7\Envs\myproject\lib\site-packages\django\core\management\base.py", line 294, in run_from_argv
    self.execute(*args, **cmd_options)
  File "C:\Users\Win7\Envs\myproject\lib\site-packages\django\core\management\base.py", line 345, in execute
    output = self.handle(*args, **options)
  File "c:\users\smullens\appdata\local\continuum\anaconda3\Lib\contextlib.py", line 30, in inner
    return func(*args, **kwds)
  File "C:\Users\Win7\Envs\myproject\lib\site-packages\cities\management\commands\cities.py", line 152, in handle
    func()
  File "C:\Users\Win7\Envs\myproject\lib\site-packages\cities\management\commands\cities.py", line 443, in import_city
    print("{}: {}: Cannot find region: {} -- skipping", country_code, city.name, region_code)
  File "C:\Users\Win7\Envs\myproject\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0107' in position 6: character maps to <undefined>

(myproject) C:\Users\Win7\Desktop\Django\province>
stephenmullens commented 7 years ago

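This problem is caused by the Windows command prompt and has nothing to do with django-cities. The console was using code page 437 (see cp437.py in the traceback), which cannot represent characters such as \u0107, so the print call failed. I fixed it by running "chcp 65001" (UTF-8) in the window before starting the import.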
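
For anyone hitting the same error, here is a minimal sketch of why the print call fails and why switching the console to UTF-8 helps. It does not touch django-cities itself. The city name is only an illustrative string containing \u0107, and the `errors="replace"` wrapper at the end is an alternative idea, not something from the fix above.

```python
import io

# Illustrative name containing U+0107 (c-acute); any such string will do.
name = "Banovi\u0107i"

# Code page 437 has no mapping for U+0107, so strict encoding raises the
# same UnicodeEncodeError that appears in the traceback.
try:
    name.encode("cp437")
    cp437_ok = True
except UnicodeEncodeError:
    cp437_ok = False

# UTF-8 (the encoding that "chcp 65001" selects) can encode the character.
utf8_bytes = name.encode("utf-8")

# Alternative sketch: a text stream that substitutes "?" for characters the
# target encoding cannot represent, instead of raising an exception.
buffer = io.BytesIO()
safe_out = io.TextIOWrapper(buffer, encoding="cp437", errors="replace")
print(name, file=safe_out)
safe_out.flush()
replaced = buffer.getvalue()
```

With a stream like `safe_out`, the log line would lose the accented letter but the import would keep running; changing the code page keeps the output intact, which is why it is the nicer fix.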