WeblateOrg / weblate

Web based localization tool with tight version control integration.
https://weblate.org/
GNU General Public License v3.0

Getting error on adding new language: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128) #1106

Closed ihoru closed 7 years ago

ihoru commented 8 years ago

Steps to reproduce

  1. Add a new language (from the web interface or from Git).
  2. Get an error:
INFO project/prod/zh_TW: processing resources/lang/i18n/zh_TW/LC_MESSAGES/messages.po, revision has changed
Traceback (most recent call last):
  File "./manage.py", line 31, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 345, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 348, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 399, in execute
    output = self.handle(*args, **options)
  File "/opt/weblate/weblate/trans/management/commands/loadpo.py", line 42, in handle
    subproject.create_translations(options['force'], langs)
  File "/opt/weblate/weblate/trans/models/subproject.py", line 1002, in create_translations
    self, lang, code, path, force, request=request
  File "/opt/weblate/weblate/trans/models/translation.py", line 70, in check_sync
    translation.check_sync(force, request=request)
  File "/opt/weblate/weblate/trans/models/translation.py", line 461, in check_sync
    self, unit, pos
  File "/opt/weblate/weblate/trans/models/unit.py", line 107, in update_from_unit
    dbunit.update_from_unit(unit, pos, created)
  File "/opt/weblate/weblate/trans/models/unit.py", line 486, in update_from_unit
    same_state=same_state
  File "/opt/weblate/weblate/trans/models/unit.py", line 748, in save
    super(Unit, self).save(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/base.py", line 708, in save
    force_update=force_update, update_fields=update_fields)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/base.py", line 736, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/base.py", line 820, in _save_table
    result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/base.py", line 859, in _do_insert
    using=using, raw=raw)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/manager.py", line 122, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 1039, in _insert
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 1060, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.7/dist-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python2.7/dist-packages/django/db/backends/mysql/base.py", line 112, in execute
    return self.cursor.execute(query, args)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 207, in execute
    args = tuple(map(db.literal, args))
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 304, in literal
    s = self.escape(o, self.encoders)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 215, in string_literal
    return db.string_literal(obj)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

Actual behaviour

Process stops.

Expected behaviour

New language should be added.

Server configuration

 * Weblate weblate-2.6
 * Python 2.7.11+
 * Django 1.9.5
 * six 1.10.0
 * python-social-auth 0.2.18
 * Translate Toolkit 1.14.0-rc1
 * Whoosh 2.7.4
 * Git 2.7.4
 * Pillow (PIL) 1.1.7
 * dateutil 2.5.3
 * lxml 3.6.0
 * django-crispy-forms 1.6.0
 * compressor 1.6
 * djangorestframework 3.3.3
 * pytz 2016.4
 * pyuca N/A
 * pyLibravatar N/A
 * Mercurial 3.7.3
 * Database backends: django.db.backends.mysql

nijel commented 8 years ago

Maybe your database was not created as a UTF-8 one? See https://docs.weblate.org/en/latest/admin/install.html#creating-database-in-mysql

ihoru commented 8 years ago

It's UTF-8 as you can see.

MariaDB [weblate]> show create database weblate;
+----------+------------------------------------------------------------------+
| Database | Create Database                                                  |
+----------+------------------------------------------------------------------+
| weblate  | CREATE DATABASE `weblate` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+------------------------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [weblate]> SELECT TABLE_NAME, TABLE_COLLATION FROM information_schema.tables WHERE table_schema = DATABASE();
+--------------------------------------+-----------------+
| TABLE_NAME                           | TABLE_COLLATION |
+--------------------------------------+-----------------+
| accounts_autogroup                   | utf8_general_ci |
| accounts_profile                     | utf8_general_ci |
| accounts_profile_languages           | utf8_unicode_ci |
| accounts_profile_secondary_languages | utf8_unicode_ci |
| accounts_profile_subscriptions       | utf8_unicode_ci |
| accounts_verifiedemail               | utf8_general_ci |
| auth_group                           | utf8_general_ci |
| auth_group_permissions               | utf8_unicode_ci |
| auth_permission                      | utf8_general_ci |
| auth_user                            | utf8_general_ci |
| auth_user_groups                     | utf8_unicode_ci |
| auth_user_user_permissions           | utf8_unicode_ci |
| authtoken_token                      | utf8_general_ci |
| django_admin_log                     | utf8_general_ci |
| django_content_type                  | utf8_general_ci |
| django_migrations                    | utf8_general_ci |
| django_session                       | utf8_general_ci |
| django_site                          | utf8_general_ci |
| lang_language                        | utf8_general_ci |
| social_auth_association              | utf8_general_ci |
| social_auth_code                     | utf8_general_ci |
| social_auth_nonce                    | utf8_general_ci |
| social_auth_usersocialauth           | utf8_general_ci |
| trans_advertisement                  | utf8_general_ci |
| trans_change                         | utf8_general_ci |
| trans_check                          | utf8_general_ci |
| trans_comment                        | utf8_general_ci |
| trans_componentlist                  | utf8_general_ci |
| trans_componentlist_components       | utf8_general_ci |
| trans_dictionary                     | utf8_general_ci |
| trans_groupacl                       | utf8_general_ci |
| trans_groupacl_groups                | utf8_general_ci |
| trans_indexupdate                    | utf8_unicode_ci |
| trans_project                        | utf8_general_ci |
| trans_project_owners                 | utf8_unicode_ci |
| trans_source                         | utf8_general_ci |
| trans_subproject                     | utf8_general_ci |
| trans_suggestion                     | utf8_general_ci |
| trans_translation                    | utf8_general_ci |
| trans_unit                           | utf8_general_ci |
| trans_vote                           | utf8_unicode_ci |
| trans_whiteboardmessage              | utf8_general_ci |
+--------------------------------------+-----------------+
42 rows in set (0.00 sec)

ihoru commented 8 years ago

@nijel any further questions?

nijel commented 8 years ago

Looking at http://stackoverflow.com/questions/3715865/unicodeencodeerror-ascii-codec-cant-encode-character, it might be caused by the configured locales. Can you try setting UTF-8 ones before starting Weblate?

export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'
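
To confirm that the exported locale actually reaches the Python process, the encodings Python derives from the environment can be printed from the same shell or service that starts Weblate. This is only a diagnostic sketch; the helper name `encoding_report` is made up for illustration and is not part of Weblate.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the locale settings and encodings visible to this process."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "preferred": locale.getpreferredencoding(),
        "filesystem": sys.getfilesystemencoding(),
        # stdout may be replaced by an object without an encoding attribute
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in sorted(encoding_report().items()):
        print("%s: %s" % (key, value))
```

If the variables show up as unset here, the process was started with a different environment than the interactive shell.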

ihoru commented 8 years ago

I've tried every fix suggested on that page (and others I found on the Internet), but the script still fails. :(

  1. I've checked the environment variables like this:
$ env | egrep 'LANG|LC_'
LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8
LC_LANG=en_US.UTF-8
  2. I've put charset utf-8; into nginx.conf
  3. I've added these lines to /etc/uwsgi/apps-enabled/weblate.uwsgi.ini:
env             = LANG=en_US.UTF-8
env             = LC_ALL=en_US.UTF-8
env             = LC_LANG=en_US.UTF-8

Then I restarted all of the services and tested it on the web.

nijel commented 8 years ago

Maybe the python-mysqldb version you are using is problematic?

ihoru commented 8 years ago

$ dpkg -s python-mysqldb | grep Version
Version: 1.3.7-1build2

ihoru commented 8 years ago

This symbol is the cause of my issue: PLURAL_SEPARATOR = '\x1e\x1e' (see https://github.com/nijel/weblate/blob/e525ea0409b943c1cdcd1e23653bc441fdef74a8/weblate/trans/util.py#L57)

nijel commented 8 years ago

The problem is that the database driver thinks it needs to use ASCII. These might be the first characters where you hit the problem, but it's going to cause problems with any non-ASCII translation.
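
The same error message can be reproduced outside of the database driver by encoding non-ASCII text with the ASCII codec. The sketch below uses Python 3 (the traceback in this thread is from Python 2.7) and a made-up four-character sample; the text that actually triggered the failure is not shown in the thread.

```python
def try_encode(text, codec):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as error:
        return str(error)


# Four CJK characters, chosen purely for illustration.
sample = "\u7e41\u9ad4\u4e2d\u6587"

ascii_result = try_encode(sample, "ascii")  # fails: every character is >= 128
utf8_result = try_encode(sample, "utf-8")   # succeeds and returns bytes

print(ascii_result)
print(utf8_result)
```

The ASCII attempt fails on the whole run of non-ASCII characters, while the UTF-8 attempt succeeds, which is why the codec chosen by the driver matters more than any single character.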

ihoru commented 8 years ago

There are many other UTF-8 symbols as well (the ellipsis character, for example) and they work fine. How can I make the database driver use UTF-8 instead of ASCII? It looks like this issue appeared when I moved the code to another server (new: Ubuntu 16.04, old: Ubuntu 15) and imported the data into MariaDB instead of MySQL, which I used previously.

nijel commented 8 years ago

Maybe there is some issue in the MySQL library you used on the old server, but this is a valid Unicode character which should not cause any problems (see http://unicode.org/cldr/utility/character.jsp?a=001E)
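
A quick way to check this claim is to inspect the separator with the standard library. The sketch below (Python 3) uses the PLURAL_SEPARATOR value quoted earlier in this thread and shows that it consists of control characters below 128, so it is encodable even with the ASCII codec.

```python
import unicodedata

PLURAL_SEPARATOR = "\x1e\x1e"  # value quoted earlier in this thread

# Code points of each character in the separator.
code_points = [ord(char) for char in PLURAL_SEPARATOR]

# Unicode general category; "Cc" means a control character.
categories = [unicodedata.category(char) for char in PLURAL_SEPARATOR]

# Does not raise, because 0x1E is below 0x80.
encoded = PLURAL_SEPARATOR.encode("ascii")

print(code_points, categories, encoded)
```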

nijel commented 7 years ago

In the end it turned out to be a MySQL Unicode issue; it's now documented at https://docs.weblate.org/en/latest/admin/install.html#unicode-issues-in-mysql
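
For readers who land here with the same error: one common way to make Django's MySQL backend use a full Unicode character set is to pass a charset in the connection options. The fragment below is a sketch of such a settings.py entry, not a confirmed copy of the linked documentation. The database name comes from this thread; the user, password and host values are placeholders, and the choice of utf8mb4 is an assumption that also requires the database and tables themselves to use that character set.

```python
# settings.py fragment (sketch, see the assumptions above)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "weblate",           # database name shown earlier in the thread
        "USER": "weblate",           # placeholder
        "PASSWORD": "change-me",     # placeholder
        "HOST": "127.0.0.1",         # placeholder
        "OPTIONS": {
            # Passed through to the MySQL driver's connect() call.
            "charset": "utf8mb4",    # assumption: full 4-byte UTF-8 support
        },
    }
}
```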