GoogleCloudPlatform / appengine-django-skeleton

A skeleton for creating App Engine applications using the Django framework.
BSD 3-Clause "New" or "Revised" License

(1366, "Incorrect string value: '\\xF0\\x9F\\x98\\x80\\xF0\\x9F...' for column 'name' at row 1") #28

Open EssaAlshammri opened 7 years ago

EssaAlshammri commented 7 years ago

Hi, I can't post emojis to the database when I use my-app.appspot.com, but when I run it locally with python manage.py runserver, with the same libraries as on GAE, everything works perfectly and I can post and retrieve emojis.

Here is my settings.py:

import os
if os.getenv('SERVER_SOFTWARE', '').startswith('Google App Engine'):
    # Running on production App Engine, so use a Google Cloud SQL database.
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'HOST': '/cloudsql/my-app:us-central1:my-app-mysql',
            'NAME': '********',
            'USER': 'root',
            'PASSWORD': '*********',
        }
    }
else:
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': '*******',
            'USER': 'root',
            'PASSWORD': '*********',
            'HOST': '**********',
            'PORT': '3306',
            'OPTIONS': {
                'charset': 'utf8mb4',
            }
        }
    }

here is the charset when using cloud shell

mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8               |
| character_set_connection | utf8               |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8               |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8_general_ci    |
| collation_database       | utf8mb4_general_ci |
| collation_server         | utf8mb4_general_ci |
+--------------------------+--------------------+
10 rows in set (0.15 sec)

and here is the charset when I connect using the IP of the database from another client


Variable_name                        Value
character_set_client                utf8
character_set_connection            utf8mb4
character_set_database              utf8mb4
character_set_filesystem            binary
character_set_results               utf8
character_set_server                utf8mb4
character_set_system                utf8
collation_connection                utf8mb4_unicode_ci
collation_database                  utf8mb4_general_ci
collation_server                    utf8mb4_general_ci

Am I missing something?

How do I make it work?

thanks
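The two listings above differ only in a few session variables, and a small helper can make that comparison explicit. This is a minimal sketch, not part of the original report: the function name `non_utf8mb4_session_variables` is hypothetical, and the rows are copied by hand from the two outputs above. It assumes that the three per-session variables (`character_set_client`, `character_set_connection`, `character_set_results`) are the ones that matter for sending and receiving emojis.

```python
# Session-level variables that control how text travels between client and server.
SESSION_VARIABLES = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def non_utf8mb4_session_variables(rows):
    """Return the session charset variables that are not set to utf8mb4.

    `rows` is an iterable of (variable_name, value) pairs, such as the rows
    returned by the SHOW VARIABLES query quoted above.
    """
    values = dict(rows)
    return {
        name: values.get(name)
        for name in SESSION_VARIABLES
        if values.get(name) != "utf8mb4"
    }


# Rows copied from the Cloud Shell output above.
cloud_shell_rows = [
    ("character_set_client", "utf8"),
    ("character_set_connection", "utf8"),
    ("character_set_database", "utf8mb4"),
    ("character_set_results", "utf8"),
    ("character_set_server", "utf8mb4"),
]

# Rows copied from the output of the client connecting over IP.
ip_client_rows = [
    ("character_set_client", "utf8"),
    ("character_set_connection", "utf8mb4"),
    ("character_set_database", "utf8mb4"),
    ("character_set_results", "utf8"),
    ("character_set_server", "utf8mb4"),
]

print(non_utf8mb4_session_variables(cloud_shell_rows))
print(non_utf8mb4_session_variables(ip_client_rows))
```

In both cases the server and database defaults are already utf8mb4; it is the session that is still negotiated as utf8.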

EssaAlshammri commented 7 years ago

here is my app.yaml and I'm using MySQLdb 1.2.5 locally

# [START django_app]
runtime: python27
api_version: 1
threadsafe: yes

handlers:
- url: /static
  static_dir: static/
- url: .*
  script: myapp.wsgi.application

# Only pure Python libraries can be vendored
# Python libraries that use C extensions can
# only be included if they are part of the App Engine SDK 
libraries:
- name: MySQLdb
  version: 1.2.5
- name: PIL
  version: "1.1.7"
- name: ssl
  version: latest

# [END django_app]

waprin commented 7 years ago

Thanks for the report.

Sounds like you are having problems connecting to CloudSQL through the App Engine unix socket.

Nothing is jumping out as immediately obvious, but I can give it another spin. If I haven't updated you in ~24 hours, feel free to remind me with an @waprin mention.

If you see any messages in the logs in the console, it might be helpful to post them here. Make sure you select the right App Engine version in the dropdown menu in the Logging console.

EssaAlshammri commented 7 years ago

@waprin there you go :) here is the raw stack trace. last two calls

Traceback (most recent call last):
  ...............
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/MySQLdb-1.2.5/MySQLdb/cursors.py", line 205, in execute
    self.errorhandler(self, exc, value)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/MySQLdb-1.2.5/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (1366, "Incorrect string value: '\\xF0\\x9F\\x98\\x8E\\xF0\\x9F...' for column 'name' at row 1")

EssaAlshammri commented 7 years ago

If I haven't updated you in ~24 hours, feel free to give me reminder @waprin mention.

it's 4 hours earlier :) :laughing: @waprin have you found a solution?

waprin commented 7 years ago

@EssaAlshammri thanks for reminder.

I've since come to remember this repo is out of date and is going to be replaced. We now have CloudSQL v2. I would suggest following this tutorial

https://cloud.google.com/python/django/appengine

which references the following sample:

https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/appengine/standard/django

which is the one we're officially supporting.

I'm going to do a quick run through now and see if I repro your issue, but I highly recommend trying those other samples anyway.

waprin commented 7 years ago

@EssaAlshammri everything seems to be working fine for me.

One sanity check, does your lib folder contain MySQL dependencies? It shouldn't, but if it does that can cause some weird issues.

EssaAlshammri commented 7 years ago

@waprin

I actually have the exact same setup as the one you mentioned and I'm using second gen cloud sql from the get go.

I will give it another try with a new project from scratch. But before that, now it's giving another error message when I post emojis and I assure you I haven't done anything to the setup

OperationalError: (1267, "Illegal mix of collations (utf8mb4_unicode_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")

I don't think I have any MySQL dependencies. Here is an ls -la of the lib folder:

drwxrwxr-x  5 user user  4096 Nov 25 10:23 babel
drwxrwxr-x  2 user user  4096 Nov 25 10:23 Babel-2.3.4.dist-info
drwxrwxr-x 57 user user  4096 Nov 26 12:12 boto
drwxrwxr-x  2 user user  4096 Nov 26 12:12 boto-2.43.0.dist-info
drwxrwxr-x 18 user user  4096 Dec 18 18:32 django
drwxrwxr-x  2 user user  4096 Dec 18 18:32 Django-1.10.4.dist-info
drwxrwxr-x  2 user user  4096 Jan 22 22:47 django_google_storage
drwxrwxr-x  2 user user  4096 Nov 26 17:55 django_google_storage_updated-0.4.0.dist-info
drwxrwxr-x  2 user user  4096 Nov 25 10:23 django_phonenumber_field-1.1.0.dist-info
drwxrwxr-x  2 user user  4096 Nov 25 10:23 djangorestframework-3.5.3.dist-info
drwxrwxr-x  3 user user  4096 Feb 22 18:49 django_rest_multitokenauth
drwxrwxr-x  2 user user  4096 Feb 22 18:49 django_rest_multitokenauth-0.2.4-py2.7.egg-info
drwxrwxr-x  3 user user  4096 Nov 25 10:23 phonenumber_field
drwxrwxr-x  7 user user  4096 Nov 25 10:23 phonenumbers
drwxrwxr-x  2 user user  4096 Nov 25 10:23 phonenumberslite-7.7.5.dist-info
drwxrwxr-x  2 user user  4096 Nov 25 10:23 plivo-0.11.3.dist-info
-rw-rw-r--  1 user user 42373 Nov 25 10:23 plivo.py
-rw-rw-r--  1 user user 44791 Nov 25 10:23 plivo.pyc
-rw-rw-r--  1 user user  7704 Nov 25 10:23 plivoxml.py
-rw-rw-r--  1 user user 13362 Nov 25 10:23 plivoxml.pyc
drwxrwxr-x  2 user user  4096 Feb 22 18:35 pyfcm
drwxrwxr-x  2 user user  4096 Feb 22 18:35 pyfcm-1.2.4.dist-info
drwxrwxr-x  3 user user  4096 Nov 25 10:23 pytz
drwxrwxr-x  2 user user  4096 Nov 25 10:23 pytz-2016.7.dist-info
drwxrwxr-x  3 user user  4096 Feb 22 18:35 requests
drwxrwxr-x  2 user user  4096 Nov 25 10:23 requests-2.12.1.dist-info
drwxrwxr-x  2 user user  4096 Feb 22 18:35 requests-2.13.0.dist-info
drwxrwxr-x  9 user user  4096 Nov 25 10:23 rest_framework

EssaAlshammri commented 7 years ago

I think the problem is that the connection's initial charset is not set to utf8mb4 when the app is running on App Engine.

If there is a way I can set it, the problem will be solved.

Things I have tried so far:

if os.getenv('SERVER_SOFTWARE', '').startswith('Google App Engine'):
    # Running on production App Engine, so use a Google Cloud SQL database.
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'HOST': '/cloudsql/myapp-app:us-central1:my-app-mysql',
            'NAME': '*******',
            'USER': 'root',
            'PASSWORD': '***********',
            'OPTIONS': {
                'charset': 'utf8mb4',
            }
        }
    } 

But this gives the error (2019, "Can't initialize character set utf8mb4 (path: /usr/local/mysql/share/charsets/)").

import os
if os.getenv('SERVER_SOFTWARE', '').startswith('Google App Engine'):
    # Running on production App Engine, so use a Google Cloud SQL database.
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'HOST': '/cloudsql/my-app:us-central1:my-app-mysql',
            'NAME': '**********',
            'USER': 'root',
            'PASSWORD': '************',
            'OPTIONS': {
                'read_default_file': os.path.join(BASE_DIR, 'my.cnf'),
            }
        }
    }

my.cnf

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

This also gives the same error (2019, "Can't initialize character set utf8mb4 (path: /usr/local/mysql/share/charsets/)"), and it doesn't even work if I put it in the local development configuration.

I also tried to set it from the console (screenshot from 2017-02-23 09-44-28),

and no luck :-1:

waprin commented 7 years ago

:-\

This is a little more involved for me to repro, hence the delay, but I'll still try to follow up. Give me a bit.

EssaAlshammri commented 7 years ago

@waprin hey :)

I just want to keep you updated. I have set up everything from scratch with the polls example, following https://cloud.google.com/python/django/appengine.

It did not work as expected; the exact same problem occurred.

waprin commented 7 years ago

@EssaAlshammri sorry, can you give me slightly clearer steps to reproduce?

1) Follow that polls example 2) Use cloudsql proxy locally to show charset 3) ??

Thanks.

waprin commented 7 years ago

@ryanmats just FYI about this issue.

EssaAlshammri commented 7 years ago

@waprin here is the charset using cloudsql proxy locally

mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8               |
| character_set_connection | utf8               |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8               |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8_general_ci    |
| collation_database       | utf8mb4_general_ci |
| collation_server         | utf8mb4_general_ci |
+--------------------------+--------------------+
10 rows in set (0.21 sec)

at the creation of the instance I set this:

thebatlab commented 7 years ago

I just ran into this same issue today. I have the database and tables set up with utf8mb4, but as soon as I add 'OPTIONS': { 'charset': 'utf8mb4', }

I get the exact same (2019, "Can't initialize character set utf8mb4 (path: /usr/local/mysql/share/charsets/)") error.

If I run locally and connect to the CloudSQL db, it runs fine. It has to be that the MySQLdb version App Engine uses does not support that charset. From what I see here: https://code.djangoproject.com/ticket/18392#comment:12

Support for this appears to have been added to the 1.2.5 release, which it would seem app engine uses. So I'm a bit stumped.

I gave PyMySQL a try, but end up with an error of: Can't connect to MySQL server on 'localhost' ([Errno 97] Address family not supported by protocol)

So PyMySQL may not support how the connection is made, it would seem. Leaving me out of luck for supporting emojis using CloudSQL and App Engine, it would appear.

EssaAlshammri commented 7 years ago

@waprin anything yet ?

waprin commented 7 years ago

@EssaAlshammri reproduced, filed internal bug with engineering team, will keep you updated.

waprin commented 7 years ago

You can try base64 encoding going in and out of the database as a workaround.

Supposedly google.appengine.ext.django.backends.rdbms also works but I haven't gotten it working yet myself.

thebatlab commented 7 years ago

Yes, I have successfully done this by encoding/decoding in/out of the database, and it works fine. Just unfortunate to have to add the extra code :)
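For readers who want to try the encode/decode workaround described above, here is a minimal sketch of what it could look like. Neither commenter posted their code, so the helper names `encode_for_db` and `decode_from_db` are hypothetical, and the example is written for Python 3. The idea is that base64 output is plain ASCII, so it fits in a column that only accepts MySQL's 3-byte utf8.

```python
import base64


def encode_for_db(text):
    """Turn arbitrary Unicode text into an ASCII-only base64 string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_from_db(stored):
    """Reverse encode_for_db and return the original Unicode text."""
    return base64.b64decode(stored.encode("ascii")).decode("utf-8")


original = "hello \U0001F600"
stored = encode_for_db(original)
restored = decode_from_db(stored)
print(stored)
print(restored == original)
```

Note that base64 grows the stored value to roughly four thirds of the UTF-8 byte length, so the column has to be sized accordingly, and the stored values can no longer be searched or sorted as text.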

waprin commented 7 years ago

For sure, it's not good. The bug I filed was a duplicate and the original was already assigned to someone who's working on it so at least it's on the radar to get fixed. Unfortunately I can't promise any sort of date for when that work will be done and shipped but I will keep eye on it and let you all know.

EssaAlshammri commented 7 years ago

Yeah, encoding and decoding worked for me too, but :angry: it isn't that good, especially if you already have records in the database. There is one workaround for that, though: iterate over the old records and encode them :thinking:.

thank you so much @waprin @thebatlab

I prefer to wait for the official fix for this issue.

ghost commented 7 years ago

@waprin is there any update on this?

waprin commented 7 years ago

@rimeissner comment from internal engineer:

The standard way to get utf8mb4 working in Django is to specify it as DATABASES['default']['OPTIONS'] in settings.py, like this:

    'OPTIONS': {'charset': 'utf8mb4'},

The workaround is to manually call SET NAMES; edit lib/django/db/backends/mysql/base.py and add a conn.query("SET NAMES utf8mb4") line into DatabaseWrapper.get_new_connection, so it looks like this:

    def get_new_connection(self, conn_params):
        conn = Database.connect(**conn_params)
        conn.encoders[SafeText] = conn.encoders[six.text_type]
        conn.encoders[SafeBytes] = conn.encoders[bytes]
        conn.query("SET NAMES utf8mb4")
        return conn

Make sure that you also have utf8mb4 enabled on the backend.  The migration commands in the App Engine Django tutorial result in a Cloud SQL instance configured for utf8.  I needed to run these commands to enable utf8mb4 on the two tables:

    ALTER TABLE polls_question CONVERT TO CHARACTER SET utf8mb4;
    ALTER TABLE polls_choice CONVERT TO CHARACTER SET utf8mb4;

Let me know if that fixes your problem. There is a real fix in the pipeline but no timetable for it to be out.
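If editing the vendored copy of Django is undesirable, a different technique that issues the same statement is Django's `connection_created` signal. This is an alternative sketch, not the method suggested by the engineer above, and it has not been verified on App Engine. The names `set_names_utf8mb4` and `register` are hypothetical; `connection_created`, `connection.vendor` and `connection.cursor()` are existing Django APIs.

```python
def set_names_utf8mb4(sender, connection, **kwargs):
    """Signal receiver: switch a freshly opened MySQL session to utf8mb4."""
    if connection.vendor != "mysql":
        return  # leave other database backends untouched
    with connection.cursor() as cursor:
        cursor.execute("SET NAMES utf8mb4")


def register():
    """Connect the receiver; call once at startup, e.g. from AppConfig.ready()."""
    # Imported lazily so this module can be imported without Django installed.
    from django.db.backends.signals import connection_created

    connection_created.connect(set_names_utf8mb4)
```

Like the `get_new_connection` edit, this configures utf8mb4 after the connection has been made rather than during it, so the `charset` option would still have to stay out of OPTIONS while the driver rejects it.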

thebatlab commented 7 years ago

The issue isn't getting the database or Django set up for utf8mb4, though. The issue is that the MySQL driver that app engine uses doesn't support that character set.

I could run everything just fine while on my localhost, and connected to the CloudSQL database. But as soon as it was deployed to App Engine, that is when the error comes up.

waprin commented 7 years ago

Yes, understood. The driver on GAE is outdated and needs to get fixed, but in the meantime the steps above are a way to work around the deficiencies of the driver by configuring utf8mb4 after the connection rather than during the connection, which is supposed to work (going to verify myself soon).

myelin commented 7 years ago

Aforementioned internal engineer here :)

TL;DR: Cloud SQL and App Engine support emojis and 4-byte UTF-8, but 'OPTIONS': {'charset': 'utf8mb4'} in your Django settings file will result in "Can't initialize character set utf8mb4".

The issue is in the C code that we use to talk to MySQL. It doesn't support utf8mb4 itself, so when it makes a connection to MySQL, it tells the server to use "UTF8", which in MySQL means UTF-8 minus the 4-byte characters... and all your emojis get mangled.
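To make the 4-byte point concrete, the short Python 3 sketch below shows that the bytes quoted in the error messages earlier in this thread are the UTF-8 encodings of two emojis, and that each one needs four bytes. The helper name `needs_utf8mb4` is made up for this illustration.

```python
def needs_utf8mb4(text):
    """Return True if any character in `text` takes 4 bytes in UTF-8."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


grinning = "\U0001F600"    # bytes \xF0\x9F\x98\x80 in the first error message
sunglasses = "\U0001F60E"  # bytes \xF0\x9F\x98\x8E in the stack trace

print(grinning.encode("utf-8"))    # b'\xf0\x9f\x98\x80'
print(sunglasses.encode("utf-8"))  # b'\xf0\x9f\x98\x8e'
print(needs_utf8mb4("plain text"))         # False
print(needs_utf8mb4("name " + grinning))   # True
```

Text for which the helper returns True cannot be stored through a connection or column that uses MySQL's 3-byte utf8.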

However, Python and Cloud SQL both support 4-byte UTF-8 just fine, so if you follow up with a "SET NAMES utf8mb4" command, as @waprin explained above, it'll tell the Cloud SQL server that it's safe to send 4-byte UTF-8, and everything will work.

As such, 'OPTIONS': {'charset': 'utf8mb4'} doesn't work on App Engine. To get utf8mb4 in Django:

After making these two changes, it should work on both localhost (using dev_appserver.py) and App Engine.

Likewise if you're using MySQLdb without Django, you need to initialize it like this:

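    import MySQLdb

    # Connect with the 3-byte charset the driver understands...
    conn = MySQLdb.connect(unix_socket='/cloudsql/', user='', passwd='', db='', charset='utf8')
    # ...then switch the session to utf8mb4 once the connection is open.
    conn.query("SET NAMES utf8mb4")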

This should all stop being a problem sometime in the next few months (we have a fix, but it's waiting on a ton of stuff to get deployed and tested before it can go out).

Here's a Stack Overflow thread with some more details: http://stackoverflow.com/questions/36144026/unable-to-use-utf8mb4-character-set-with-cloudsql-on-appengine-python

thebatlab commented 7 years ago

OK, excellent, thanks for clarifying. I was worried I had described the problem inadequately and you were perhaps fixing the wrong thing :)

I misunderstood the get_new_connection fix, too. I thought it was just doing the same thing as the options flag, but at a lower level than in the settings.

thebatlab commented 7 years ago

Just wanted to poke in here and see if the final fix was deployed: "This should all stop being a problem sometime in the next few months (we have a fix, but it's waiting on a ton of stuff to get deployed and tested before it can go out)."

I did a quick test and it seems we still need the change to edit the django backend code, but wanted to confirm, in case I tested incorrectly.

thebatlab commented 6 years ago

As a final followup, this appears to be working now with no changes to Django code needed!

My database settings now have:

    'OPTIONS': {'charset': 'utf8mb4'}

And I changed the table I needed emoji support in to the proper character set via this command:

    ALTER TABLE my_table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

And emojis are working just fine. I put the SQL in a migration file so it's a part of the project with no need for extra database configuration manually, so it's all seamless.
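The migration file itself was not posted, so the following is only a sketch of how the SQL could be generated for several tables at once. The helper name `convert_to_utf8mb4_sql` is hypothetical; the statement it produces is the same ALTER TABLE command quoted above. In a Django migration, each resulting string could be passed to a `migrations.RunSQL` operation, which is Django's standard way to run raw SQL from a migration.

```python
def convert_to_utf8mb4_sql(table_names, collation="utf8mb4_bin"):
    """Build one ALTER TABLE statement per table to convert it to utf8mb4."""
    template = (
        "ALTER TABLE `{table}` CONVERT TO CHARACTER SET utf8mb4 "
        "COLLATE {collation};"
    )
    return [
        template.format(table=name, collation=collation)
        for name in table_names
    ]


# Table names taken from the polls tutorial mentioned earlier in the thread.
for statement in convert_to_utf8mb4_sql(["polls_question", "polls_choice"]):
    print(statement)
```

Only pass trusted, hard-coded table names to a helper like this, since the names are interpolated directly into the SQL text.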

Thanks, @myelin , presuming it was you who deployed the fix :)

myelin commented 6 years ago

Huh, I must have been ignoring GitHub notifications or something; I didn't spot any of your comments until just now.

Anyway, yay, glad to see my fix finally made it out! Thanks for following up :)

an0nh4x0r commented 5 years ago

As a final followup, this appears to be working now with no changes to Django code needed!

My database settings now has: 'OPTIONS': {'charset': 'utf8mb4'} And I changed the table I needed emoji support in to the proper character set via this command: ALTER TABLE my_table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

And emojis are working just fine. I put the SQL in a migration file so it's a part of the project with no need for extra database configuration manually, so it's all seamless.

Thanks, @myelin , presuming it was you who deployed the fix :)

Thanks worked perfectly.