MariaDB / mariadb-docker

Docker Official Image packaging for MariaDB
https://mariadb.org
GNU General Public License v2.0

Python app works with 11.4, fails with 11.5 #604

Open lovette opened 1 month ago

lovette commented 1 month ago

I have a Python app that worked fine with Docker images 10.x through 11.4, but after it upgraded to 11.5 today it started failing. (The error message mentions utf8mb4, so I suspect a charset issue.) I don't see anything in the Release Notes that jumps out at me as a potential cause. I tried the 11.6-rc image and it fails too.

As background, it's a 10-year-old codebase written in Python 2.7 that uses oursql, which hasn't been updated since 2012. The app has worked with every version of MariaDB since 2014 until today 😕

Not the end of the world if I'm now stuck on 11.4, but maybe there's a release issue at play?

grooverdan commented 1 month ago

There's a default collation change in https://mariadb.com/kb/en/mariadb-11-5-1-release-notes/.

Exact error message would be useful.

Can you confirm that 11.4.3 is the version that works?

lovette commented 1 month ago

Thanks for pointing that out.

The error itself, a Python KeyError triggered deep within oursql, is not very helpful 🙄

unknown encoding: utf8mb4

There are only a few differences in the SHOW VARIABLES output between 11.4.3 and 11.5.2, and the only charset-related one is character_set_collations.

Version  character_set_collations value
11.4.3   utf8mb4=utf8mb4_uca1400_ai_ci
11.5.2   utf8mb3=utf8mb3_uca1400_ai_ci,ucs2=ucs2_uca1400_ai_ci,utf8mb4=utf8mb4_uca1400_ai_ci,utf16=utf16_uca1400_ai_ci,utf32=utf32_uca1400_ai_ci
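To make the difference between the two values easier to see, here is a minimal sketch that splits each character_set_collations string into charset/collation pairs and lists the charsets that only appear in 11.5.2. The helper name `parse_charset_collations` is hypothetical; the two strings are copied from the comparison above.

```python
def parse_charset_collations(value):
    """Split 'cs1=coll1,cs2=coll2' into a {charset: collation} dict."""
    pairs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        charset, _, collation = item.partition("=")
        pairs[charset] = collation
    return pairs


# Values copied from the SHOW VARIABLES comparison in this thread.
old = parse_charset_collations("utf8mb4=utf8mb4_uca1400_ai_ci")
new = parse_charset_collations(
    "utf8mb3=utf8mb3_uca1400_ai_ci,ucs2=ucs2_uca1400_ai_ci,"
    "utf8mb4=utf8mb4_uca1400_ai_ci,utf16=utf16_uca1400_ai_ci,"
    "utf32=utf32_uca1400_ai_ci"
)

# Charsets that gained a default collation entry in the newer value.
added = sorted(set(new) - set(old))
print(added)
```

The utf8mb4 entry is identical in both versions; only the other Unicode charsets were added to the list.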

I tried setting and changing various charset settings and nothing makes a difference 🥲 I'm happy to take any suggestions you may have, but I realize this is not an image-related issue and have no problem laying the blame on oursql 😁

grooverdan commented 1 month ago

oursql:

      property charset:
          """charset -> str

          Get or set the connection's current encoding. If use_unicode is 
          enabled, this is the encoding that will be used to decode incoming
          strings.
          """
          def __get__(self):
              self._charset = PyString_FromString(
                  mysql_character_set_name(self.conn))
              return self._charset
          def __set__(self, value):
              cdef char *svalue
              self._check_closed()
              svalue = PyString_AsString(value)
              if mysql_set_character_set(self.conn, svalue):
                  self._raise_error()
              self._charset = value

So an error on mysql_set_character_set will raise the KeyError you experienced.

The current implementation (https://mariadb.com/kb/en/mysql_set_character_set/) accepts utf8mb4, however that may not be the case for you.

Without a backtrace I can't see where this is coming from. Work out where it occurs and change it there.

A change in the code that hacks utf8mb4 back to utf8 might be one option, or identify the source where utf8mb4 comes into the codebase.
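One possible shape for such a hack on the Python side, assuming the "unknown encoding: utf8mb4" message comes from Python's own codec lookup (which does not know MySQL charset names), is to register an alias with the standard `codecs` module. This is only a sketch: the function name `_mysql_charset_alias` is hypothetical, it is not something oursql provides, and it has not been tried against the app in question.

```python
import codecs

# MySQL/MariaDB names for UTF-8 that Python's codec registry does not know.
_MYSQL_UTF8_NAMES = ("utf8mb4", "utf8mb3")


def _mysql_charset_alias(name):
    """Codec search function: resolve MySQL UTF-8 names to Python's utf-8."""
    if name.lower() in _MYSQL_UTF8_NAMES:
        return codecs.lookup("utf-8")
    return None  # let other search functions handle everything else


codecs.register(_mysql_charset_alias)

# After registration the MySQL name can be used like any other encoding.
encoded = u"caf\u00e9".encode("utf8mb4")
```

Note that Python's utf-8 codec handles 4-byte sequences, so treating utf8mb3 as plain utf-8 is looser than what the server enforces for that charset.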

lovette commented 1 month ago

Expanding on your observation, I see that the charset property is the value returned by mysql_character_set_name.

When I call oursql.connect I set charset=utf8, which is passed into mysql_options to set MYSQL_SET_CHARSET_NAME. In the past, this resulted in mysql_character_set_name returning utf8, but with 11.5 it returns utf8mb4. 🤨

The good news is, if I set connection.charset=utf8 again after the connection is made, oursql is happy! So perhaps setting MYSQL_SET_CHARSET_NAME is failing or it accepts setting it to utf8 but then reports it as being utf8mb4.

Not only that, but 11.5 also reports utf8mb3 for variables it used to report as utf8mb4. I sanity check a few other variables after I connect...

[WARNING] MySQL server variable 'character_set_client' is utf8mb3, expected utf8mb4
[WARNING] MySQL server variable 'character_set_connection' is utf8mb3, expected utf8mb4
[WARNING] MySQL server variable 'character_set_results' is utf8mb3, expected utf8mb4
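A post-connect check of this kind can be sketched as below. It is not the app's actual code: `check_variables` and `EXPECTED` are hypothetical names, and the rows are assumed to be (name, value) pairs such as those returned by `SHOW VARIABLES LIKE 'character_set_%'`.

```python
import logging

log = logging.getLogger(__name__)

# Session variables the app expects to be utf8mb4 after connecting.
EXPECTED = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_results": "utf8mb4",
}


def check_variables(rows, expected=EXPECTED):
    """Return one message per expected variable whose value differs."""
    actual = dict(rows)
    messages = []
    for name in sorted(expected):
        value = actual.get(name)  # None if the variable is missing
        if value != expected[name]:
            messages.append(
                "MySQL server variable '%s' is %s, expected %s"
                % (name, value, expected[name])
            )
    return messages


# Rows matching what 11.5 reported in this thread.
rows = [
    ("character_set_client", "utf8mb3"),
    ("character_set_connection", "utf8mb3"),
    ("character_set_results", "utf8mb3"),
]
for message in check_variables(rows):
    log.warning(message)
```

Returning the messages instead of logging inside the function keeps the comparison easy to test without a live server.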

The only references to utf8mb3 in SHOW VARIABLES are in character-set-collations, character-set-system and old-mode (which is UTF8_IS_UTF8MB3); the last two are the same as shown with 11.4.

My app sets everything to utf8mb4 from top to bottom, tables and all. These are the settings I set explicitly.

[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init-connect = 'SET COLLATION_CONNECTION = utf8mb4_unicode_ci, NAMES utf8mb4'

[client]
default-character-set=utf8mb4

Lastly, I notice the official Python 2.7 image is based on Debian 10, which includes the MariaDB 10.3 client libraries (but not the client itself). Could there be some incompatibility between the 10.3 client libraries and an 11.5 server?