backdrop / backdrop-issues

Issue tracker for Backdrop core.
144 stars 39 forks source link

Retroactive transliteration of file names broken with MySQL 8.0.22+ #4931

Open jlfranklin opened 3 years ago

jlfranklin commented 3 years ago

Description of the bug

One test fails under Ubuntu 20.04 that does not fail under Debian 9 (php 7.0, MariaDB 10.1.47).

It may be a configuration or database issue. Database is using default char set 'utf8', not utf8mb4, and utf8mb4 is not enabled in settings.php.

Steps To Reproduce

To reproduce the behavior:

  1. Install Backdrop in an Ubuntu 20.04 server with MySQL 8.0, PHP 7.4 FPM, and Apache 2.4.41)
  2. Run the full suite of tests.

Actual behavior

---- File API: File uploading transliteration (FileUploadTransliterationTest) ----

Status    Group      Filename          Line Function                            
--------------------------------------------------------------------------------
Exception PDOExcepti system.admin.inc  2679 system_transliteration_retroactive(
    SQLSTATE[HY000]: General error: 3995 Character set 'utf8_general_ci' cannot
    be used in conjunction with 'binary' in call to regexp_like.: SELECT
    COUNT(*) AS expression
    FROM 
    (SELECT 1 AS expression
    FROM 
    {file_managed} file_managed
    WHERE  (uri NOT REGEXP BINARY :db_condition_placeholder_0) ) subquery; Array
    (
        [:db_condition_placeholder_0] => /[a-z0-9_.-]+$
    )
Fail      Other      file.test         1073 FileUploadTransliterationTest->test
    "File transliteration is not yet enabled." found
Exception PDOExcepti system.admin.inc  2679 system_transliteration_retroactive(
    SQLSTATE[HY000]: General error: 3995 Character set 'utf8_general_ci' cannot
    be used in conjunction with 'binary' in call to regexp_like.: SELECT
    COUNT(*) AS expression
    FROM 
    (SELECT 1 AS expression
    FROM 
    {file_managed} file_managed
    WHERE  (uri NOT REGEXP BINARY :db_condition_placeholder_0) ) subquery; Array
    (
        [:db_condition_placeholder_0] => /[a-z0-9_.-]+$
    )
Fail      Other      file.test         1103 FileUploadTransliterationTest->test
    Two files available for transliteration.
Fail      Other      file.test         1104 FileUploadTransliterationTest->test
    Found the Transliterate button
Fail      Other      file.test         1104 FileUploadTransliterationTest->test
    Found the requested form fields at
Fail      Other      file.test         1105 FileUploadTransliterationTest->test
    "2 file names have been successfully transliterated." found
Fail      Other      file.test         1111 FileUploadTransliterationTest->test
    File name transliterated and lower case after bulk updating.

Additional information

Add any other information that could help, such as:

indigoxela commented 3 years ago

@jlfranklin Many thanks for reporting this issue.

My suspicion is that it has something to do with some stricter query handling in MySQL 8 (and eventually recent MariaDB). I don't think it's related to utf8-general vs. utf8-mb4.

The query looks a bit strange anyway...

SELECT
    COUNT(*) AS expression
    FROM 
    (SELECT 1 AS expression
    FROM 
    {file_managed} file_managed
    WHERE  (uri NOT REGEXP BINARY :db_condition_placeholder_0) )

I mean the NOT REGEXP BINARY part. UPDATE: which turned out to be totally OK, according to the MySQL docs, at least in select queries, but maybe not anymore in subqueries?

First of all: I suspect, not only the test, but also the functionality is broken. I can't actually test locally, as I don't have MySQL 8 here on my dev.

But you can. Steps to verify:

Is the file with camel-case name listed there? (I doubt it.)

indigoxela commented 3 years ago

Although I'm currently not able to reproduce (no MySQL 8 here), I created a PR with a slightly different syntax.

@jlfranklin does that patch change anything for your Ubuntu setup?

indigoxela commented 3 years ago

Ah, there it is: https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-22.html

" Regular expression functions such as REGEXP_LIKE() yielded inconsistent results with binary string arguments. These functions now reject binary strings with an error. (Bug #31031886, Bug #98951, Bug #31031888, Bug #98950) "

A solution for both, MySQL 5 and 8 will be tricky. The older does not support the case sensitive switch within the regex (?-i), the newer refuses to compare to BINARY.

But maybe CAST or CONVERT could be used to get a case sensitive charset?

@jlfranklin can you try the latest variant from my PR, please?

klonos commented 3 years ago

I'm currently not able to reproduce (no MySQL 8 here)

I believe I can use MySQL 8 with lando on my local. I'll try this soon as I can get to it.

jlfranklin commented 3 years ago

The PR passes on MySQL 8. (Also passes on PostgreSQL, FWIW. SQLite never passed that test, so I have no expectation of it working with this PR.)

indigoxela commented 3 years ago

The PR passes on MySQL 8. (Also passes on PostgreSQL...

@jlfranklin good to know. I was a bit concerned regarding PostgreSQL.

I believe I can use MySQL 8 with lando on my local. I'll try this soon as I can get to it.

@klonos cool, many thanks! We still need to verify my suspicion that not only the test, but also the functionality is currently broken with MySQL 8 and gets fixed by the PR.

klonos commented 3 years ago

OK, I've tested this:

Is the file with camel-case name listed there? (I doubt it.)

Yes it was πŸ˜…

image

klonos commented 3 years ago

...transliteration works w/o any error btw:

image

...and the file does actually get transliterated to myfancyfile.txt

indigoxela commented 3 years ago

database: mysql:8.0 (which resulted in 8.0.19)

Many thanks for testing. :+1: Unfortunately... according to their changelog the breaking change came with MySQL 8.0.22.

@klonos I suppose, lando doesn't ship with that version yet?

klonos commented 3 years ago

They say that they do not officially support specifying patch versions, but it is possible: https://docs.lando.dev/config/mysql.html#supported-versions

Gimme a few mins to try that...

klonos commented 3 years ago

...nope. That didn't work πŸ˜… ...I need to rest, and will return to it later today/tonight.

indigoxela commented 3 years ago

...nope. That didn't work ...I need to rest, and will return to it later today/tonight.

Hey, no need to rush!

@jlfranklin should be able to verify, as he obviously has an affected version.

(The version seems to be 8.0.23-0ubuntu0.20.04.1.)

jlfranklin commented 3 years ago

Is the file with camel-case name listed there? (I doubt it.)

You called it. The list is empty on my Ubuntu 20.04 VM:

For reference:

jlfranklin commented 3 years ago

With the patch from the PR, the list is empty.

Without the patch, the transliteration page returns this:

image

indigoxela commented 3 years ago

@jlfranklin many thanks for verifying! I'll need to get a dev setup with MySQL 8.0.22+. That's not difficult, just a little time consuming - I need to setup an extra VM for that.

Yes, that Exception is the same as in the failing test "...cannot be used in conjunction with "binary" in call to regexp_like...".

indigoxela commented 3 years ago

@jlfranklin I just tested the exact same steps as in my comment with my PR branch on a VM with following setup:

And it works perfectly. The camel-case named file is available on admin/config/media/file-system/transliteration. And retroactive transliteration is fully functional.

I suspect, you missed something important in the steps. The Exception without the PR is expected, but the PR fixes it and also the case sensitive query.

The VM is still around, I can grant access to it (admin login) if requested.

jlfranklin commented 3 years ago

Lazy hand-patching on my part. My bad.

I just re-did the patch by downloading the diff itself, and it works, at least as far as showing the mixed-case filename. However, it still shows the all-lowercase filename after retroactive transliteration, and each additional transliteration run adds another _0 to the filename.

Enabling devel and turning on the query log, I see the query in question translates to:

SELECT COUNT(*) AS `expression` FROM (SELECT 1 AS `expression` FROM `file_managed` file_managed WHERE (`uri` NOT REGEXP :db_condition_placeholder_0) ) subquery

Running it by hand with the CAST placeholder returns an error if not quoted, and 1 if quoted.

mysql> SELECT COUNT(*) AS expression FROM (SELECT 1 AS expression FROM `file_managed` file_managed WHERE (uri NOT REGEXP CAST("/[a-z0-9_.-]+$" AS CHAR CHARACTER SET latin1_general_cs) )) subquery;
ERROR 1115 (42000): Unknown character set: 'latin1_general_cs'
mysql> SELECT COUNT(*) AS expression FROM (SELECT 1 AS expression FROM `file_managed` file_managed WHERE (uri NOT REGEXP 'CAST("/[a-z0-9_.-]+$" AS CHAR CHARACTER SET latin1_general_cs)' )) subquery;
+------------+
| expression |
+------------+
|          1 |
+------------+
1 row in set (0.01 sec)

mysql> select uri from file_managed;
+--------------------------+
| uri                      |
+--------------------------+
| public://myfancyfile.txt |
+--------------------------+
1 row in set (0.00 sec)
indigoxela commented 3 years ago

However, it still shows the all-lowercase filename after retroactive transliteration

Confirmed. The CAST approach seems a little too dirty, anyway. Next approach: version comparison. My PR has been updated.

Successfully tested with MySQL 8.0.23 and MariaDB 10.3.27 - needs more testing.

BTW: if anyone knows a smarter solution, please post here. :wink:

indigoxela commented 3 years ago

Some more info about the flag now used:

MariaDB calls these flags "(default)_regex_flags" and they have been introduce in version 10.0.11

MySQL calls these flags "match_type" and it's totally unclear, when they have been introduced. Probably in 8.0.x.

indigoxela commented 3 years ago

It seems that @jlfranklin needs explicit annotation to get a notification mail. :smiley:

Does the latest approach (version check) work for you?

jlfranklin commented 3 years ago

It seems that @jlfranklin needs explicit annotation to get a notification mail. smiley

A slower week at work would help, too.

Does the latest approach (version check) work for you?

It passes the tests against MySQL 5.5, MySQL 8, and PostgreSQL.9.6. As for the patch itself, I appreciate checking the driver is 'mysql' before checking the version. That makes it easier to apply to Silkscreen.

That kind of conditional is something I think belongs in the core/database/mysql code, but I don't see a way to do it without creating a new API at the database layer. This seems to be the only place in core we use a REGEXP condition. (I'm a little surprised it's not used in search.)

What do you about adding some regexpCondition() functions to the API?

indigoxela commented 3 years ago

A slower week at work would help, too.

I see... :smile:

I appreciate checking the driver is 'mysql' before checking the version. That makes it easier to apply to Silkscreen.

That was actually the intention behind it.

This seems to be the only place in core we use a REGEXP condition. ... adding some regexpCondition() functions to the API?

This is the only place, that's also my finding. Enhancing the API... - I'm not sure if it's worth the effort for a single query.

jlfranklin commented 3 years ago

This is the only place, that's also my finding. Enhancing the API... - I'm not sure if it's worth the effort for a single query.

I'm on the fence myself, leaning on the side of enhancing to keep this code clean. It's part of the SQL spec, and it takes some extra flags that are hard at best to express in a DB-agnostic way with condition(), short of pre-parsing the regex to extract the flags.

I'm OK with doing it as a follow-up issue. I'm also interested in hearing what the @backdrop/core-committers and PMC think.

indigoxela commented 3 years ago

Let's not block ourselves waiting for a Core Committer or PMC member to comment here. A member of PMC set the milestone, and I'm optimistic enough to interpret that as approval. :wink: At least no objection.

A (potential) db layer enhancement can be a follow-up, anyway. And the code will be reviewed before merging.

@jlfranklin as you opened the issue, you're the perfect candidate to verify that the PR is working properly - not only the functional test, but also the functionality. If so, please set the label "works for me". Then we can proceed with the review process.

herbdool commented 2 years ago

I think the code looks fine. Doesn't look like it's stopping us from doing future abstraction if necessary.

By the way, I was confused about the mention of testing with Postgres and Sqlite. Backdrop officially only supports MySQL (and MariaDB) so it's officially ok if it breaks other dbs.

klonos commented 2 years ago

I can finally test this with Lando now!...

For anyone else that wants to know how, please see: https://github.com/lando/lando/issues/2948#issuecomment-1003484299

klonos commented 2 years ago

I've installed a vanilla Backdrop instance, using:

I then tried the steps to reproduce provided by @indigoxela, but when I navigated to admin/config/media/file-system/transliteration (retroactive transliteration) I got the same error as @jlfranklin in https://github.com/backdrop/backdrop-issues/issues/4931#issuecomment-774064812:

PDOException: SQLSTATE[HY000]: General error: 3995 Character set 'utf8mb4_0900_ai_ci'
cannot be used in conjunction with 'binary' in call to regexp_like.:
SELECT COUNT(*) AS `expression` FROM (SELECT 1 AS `expression` FROM {file_managed} file_managed
WHERE (`uri` NOT REGEXP BINARY :db_condition_placeholder_0) ) subquery;
Array ( [:db_condition_placeholder_0] => /[a-z0-9_.-]+$ ) in system_transliteration_retroactive()
(line 2683 of /app/docroot/core/modules/system/system.admin.inc).

I then destroyed the instance, and reinstalled with the changes in the PR. When I tried the steps to reproduce, I did not hit that PDOException, but the list in admin/config/media/file-system/transliteration was empty:

image

I then destroyed the instance once again, and installed again, this time with mysql 8.0.23 + the changes from the PR. This time when visiting admin/config/media/file-system/transliteration the file was listed there. Transliteration worked as expected:

image

[edit] please ignore my previous report re upload errors after deleting and trying to re-upload the transliterated file. This seems to be a pre-existing issue, and mot specific to mysql: #5424.

So to summarize:

As such, I'm setting this back to "needs work". If the empty list in admin/config/media/file-system/transliteration with mysql v8.0.22 is expected (due to the bug in mysql), then please feel free to set this back to RTBC.

indigoxela commented 2 years ago

As such, I'm setting this back to "needs work".

I'm a bit confused. @klonos you already realized that the problem you encountered is unrelated to the problem we try to fix here. And still you reject this fix? Mind to explain, why? Or am I missing the point?

Anyway, rebased. I didn't change the lables and leave it to someone else to decide.

klonos commented 2 years ago

Sorry for the confusion @indigoxela ...I've set it back to "needs work" because of this:

  • with mysql 8.0.22 and the changes from the PR applied -> no PDOException πŸ‘πŸΌ but empty list of to-be-transliterated files πŸ‘ŽπŸΌ

...but as I explained:

If the empty list in admin/config/media/file-system/transliteration with mysql v8.0.22 is expected (due to the bug in mysql), then please feel free to set this back to RTBC.

indigoxela commented 2 years ago

I tried to switch the GHA workflow to the exact version (8.0.22), but that's not supported. The currently installed 8 version with that action is 8.0.27, so I run the functional tests with that one. (Cool GHA!) Note that php 5.3 is supposed to completely fail with MySQL 8 - that's nothing to fix or even consider.

The transliteration test passes with version 8.0.27, but there's another test failing, which points to another problem with either the test or the functionality behind it: DatabaseTemporaryQueryTestCase

TBH, currently my motivation to keep on working on this PR is rather low. Even lower if I have to hunt some possible bug in an outdated MySQL version (8.0.22). The effort to install specific mysql versions is rather high for me and personally, I don't have any websites out there running on version 8.

If anyone wants to take over, please do!

I guess, DatabaseTemporaryQueryTestCase needs a new issue - that problem is unrelated here.

herbdool commented 2 years ago

@indigoxela do you agree that this issue still needs work or is it ready (and testing results as expected)? I had trouble following some of this.

I agree about making a new issue for anything else.

klonos commented 2 years ago

TBH, currently my motivation to keep on working on this PR is rather low. Even lower if I have to hunt some possible bug in an outdated MySQL version (8.0.22). The effort to install specific mysql versions is rather high for me and personally, I don't have any websites out there running on version 8.

@indigoxela it is now (relatively) easy to install any patch-level version of mysql v8.0.x with Lando, but I'm assuming that your local tooling/workflow may be different. In any case, if you do get motivated to try that, then:

  1. Try lando init and choose the backdrop recipe.
  2. Edit the resulting .lando.yml file to look like this:
    name: 4931
    # Use existing Backdrop recipe, then customise below.
    recipe: backdrop
    config:
     # Where Backdrop resides, relative to this file.
     webroot: docroot
     database: mysql:8.0.22
     php: '7.4'
     backdrush: 1.x-1.x
     config:
       database: custom-mysql.cnf
  3. Download the default mysql.cnf file for lando/backdrop from https://github.com/lando/cli/blob/main/plugins/lando-recipes/recipes/backdrop/mysql.cnf and save it as custom-mysql.cnf in the same folder as your .lando.yml file (if you choose a different name or path to save it, then edit your .lando.yml file to reflect that).
  4. Edit custom-mysql.cnf, and comment-out the lines starting with query_cache_limit and query_cache_size.
  5. Run lando destroy -y followed by lando start.

You should now have an instance with the version of mysql specified in step 2 above. To test with another version:

  1. Edit the database: mysql:8.0.22 line in your .lando.yml file.
  2. Run lando destroy -y followed by lando start.
indigoxela commented 2 years ago

@klonos many thanks for the detailed instructions, but I never used Lando, I don't even do local development (on localhost) - I use remote webservers. So still someone else should carry it over the finish line.

do you agree that this issue still needs work

@herbdool I can't actually tell, as I didn't verify the problem with version 8.0.22 (that exact version).

klonos commented 2 years ago

So still someone else should carry it over the finish line.

Fair enough πŸ‘πŸΌ

...I didn't verify the problem with version 8.0.22 (that exact version).

It was initially mentioned by @jlfranklin back in Feb.:

With the patch from the PR, the list is empty.

And I was able to confirm now.

indigoxela commented 2 years ago

I think, we should remove the milestone here. The existing PR unsuccessfully waited for review for months, and it's now obsolete anyway. Should I also close my PR?

herbdool commented 2 years ago

Fine by me.

indigoxela commented 2 years ago

Fine by me.

@herbdool Which of it? :smirk: Removing milestone or closing PR, or something else?

herbdool commented 2 years ago

Removing milestone and closing PR. Someone can always find the PR later if needed.

indigoxela commented 1 year ago

By chance I had a VM with PHP 8.1 and MySQL 8.0.31 available and did a quick test run. Still a problem.

See the verbose test result ``` Tests to be run: - File uploading transliteration (FileUploadTransliterationTest) Test run started: Tuesday, 8 November, 2022 - 4:09pm Test summary ------------ File API: File uploading transliteration (FileUploadTransliterationTest) 54 passes, 6 fails, 2 exceptions, and 17 debug messages. [4.399s] Test run duration: 4 sec Detailed test results --------------------- ---- File API: File uploading transliteration (FileUploadTransliterationTest) ---- Status Group Filename Line Function ------------------------------------------------------------------------------------------------------------------------ Exception PDOExcepti system.admin.inc 2690 SQLSTATE[HY000]: General error: 3995 Character set 'utf8mb4_0900_ai_ci' cannot be used in conjunction with 'binary' in call to regexp_like.: SELECT COUNT(*) AS `expression` FROM (SELECT 1 AS `expression` FROM {file_managed} file_managed WHERE (`uri` NOT REGEXP BINARY :db_condition_placeholder_0) ) subquery; Array ( [:db_condition_placeholder_0] => /[a-z0-9_.-]+$ ) Fail Other file.test 1073 testTransliteration() "File transliteration is not yet enabled." found Exception PDOExcepti system.admin.inc 2690 SQLSTATE[HY000]: General error: 3995 Character set 'utf8mb4_0900_ai_ci' cannot be used in conjunction with 'binary' in call to regexp_like.: SELECT COUNT(*) AS `expression` FROM (SELECT 1 AS `expression` FROM {file_managed} file_managed WHERE (`uri` NOT REGEXP BINARY :db_condition_placeholder_0) ) subquery; Array ( [:db_condition_placeholder_0] => /[a-z0-9_.-]+$ ) Fail Other file.test 1103 testTransliteration() Two files available for transliteration. Fail Other file.test 1104 testTransliteration() Found the Transliterate button Fail Other file.test 1104 testTransliteration() Found the requested form fields at Fail Other file.test 1105 testTransliteration() "2 file names have been successfully transliterated." found Fail Other file.test 1111 testTransliteration() File name transliterated and lower case after bulk updating. ```