Mailtrain-org / mailtrain

Self hosted newsletter app
GNU General Public License v3.0
5.52k stars 692 forks source link

(V2) SQL error when typing accentuated characters in some search field #928

Closed Britaliope closed 3 years ago

Britaliope commented 4 years ago

version: v2

Description

In some of the "Search" fields of the app, unicode characters (i've tried €, é, è, ç, à, §, µ ) and if one of them is typed in the search field the following error apprear:

select count(*) as `recordsFiltered` from (select `lists`.`id` from `lists` inner join `namespaces` on `namespaces`.`id` = `lists`.`namespace` inner join (select distinct `entity` from `permissions_list` where `user` = 4 and `operation` in ('view')) as `permitted__list` on `permitted__list`.`entity` = `lists`.`id` where (`lists`.`name` like '%€%' or `lists`.`cid` like '%€%' or `lists`.`subscribers` like '%€%' or `lists`.`description` like '%€%' or `namespaces`.`name` like '%€%')) as `records` limit 1 - ER_CANT_AGGREGATE_2COLLATIONS: Illegal mix of collations (ascii_general_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation 'like'

For some reason, the bug is not present when searching on Reports, Send configurations, Channels, or templates but it's there when searching on lists or campaigns

Mailtrain version

develop branch, last commit: commit 8dab13d903e8f19e8d69887977993e84c7768b0b (Sat Jul 25 10:24:34 2020)

Steps to reproduce

joker-x commented 4 years ago

Some fields of tables in database are in 'ascii_general_ci' collate, for example cid, reports table don't have cid field and works, lists have cid field and not work...

To see the fields with ascii collation, you can use this sql command:

SELECT TABLE_NAME,COLUMN_NAME,DATA_TYPE,COLLATION_NAME FROM information_schema.columns WHERE COLLATION_NAME = 'ascii_general_ci';
+-------------------+-------------+-----------+------------------+
| TABLE_NAME        | COLUMN_NAME | DATA_TYPE | COLLATION_NAME   |
+-------------------+-------------+-----------+------------------+
| confirmations     | cid         | varchar   | ascii_general_ci |
| lists             | cid         | varchar   | ascii_general_ci |
| settings          | key         | varchar   | ascii_general_ci |
| users             | reset_token | varchar   | ascii_general_ci |
| links             | cid         | varchar   | ascii_general_ci |
| links             | url         | varchar   | ascii_general_ci |
| campaign_messages | response_id | varchar   | ascii_general_ci |
| campaign_messages | hash_email  | char      | ascii_general_ci |
| campaign_links    | ip          | varchar   | ascii_general_ci |
| campaign_links    | country     | varchar   | ascii_general_ci |
| custom_fields     | key         | varchar   | ascii_general_ci |
| custom_fields     | type        | varchar   | ascii_general_ci |
| custom_fields     | column      | varchar   | ascii_general_ci |
| campaigns         | cid         | varchar   | ascii_general_ci |
| subscription__1   | cid         | varchar   | ascii_general_ci |
| subscription__1   | hash_email  | varchar   | ascii_general_ci |
| subscription__1   | tz          | varchar   | ascii_general_ci |
| subscription__2   | cid         | varchar   | ascii_general_ci |
| subscription__2   | hash_email  | varchar   | ascii_general_ci |
| subscription__2   | tz          | varchar   | ascii_general_ci |
| subscription__3   | cid         | varchar   | ascii_general_ci |
| subscription__3   | hash_email  | varchar   | ascii_general_ci |
| subscription__3   | tz          | varchar   | ascii_general_ci |
+-------------------+-------------+-----------+------------------+

You can change the collation by table with this command, for lists:

ALTER TABLE lists CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

To fix all tables i wrote a bash script:

#!/bin/bash

#
# Use: change_collate <actual_collate> <new_collate>
# Example: change_collate ascii_general_ci utf8mb4_general_ci
#

[ ! -f ~/.my.cnf ]  && echo "Credentials of database is needed in ~/.my.cnf" && exit 1
[ $# -ne 2 ] && echo "Two parameters are needed" && exit 2

TABLAS=$(mysql -Nse "SELECT TABLE_NAME FROM information_schema.columns WHERE COLLATION_NAME = \"$1\";")
TABLAS=$(echo $TABLAS | tr ' ' '\n' | sort | uniq)

for TABLA in $TABLAS
do
  echo "Converting from collation $1 to $2 - $TABLA"
  mysql -e "ALTER TABLE $TABLA CONVERT TO CHARACTER SET utf8mb4 COLLATE $2;"
done
joker-x commented 4 years ago

@bures any reason to use ascii character set in above fields? Could we make a database migration to solve this issue converting to utf8mb4?

bures commented 4 years ago

Just check if it still fits in the index. I adopted quite a lot of the schema from v1, so I don't really know the reason for ASCII charset. But I noticed in some occasions that sometimes there were issues with the length of the keys in the index. So maybe that was the reason. Keep in mind that going from ASCII to utf8mb4 is the key length * 4.

joker-x commented 4 years ago

Make sense, but issues with the length of the keys in utf8m4 can be solve in configuration of mysql/mariadb server and in recent versions i think works by default.

Britaliope commented 4 years ago

I've ran the script suggested by @joker-x and have no index issues