Open hahnn opened 1 month ago
To me this looks like utf8mb4 got trimmed down to mb4 by mistake, rather than being removed entirely as it should be.
Postgres does not have a utf8mb4 character set: its UTF8 encoding already uses up to 4 bytes per character. Only MySQL has the oddity of a 3-byte utf8.
Current behavior
MySQL:
sources text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
Expected:
sources text COLLATE utf8 NOT NULL,
or less accurately
sources text NOT NULL,
Actual:
sources textmb4 COLLATE utf8mb4_bin NOT NULL,
Here's the offending line. It comes from the old HAWK code.
It would be good to remove all of these lines and replace them with a generic parser of some sort.
' DEFAULT CHARACTER SET utf8mb4' => '',
' DEFAULT CHARACTER SET utf8' => '',
' COLLATE utf8mb4_unicode_520_ci' => '',
' COLLATE utf8_general_ci' => '',
' CHARACTER SET utf8' => '',
' DEFAULT CHARSET=utf8' => '',
These rules can also be found in the Alter Table rewriter.
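To make the failure concrete, here is a minimal sketch of how the rules above produce textmb4. It is written in Python for illustration only and assumes the rewriter applies the table as ordered plain substring replacements; RULES and apply_rules are hypothetical names, not the project's code.

```python
# Replacement table copied from the issue. Assumption: the rewriter applies
# each entry in order as a plain substring replacement.
RULES = {
    " DEFAULT CHARACTER SET utf8mb4": "",
    " DEFAULT CHARACTER SET utf8": "",
    " COLLATE utf8mb4_unicode_520_ci": "",
    " COLLATE utf8_general_ci": "",
    " CHARACTER SET utf8": "",
    " DEFAULT CHARSET=utf8": "",
}


def apply_rules(sql: str) -> str:
    """Apply every rule in insertion order (hypothetical helper)."""
    for needle, replacement in RULES.items():
        sql = sql.replace(needle, replacement)
    return sql


mysql_column = "sources text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,"
print(apply_rules(mysql_column))
# prints: sources textmb4 COLLATE utf8mb4_bin NOT NULL,
```

The only rule that matches is ' CHARACTER SET utf8', which is a prefix of ' CHARACTER SET utf8mb4', so the trailing mb4 is left glued to text. No rule covers utf8mb4_bin, so the COLLATE clause survives as well.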
First, let's address the CREATE TABLE statements that are perfectly valid in MySQL but don't work in Postgres: neither the utf8mb4 character set nor the utf8mb4_unicode_520_ci collation exists there.
DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci
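can at best be translated to the following CREATE DATABASE options, since Postgres sets the encoding per database rather than per table: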
WITH ENCODING 'UTF8'
LC_COLLATE = 'en_US.utf8'
LC_CTYPE = 'en_US.utf8';
We could replace any instance of DEFAULT CHARACTER SET utf8.*?(?=\s|$|\n) COLLATE utf8.*?(?=\s|$|\n)
with this set of rules. Eventually, though, someone will complain that we collate everything to en_US, so I think we should instead remove the clause entirely and let users set it when creating the database.
So we can just search for DEFAULT CHARACTER SET utf8.*?(?=\s|$|\n),
then search for COLLATE utf8.*?(?=\s|$|\n),
and remove both matches.
Looking at our previous replacements, we also need to handle the =utf8 form, so we should also search for the following variations:
DEFAULT CHARSET=utf8.*?(?=\s|$|\n)
DEFAULT CHARSET = utf8.*?(?=\s|$|\n)
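The searches described above could be sketched roughly as follows. This is a Python illustration, not the project's rewriter, and it makes several assumptions: DEFAULT is treated as optional so that column-level clauses like the one on the sources column are caught too, CHARACTER SET and CHARSET are both accepted with an optional =, and \w* is swapped in for the .*?(?=\s|$|\n) lookahead so that a trailing comma or semicolon is not removed along with the name. All names below are hypothetical.

```python
import re

# Hypothetical patterns. \w* ends the match at the end of the charset or
# collation name, so punctuation such as "," or ";" right after it is kept.
CHARSET_CLAUSE = re.compile(
    r"\s+(?:DEFAULT\s+)?(?:CHARACTER\s+SET|CHARSET)\s*=?\s*utf8\w*",
    re.IGNORECASE,
)
COLLATE_CLAUSE = re.compile(
    r"\s+(?:DEFAULT\s+)?COLLATE\s*=?\s*utf8\w*",
    re.IGNORECASE,
)


def strip_mysql_charset(sql: str) -> str:
    """Remove MySQL utf8 charset and collation clauses from a statement."""
    sql = CHARSET_CLAUSE.sub("", sql)
    return COLLATE_CLAUSE.sub("", sql)


print(strip_mysql_charset(
    "sources text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,"
))
# prints: sources text NOT NULL,
```

With the lookahead form as written in the thread, a definition such as name text COLLATE utf8mb4_bin, would lose its comma, because .*? keeps consuming until it reaches whitespace or the end of the input. The sketch does not try to skip string literals or quoted charset names, which a generic parser would have to handle.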
There is an issue when translating the sources column in PostgreSQL as shown below.
In this case, the sources column should keep its text data type.
The same issue occurs with the from_url column: