doctrine / DoctrinePHPCRBundle

This bundle integrates Doctrine PHPCR ODM and PHPCR backends into Symfony
http://phpcr.github.com
MIT License
156 stars 66 forks source link

doctrine:phpcr init fails to install specified "charset: utf8mb4, collate: utf8mb4_unicode_ci" #310

Closed ghost closed 6 years ago

ghost commented 6 years ago

Currently, in a new Symfony4 project, with

composer show | egrep "doctrine|phpcr"
doctrine/annotations                     v1.6.0             Docblock Annotations Parser
doctrine/cache                           v1.7.1             Caching library offering an object-oriented API for many cache backends
doctrine/collections                     v1.5.0             Collections Abstraction library
doctrine/common                          v2.8.1             Common Library for Doctrine projects
doctrine/data-fixtures                   v1.3.0             Data Fixtures for all Doctrine Object Managers
doctrine/dbal                            v2.6.3             Database Abstraction Layer
doctrine/doctrine-bundle                 1.8.1              Symfony DoctrineBundle
doctrine/doctrine-cache-bundle           1.3.2              Symfony Bundle for Doctrine Cache
doctrine/doctrine-fixtures-bundle        3.0.2              Symfony DoctrineFixturesBundle
doctrine/doctrine-migrations-bundle      v1.3.1             Symfony DoctrineMigrationsBundle
doctrine/inflector                       v1.3.0             Common String Manipulations with regard to casing and singular/plural rules.
doctrine/instantiator                    1.1.0              A small, lightweight utility to instantiate objects in PHP without invoking their constructors
doctrine/lexer                           v1.0.1             Base library for a lexer that can be used in Top-Down, Recursive Descent Parsers.
doctrine/migrations                      v1.6.2             Database Schema migrations using Doctrine DBAL
doctrine/orm                             v2.6.0             Object-Relational-Mapper for PHP
doctrine/phpcr-bundle                    dev-master 5362438 Symfony DoctrinePHPCRBundle
doctrine/phpcr-odm                       dev-master f915b37 Object-Document-Mapper for PHPCR
jackalope/jackalope-doctrine-dbal        1.3.1              Jackalope Transport library for Doctrine DBAL
phpcr/phpcr                              2.1.4              PHP Content Repository interfaces
phpcr/phpcr-utils                        1.3.2              PHP Content Repository implementation independant utilities
sonata-project/doctrine-extensions       1.0.2              Doctrine2 behavioral extensions
sonata-project/doctrine-orm-admin-bundle 3.4.1              Symfony Sonata / Integrate Doctrine ORM into the SonataAdminBundle
symfony/doctrine-bridge                  v4.0.3             Symfony Doctrine Bridge

where,

mysqlshow test_sf4_phpcr
    Database: test_sf4_phpcr
        +--------+
        | Tables |
        +--------+
        +--------+

bin/console doctrine:phpcr:init:dbal --force

mysqlshow test_sf4_phpcr
    Database: test_sf4_phpcr
    +----------------------------+
    |           Tables           |
    +----------------------------+
    | phpcr_binarydata           |
    | phpcr_internal_index_types |
    | phpcr_namespaces           |
    | phpcr_nodes                |
    | phpcr_nodes_references     |
    | phpcr_nodes_weakreferences |
    | phpcr_type_childs          |
    | phpcr_type_nodes           |
    | phpcr_type_props           |
    | phpcr_workspaces           |
    +----------------------------+

noting

egrep -rlni "doctrine:phpcr:init:dbal|doctrine:phpcr:repository:init" .
    ./doctrine/phpcr-bundle/src/Command/RepositoryInitCommand.php
    ./doctrine/phpcr-bundle/src/OptionalCommand/Jackalope/InitDoctrineDbalCommand.php

immediately subsequent exec of

bin/console doctrine:phpcr:repository:init

FAILs, returning

    Successfully registered system node types.
    Executing initializer: CmfRoutingBundle

    In ObjectManager.php line 859:
      Error inside the transport layer: An exception occurred while executing 'SELECT id FROM phpcr_nodes WHERE path COLLATE utf8mb4_bin = ? AND workspace_name = ?' with params ["\/", "default"]:
      SQLSTATE[42000]: Syntax error or access violation: 1253 COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8'

    In AbstractMySQLDriver.php line 121:
      An exception occurred while executing 'SELECT id FROM phpcr_nodes WHERE path COLLATE utf8mb4_bin = ? AND workspace_name = ?' with params ["\/", "default"]:
      SQLSTATE[42000]: Syntax error or access violation: 1253 COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8'

    In PDOStatement.php line 107:
      SQLSTATE[42000]: Syntax error or access violation: 1253 COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8'

    In PDOStatement.php line 105:
      SQLSTATE[42000]: Syntax error or access violation: 1253 COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8'

where

find . | egrep "ObjectManager.php|AbstractMySQLDriver.php|PDOStatement.php"
    ./doctrine/common/lib/Doctrine/Common/Persistence/ObjectManager.php
    ./doctrine/dbal/lib/Doctrine/DBAL/Driver/PDOStatement.php
    ./doctrine/dbal/lib/Doctrine/DBAL/Driver/AbstractMySQLDriver.php
    ./jackalope/jackalope/src/Jackalope/ObjectManager.php

also noting

vendor/doctrine/dbal/UPGRADE.md

    ## Creating MySQL Tables now defaults to UTF-8

      If you are creating a new MySQL Table through the Doctrine API, charset/collate are now set to 'utf8'/'utf8_unicode_ci' by default. Previously the MySQL server defaults were used.

the current config includes,

config/packages/doctrine.yaml 
    ...
    doctrine:
        dbal:
            default_connection: dev
            connections:
                dev:
                    ...
                    driver: 'pdo_mysql'
                    server_version: '5.7'
                    charset: utf8mb4
                    default_table_options:
                        charset: utf8mb4
                        collate: utf8mb4_unicode_ci
                phpcr_dev:
                    ...
                    driver: 'pdo_mysql'
                    server_version: '5.7'
                    charset: utf8mb4
                    default_table_options:
                        charset: utf8mb4
                        collate: utf8mb4_unicode_ci                                                                                                                                              
    ...

    doctrine_phpcr:
        session:
            backend:
                type: doctrinedbal
                connection: phpcr_dev
    ...

per

vendor/doctrine/dbal/UPGRADE.md

Creating MySQL Tables now defaults to UTF-8

If you are creating a new MySQL Table through the Doctrine API, charset/collate are now set to 'utf8'/'utf8_unicode_ci' by default. Previously the MySQL server defaults were used.

using utf8mb4 has long been advised for general use in MySQL/MariaDB, and is currently 'recommended' by Symfony

http://symfony.com/doc/3.4/doctrine.html

"We recommend against MySQL's utf8 character set, since it does not support 4-byte unicode characters, and strings containing them will be truncated. This is fixed by the newer utf8mb4 character set."

per

https://www.sttmedia.com/unicode-basiclingualplane

utf8mb4 really is necessary for 'full' 4-byte unicode support of characters above BPM Plane0 -- 'utf8' is limited to 3.

This is an informative writeup:

https://mathiasbynens.be/notes/mysql-utf8mb4

Currently, the utf8mb4 problem looks like it exists only for the phpcr side of things ...

for the doctrine_phpcr clean installed tables,

    +----------------------------+
    | Tables_in_test_sf4_phpcr   |
    +----------------------------+
    | phpcr_binarydata           |
    | phpcr_internal_index_types |
    | phpcr_namespaces           |
    | phpcr_nodes                |
    | phpcr_nodes_references     |
    | phpcr_nodes_weakreferences |
    | phpcr_type_childs          |
    | phpcr_type_nodes           |
    | phpcr_type_props           |
    | phpcr_workspaces           |
    +----------------------------+
    10 rows in set (0.00 sec)

MariaDB [test_sf4_phpcr]> SHOW FULL COLUMNS FROM phpcr_binarydata;
    +----------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
    | Field          | Type         | Collation       | Null | Key | Default | Extra          | Privileges                      | Comment |
    +----------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
    | id             | int(11)      | NULL            | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
    | node_id        | int(11)      | NULL            | NO   | MUL | NULL    |                | select,insert,update,references |         |
    | property_name  | varchar(255) | utf8_unicode_ci | NO   |     | NULL    |                | select,insert,update,references |         |
    | workspace_name | varchar(255) | utf8_unicode_ci | NO   |     | NULL    |                | select,insert,update,references |         |
    | idx            | int(11)      | NULL            | NO   |     | 0       |                | select,insert,update,references |         |
    | data           | longblob     | NULL            | NO   |     | NULL    |                | select,insert,update,references |         |
    +----------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
    6 rows in set (0.00 sec)

for the 'main' ORM tables

    +-------------------------------+
    | Tables_in_test_sf4            |
    +-------------------------------+
    | acl_classes                   |
    | acl_entries                   |
    | acl_object_identities         |
    | acl_object_identity_ancestors |
    | acl_security_identities       |
    | fos_group                     |
    | fos_user                      |
    | fos_user_group                |
    | fos_user_user                 |
    | fos_user_user_group           |
    +-------------------------------+
    10 rows in set (0.00 sec)

MariaDB [test_sf4]> SHOW FULL COLUMNS FROM acl_classes;
    +------------+------------------+--------------------+------+-----+---------+----------------+---------------------------------+---------+
    | Field      | Type             | Collation          | Null | Key | Default | Extra          | Privileges                      | Comment |
    +------------+------------------+--------------------+------+-----+---------+----------------+---------------------------------+---------+
    | id         | int(10) unsigned | NULL               | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
    | class_type | varchar(200)     | utf8mb4_unicode_ci | NO   | UNI | NULL    |                | select,insert,update,references |         |
    +------------+------------------+--------------------+------+-----+---------+----------------+---------------------------------+---------+
    2 rows in set (0.00 sec)

    MariaDB [test_sf4]>

also, somewhat confusing

MariaDB [test_sf4]> show variables like "character_set_database";
    +------------------------+---------+
    | Variable_name          | Value   |
    +------------------------+---------+
    | character_set_database | utf8mb4 |
    +------------------------+---------+

MariaDB [test_sf4]> use test_sf4_phpcr;
MariaDB [test_sf4_phpcr]> show variables like "character_set_database";
    +------------------------+---------+
    | Variable_name          | Value   |
    +------------------------+---------+
    | character_set_database | utf8mb4 |
    +------------------------+---------+
ghost commented 6 years ago

changing config to 'utf8' to verify,

config/packages/doctrine.yaml 
    doctrine:
        dbal:
            default_connection: dev
            connections:
                dev:
                    ...
                    charset: utf8
                    default_table_options:
                        charset: utf8
                        collate: utf8_unicode_ci
                phpcr_dev:
                    ...
                    charset: utf8
                    default_table_options:
                        charset: utf8
                        collate: utf8_unicode_ci
            ...

starting with clean DBs

mysqladmin --force drop test_sf4
mysqladmin --force drop test_sf4_phpcr

mysql -e "
create database test_sf4
 character set UTF8
 collate utf8_unicode_ci;
create database test_sf4_phpcr
 character set UTF8
 collate utf8_unicode_ci;
"

mysqlshow test_sf4
Database: test_sf4
    +--------+
    | Tables |
    +--------+
    +--------+
mysqlshow test_sf4_phpcr
    Database: test_sf4_phpcr
    +--------+
    | Tables |
    +--------+
    +--------+

1st dbal init

bin/console doctrine:phpcr:init:dbal --force
mysqlshow test_sf4_phpcr
    Database: test_sf4_phpcr
    +----------------------------+
    |           Tables           |
    +----------------------------+
    | phpcr_binarydata           |
    | phpcr_internal_index_types |
    | phpcr_namespaces           |
    | phpcr_nodes                |
    | phpcr_nodes_references     |
    | phpcr_nodes_weakreferences |
    | phpcr_type_childs          |
    | phpcr_type_nodes           |
    | phpcr_type_props           |
    | phpcr_workspaces           |
    +----------------------------+

then main

bin/console doctrine:schema:update --force
mysqlshow test_sf4
    Database: test_sf4
    +-------------------------------+
    |            Tables             |
    +-------------------------------+
    | acl_classes                   |
    | acl_entries                   |
    | acl_object_identities         |
    | acl_object_identity_ancestors |
    | acl_security_identities       |
    | fos_group                     |
    | fos_user                      |
    | fos_user_group                |
    | fos_user_user                 |
    | fos_user_user_group           |
    +-------------------------------+

coincidentally, this now works

bin/console doctrine:phpcr:repository:init
    Successfully registered system node types.
    Executing initializer: CmfRoutingBundle

checking columns,

use test_sf4;
SHOW FULL COLUMNS FROM acl_classes;
    +------------+------------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
    | Field      | Type             | Collation       | Null | Key | Default | Extra          | Privileges                      | Comment |
    +------------+------------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
    | id         | int(10) unsigned | NULL            | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
    | class_type | varchar(200)     | utf8_unicode_ci | NO   | UNI | NULL    |                | select,insert,update,references |         |
    +------------+------------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+

use test_sf4_phpcr;
SHOW FULL COLUMNS FROM phpcr_binarydata;
    +----------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
    | Field          | Type         | Collation       | Null | Key | Default | Extra          | Privileges                      | Comment |
    +----------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
    | id             | int(11)      | NULL            | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
    | node_id        | int(11)      | NULL            | NO   | MUL | NULL    |                | select,insert,update,references |         |
    | property_name  | varchar(255) | utf8_unicode_ci | NO   |     | NULL    |                | select,insert,update,references |         |
    | workspace_name | varchar(255) | utf8_unicode_ci | NO   |     | NULL    |                | select,insert,update,references |         |
    | idx            | int(11)      | NULL            | NO   |     | 0       |                | select,insert,update,references |         |
    | data           | longblob     | NULL            | NO   |     | NULL    |                | select,insert,update,references |         |
    +----------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+

and,

use test_sf4;
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
    +--------------------------+--------------------+
    | Variable_name            | Value              |
    +--------------------------+--------------------+
    | character_set_client     | utf8               |
    | character_set_connection | utf8               |
    | character_set_database   | utf8               |
    | character_set_filesystem | binary             |
    | character_set_results    | utf8               |
    | character_set_server     | utf8mb4            |
    | character_set_system     | utf8               |
    | collation_connection     | utf8_general_ci    |
    | collation_database       | utf8_unicode_ci    |
    | collation_server         | utf8mb4_unicode_ci |
    +--------------------------+--------------------+

use test_sf4_phpcr;
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
    +--------------------------+--------------------+
    | Variable_name            | Value              |
    +--------------------------+--------------------+
    | character_set_client     | utf8               |
    | character_set_connection | utf8               |
    | character_set_database   | utf8               |
    | character_set_filesystem | binary             |
    | character_set_results    | utf8               |
    | character_set_server     | utf8mb4            |
    | character_set_system     | utf8               |
    | collation_connection     | utf8_general_ci    |
    | collation_database       | utf8_unicode_ci    |
    | collation_server         | utf8mb4_unicode_ci |
    +--------------------------+--------------------+
alexander-schranz commented 6 years ago

The error SQLSTATE[42000]: Syntax error or access violation: 1253 COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8' should not appear if you did configure the default_table_options correctly for utf8mb4 so make sure you cleared the cache after you changed the config. But maybe you run into the too long index problem which I'm working here: https://github.com/jackalope/jackalope-doctrine-dbal/pull/353 so maybe you could test this changes @pgnd

ghost commented 6 years ago

@alexander-schranz

so make sure you cleared the cache after you changed the config

I 'always' clear cache; I'll recheck

so maybe you could test this changes

There appear to be a bunch of new releases to catch up on; I'll grab all updates and check #353 too

dbu commented 6 years ago

ftr: this is a mysql-only problem. postgres seems to have gotten their utf8 implementation right in the first attempt and thus need no extra charset for 4 byte characters. i found no confirmation if sqlite works as expected.

ghost commented 6 years ago

@dbu

It's been a recommendation in MySQL/MariaDB-land for a fairly long while.

It's about to become more of an immediate issue:

Note: in the "soon to be released" MySQL8 (https://dev.mysql.com/doc/relnotes/mysql/8.0/en/),

https://mysqlserverteam.com/mysql-8-0-when-to-use-utf8mb3-over-utf8mb4/

"As we no longer see a strong use-case for utf8mb3, we intend to mark it as deprecated in MySQL 8.0."

http://mysqlserverteam.com/sushi-beer-an-introduction-of-utf8-support-in-mysql-8-0/

"To summarize, our plan is to drastically improve support for utf8 by changing the default character set to utf8mb4, and add a large set of collations to charter the international user base of MySQL."

I've not yet found a written statement re: MariaDB's plans. They currently support both

https://mariadb.com/kb/en/library/supported-character-sets-and-collations/

I've asked in IRC; will see what I learn there. But I'd assume they'll follow similar suit ...

dbu commented 6 years ago

there is now a new release of jackalope-doctrine-dbal (1.3.2) that avoids creating too long fields with utf8mb4. phpcr-odm and this bundle are waiting for things to be wrapped up so i can release 2.0 with symfony 4 support.