PostgreSQL-For-Wordpress / postgresql-for-wordpress

A maintained fork of https://wordpress.org/plugins/postgresql-for-wordpress/
GNU General Public License v2.0
209 stars 68 forks source link

Improve table creation #76

Closed mattbucci closed 6 months ago

mattbucci commented 8 months ago

Closes: https://github.com/PostgreSQL-For-Wordpress/postgresql-for-wordpress/issues/73

hahnn commented 8 months ago

I'm trying to investigate in the code. (I'm far from being a good PHP developer :-D)

I see this:

        // Fix auto_increment by adding a sequence
        $pattern = '/int[ ]+NOT NULL auto_increment/i';
        preg_match($pattern, $sql, $matches);
        if($matches) {
            $seq = $table . '_seq';
            $sql = str_ireplace('NOT NULL auto_increment', "NOT NULL DEFAULT nextval('$seq'::text)", $sql);
            $sql .= "\nCREATE SEQUENCE $seq;";
        }

I'm wondering if this will solve the issue about the wp_e_events SQL table for example, because the SQL code sent to PostgreSQL at this time, as seen in my audit logs, is:

CREATE TABLE wp_e_events (
                                id bigint  auto_increment primary key,
                                event_data text null,
                                created_at timestamp not null
                        )  ;;

As you can see, the line about the id column has an auto_increment directive, but there is no NOT NULL directive inside.

So maybe this line will not match the $pattern PHP line of code because it seems to me this line of code is looking for the NOT NULL directive, which potentially doesn't exist depending of how SQL tables are created.

Finally, what I mean is that in the $pattern PHP line of code, the presence of NOT NULL directive is optionnal. If you can make it optionnal, this should match this kind of cases.

Any way, in PostgreSQL, the simple fact to create a primay key on a column implies the fact the column is NOT NULL automatically. See below on my example, where I create a test table without specifying the NOT NULL directive, but when displaying the description of the table, the NOT NULL is automatically there on the column because of the existence of the primary key:

admcoquelicots=# create table test(id int4 primary key);
CREATE TABLE
admcoquelicots=# \d test
                    Table « public.test »
 Colonne |  Type   | Collationnement | NULL-able | Par défaut
---------+---------+-----------------+-----------+------------
 id      | integer |                 | not null  |
Index :
    "test_pkey" PRIMARY KEY, btree (id)

admcoquelicots=#

What do you think ?

mattbucci commented 8 months ago

@hahnn I switched to bigserial instead. it looks like the sequences still get autocreated.

image

mattbucci commented 8 months ago

Ran into an interesting issue, testing this by hand, not sure if the same issue will occur when deployed

Running the following query

         CREATE TABLE wp_itsec_vulnerabilities (
                id varchar(128) NOT NULL,
                software_type varchar(20) NOT NULL,
                software_slug varchar(255) NOT NULL,
                first_seen timestamp NOT NULL,
                last_seen timestamp NOT NULL,
                resolved_at timestamp default NULL,
                resolved_by bigint NOT NULL default 0,
                resolution varchar(20) NOT NULL default '',
                details text NOT NULL,
                PRIMARY KEY (id)
            );
        CREATE INDEX wp_itsec_vulnerabilities_resolution ON wp_itsec_vulnerabilities (resolution);
        CREATE INDEX wp_itsec_vulnerabilities_software_type ON wp_itsec_vulnerabilities (software_type);
        CREATE INDEX wp_itsec_vulnerabilities_last_seen ON wp_itsec_vulnerabilities (last_seen);

which is valid SQL gives

ERROR:  syntax error at or near "PRIMARY"
LINE 1:     PRIMARY KEY (id)
            ^ 

SQL state: 42601
Character: 5

Running first the create, then the index 1 by 1, works fine. It may be possible that the rewrite SQL commands should take an array of $SQL instead of a string, and then append to this array so that each command can be run 1 by 1 without exploding on ";" or something else fragile

hahnn commented 8 months ago
wp_itsec_vulnerabilities (
                id varchar(128) NOT NULL,
                software_type varchar(20) NOT NULL,
                software_slug varchar(255) NOT NULL,
                first_seen timestamp NOT NULL,
                last_seen timestamp NOT NULL,
                resolved_at timestamp default NULL,
                resolved_by bigint NOT NULL default 0,
                resolution varchar(20) NOT NULL default '',
                details text NOT NULL,
                PRIMARY KEY (id)
            );

About that, I made the test in my lab, and the table has been created without any issue:

lmydb=# create table mydb.wp_itsec_vulnerabilities (
lmydb(#                 id varchar(128) NOT NULL,
lmydb(#                 software_type varchar(20) NOT NULL,
lmydb(#                 software_slug varchar(255) NOT NULL,
lmydb(#                 first_seen timestamp NOT NULL,
lmydb(#                 last_seen timestamp NOT NULL,
lmydb(#                 resolved_at timestamp default NULL,
lmydb(#                 resolved_by bigint NOT NULL default 0,
lmydb(#                 resolution varchar(20) NOT NULL default '',
lmydb(#                 details text NOT NULL,
lmydb(#                 PRIMARY KEY (id)
lmydb(#             );
CREATE TABLE
lmydb=# 

So I cannot explain why you get a syntax error on your side... I'm think about a typo, or bad invisble characters you might have inside the code that generates a syntax error on DB side??? Or what you are thinking about the issue might be right as well...

hahnn commented 8 months ago

I've this issue:

---------------------
[1701075795.308] Error running :

                                        CREATE TABLE wp_statistics_visitor (
                                                ID bigint(20) NOT NULL AUTO_INCREMENT,
                                                last_counter date NOT NULL,
                                                referred text NOT NULL,
                                                agent varchar(180) NOT NULL,
                                                platform varchar(180),
                                                version varchar(180),
                                                device varchar(180),
                                                model varchar(180),
                                                UAString varchar(190),
                                                ip varchar(60) NOT NULL,
                                                location varchar(10),
                                                user_id BIGINT(40) NOT NULL,
                                                hits int(11),
                                                honeypot int(11),
                                                PRIMARY KEY  (ID),
                                                UNIQUE KEY date_ip_agent (last_counter,ip,agent(50),platform(50),version(50)),
                                                KEY agent (agent),
                                                KEY platform (platform),
                                                KEY version (version),
                                                KEY device (device),
                                                KEY model (model),
                                                KEY location (location)
                                        ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci
---- converted to ----
CREATE TABLE wp_statistics_visitor (
                                         "ID" bigserial,
                                                last_counter date NOT NULL,
                                                referred text NOT NULL,
                                                agent varchar(180) NOT NULL,
                                                platform varchar(180),
                                                version varchar(180),
                                                device varchar(180),
                                                model varchar(180),
                                                UAString varchar(190),
                                                ip varchar(60) NOT NULL,
                                                location varchar(10),
                                                user_id BIGINT(40) NOT NULL,
                                                hits int,
                                                honeypot int,
                                                PRIMARY KEY  ( "ID" )
                                        )  ;
CREATE UNIQUE INDEX wp_statistics_visitor_date_ip_agent ON wp_statistics_visitor (last_counter,ip,agent,platform,version);
CREATE INDEX wp_statistics_visitor_agent ON wp_statistics_visitor (agent);
CREATE INDEX wp_statistics_visitor_platform ON wp_statistics_visitor (platform);
CREATE INDEX wp_statistics_visitor_version ON wp_statistics_visitor (version);
CREATE INDEX wp_statistics_visitor_device ON wp_statistics_visitor (device);
CREATE INDEX wp_statistics_visitor_model ON wp_statistics_visitor (model);
CREATE INDEX wp_statistics_visitor_location ON wp_statistics_visitor (location);
----> ERREUR:  erreur de syntaxe sur ou près de « ( »
LINE 13:       user_id BIGINT(40) NOT NULL,

To solve it, I've added this inside private $stringReplacements:

        'bigint(40)'    => 'bigint',

In the same order of idea, I see several SQL tables cannot be created because they have int(10) columns.

So to solve this, I've also added:

        'int(10)'    => 'int',

A suggestion here could be to implement something more robust like for any size of smallint(NB), int(NB) or bigint(NB), then convert to smallint, int or bigint.

hahnn commented 8 months ago

I've to say that with all the changes you made in your source code, lot more SQL tables can be created without issues when installing/enabling plugins.

There are still issues I'm seeing. Here is another one below, showing a KEY line composed of several columns cannot be converted as an index creation (or maybe it's because of the single quotes?). And there will be potentially another issue with this table, because it will try to create a column called date whereas date is a reserved keyword and as such it should be between double quotes :

---------------------
[1701076697.6617] Error running :

                                        CREATE TABLE wp_statistics_pages (
                                            page_id BIGINT(20) NOT NULL AUTO_INCREMENT,
                                                uri varchar(190) NOT NULL,
                                                type varchar(180) NOT NULL,
                                                date date NOT NULL,
                                                count int(11) NOT NULL,
                                                id int(11) NOT NULL,
                                                UNIQUE KEY date_2 (date,uri),
                                                KEY url (uri),
                                                KEY date (date),
                                                KEY id (id),
                                                KEY `uri` (`uri`,`count`,`id`),
                                                PRIMARY KEY (`page_id`)
                                        ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci
---- converted to ----
CREATE TABLE wp_statistics_pages (
                                            page_id bigserial,
                                                uri varchar(190) NOT NULL,
                                                type varchar(180) NOT NULL,
                                                date date NOT NULL,
                                                count int NOT NULL,
                                                id int NOT NULL,
                                                KEY uri (uri,count,id),
                                                PRIMARY KEY (page_id)
                                        )  ;
CREATE UNIQUE INDEX wp_statistics_pages_date_2 ON wp_statistics_pages (date,uri);
CREATE INDEX wp_statistics_pages_url ON wp_statistics_pages (uri);
CREATE INDEX wp_statistics_pages_date ON wp_statistics_pages (date);
CREATE INDEX wp_statistics_pages_id ON wp_statistics_pages (id);
----> ERREUR:  le type « uri » n'existe pas
LINE 8:       KEY uri (uri,count,id),
                  ^

So in fact the converted result you should get should be this one:

CREATE TABLE wp_statistics_pages (
                                            page_id bigserial,
                                                uri varchar(190) NOT NULL,
                                                type varchar(180) NOT NULL,
                                                "date" date NOT NULL,
                                                count int NOT NULL,
                                                id int NOT NULL,
                                                PRIMARY KEY (page_id)
                                        )  ;
CREATE UNIQUE INDEX wp_statistics_pages_date_2 ON wp_statistics_pages (date,uri);
CREATE INDEX wp_statistics_pages_url ON wp_statistics_pages (uri);
CREATE INDEX wp_statistics_pages_date ON wp_statistics_pages (date);
CREATE INDEX wp_statistics_pages_id ON wp_statistics_pages (id);
CREATE INDEX wp_statistics_pages_uri ON wp_statistics_pages (uri,count,id);
hahnn commented 8 months ago

I would also suggest you to keep the CREATE TABLE IF NOT EXISTS instead of converting it to just CREATE TABLE: that will avoid errors on DB side because the table already exist.

Better: it would be probably nice to add the IF NOT EXISTS clause if it's not detected first in the CREATE TABLE command...

In order to do that, here below is what I did in the PHP code (added the reverse replacement with the last str_ireplace line of code):

        $sql = str_ireplace('CREATE TABLE IF NOT EXISTS ', 'CREATE TABLE ', $sql);
        $pattern = '/CREATE TABLE [`]?(\w+)[`]?/';
        preg_match($pattern, $sql, $matches);
        $table = $matches[1];
        $sql = str_ireplace('CREATE TABLE ', 'CREATE TABLE IF NOT EXISTS ', $sql);
hahnn commented 8 months ago

Another issue as you can see below:

---------------------
[1701079267.3417] Error running :

                                        CREATE TABLE wp_statistics_useronline (
                                                ID bigint(20) NOT NULL AUTO_INCREMENT,
                                                ip varchar(60) NOT NULL,
                                                created int(11),
                                                timestamp int(10) NOT NULL,
                                                date datetime NOT NULL,
                                                referred text CHARACTER SET utf8 NOT NULL,
                                                agent varchar(255) NOT NULL,
                                                platform varchar(255),
                                                version varchar(255),
                                                location varchar(10),
                                                `user_id` BIGINT(48) NOT NULL,
                                                `page_id` BIGINT(48) NOT NULL,
                                                `type` VARCHAR(100) NOT NULL,
                                                PRIMARY KEY  (ID)
                                        ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci
---- converted to ----
CREATE TABLE wp_statistics_useronline (
                                         "ID" bigserial,
                                                ip varchar(60) NOT NULL,
                                                created int,
                                                timestamp int NOT NULL,
                                                date timestamp NOT NULL,
                                                referred text CHARACTER SET utf8 NOT NULL,
                                                agent varchar(255) NOT NULL,
                                                platform varchar(255),
                                                version varchar(255),
                                                location varchar(10),
                                                user_id BIGINT(48) NOT NULL,
                                                page_id BIGINT(48) NOT NULL,
                                                type VARCHAR(100) NOT NULL,
                                                PRIMARY KEY  ( "ID" )
                                        )  ;
----> ERREUR:  erreur de syntaxe sur ou près de « CHARACTER »
LINE 7:       referred text CHARACTER SET utf8 NOT NULL,

In the converted result, the column referred text CHARACTER SET utf8 NOT NULL is the issue: in PostgreSQL, CHARACTER SET utf8 must be removed, and the result for this column should be referred text NOT NULL. So I solved it by modifying PHP code from this:

        'datetime'              => 'timestamp',
        'DEFAULT CHARACTER SET utf8mb4' => '',
        'DEFAULT CHARACTER SET utf8'    => '',
        'UNSIGNED'      => '',

to this:

        'datetime'              => 'timestamp',
        'DEFAULT CHARACTER SET utf8mb4' => '',
        'DEFAULT CHARACTER SET utf8'    => '',
        'CHARACTER SET utf8'    => '',
        'UNSIGNED'      => '',

BTW, I see there are BIGINT(48) that have not been converted, so as usual, I've added this below to private $stringReplacements:

        'bigint(48)'    => 'bigint',

And again, this table has column names using reserved keywords: like timestamp or date. Then it shoud be between double quotes. At the end, the converted result should be like the one below:

CREATE TABLE wp_statistics_useronline (
                                         "ID" bigserial,
                                                ip varchar(60) NOT NULL,
                                                created int,
                                                "timestamp" int NOT NULL,
                                                "date" timestamp NOT NULL,
                                                referred text NOT NULL,
                                                agent varchar(255) NOT NULL,
                                                platform varchar(255),
                                                version varchar(255),
                                                location varchar(10),
                                                user_id bigint NOT NULL,
                                                page_id bigint NOT NULL,
                                                type VARCHAR(100) NOT NULL,
                                                PRIMARY KEY  ( "ID" )
                                        )  ;
hahnn commented 8 months ago

Another issue about a type converted from tinyint(1) to a type called tinysmallint which is not an existing type in PostgreSQL:

---------------------
[1701081979.8589] Error running :
CREATE TABLE wp_wpforms_payments (
                        id bigint(20) NOT NULL AUTO_INCREMENT,
                        form_id bigint(20) NOT NULL,
                        status varchar(10) NOT NULL DEFAULT '',
                        subtotal_amount decimal(26,8) NOT NULL DEFAULT 0,
                        discount_amount decimal(26,8) NOT NULL DEFAULT 0,
                        total_amount decimal(26,8) NOT NULL DEFAULT 0,
                        currency varchar(3) NOT NULL DEFAULT '',
                        entry_id bigint(20) NOT NULL DEFAULT 0,
                        gateway varchar(20) NOT NULL DEFAULT '',
                        type varchar(12) NOT NULL DEFAULT '',
                        mode varchar(4) NOT NULL DEFAULT '',
                        transaction_id varchar(40) NOT NULL DEFAULT '',
                        customer_id varchar(40) NOT NULL DEFAULT '',
                        subscription_id varchar(40) NOT NULL DEFAULT '',
                        subscription_status varchar(10) NOT NULL DEFAULT '',
                        title varchar(255) NOT NULL DEFAULT '',
                        date_created_gmt datetime NOT NULL,
                        date_updated_gmt datetime NOT NULL,
                        is_published tinyint(1) NOT NULL DEFAULT 1,
                        PRIMARY KEY  (id),
                        KEY form_id (form_id),
                        KEY status (status(8)),
                        KEY total_amount (total_amount),
                        KEY type (type(8)),
                        KEY transaction_id (transaction_id(32)),
                        KEY customer_id (customer_id(32)),
                        KEY subscription_id (subscription_id(32)),
                        KEY subscription_status (subscription_status(8)),
                        KEY title (title(64))
                ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci
---- converted to ----
CREATE TABLE wp_wpforms_payments (
                        id bigserial,
                        form_id bigint NOT NULL,
                        status varchar(10) NOT NULL DEFAULT '',
                        subtotal_amount decimal(26,8) NOT NULL DEFAULT 0,
                        discount_amount decimal(26,8) NOT NULL DEFAULT 0,
                        total_amount decimal(26,8) NOT NULL DEFAULT 0,
                        currency varchar(3) NOT NULL DEFAULT '',
                        entry_id bigint NOT NULL DEFAULT 0,
                        gateway varchar(20) NOT NULL DEFAULT '',
                        type varchar(12) NOT NULL DEFAULT '',
                        mode varchar(4) NOT NULL DEFAULT '',
                        transaction_id varchar(40) NOT NULL DEFAULT '',
                        customer_id varchar(40) NOT NULL DEFAULT '',
                        subscription_id varchar(40) NOT NULL DEFAULT '',
                        subscription_status varchar(10) NOT NULL DEFAULT '',
                        title varchar(255) NOT NULL DEFAULT '',
                        date_created_gmt timestamp NOT NULL,
                        date_updated_gmt timestamp NOT NULL,
                        is_published tinysmallint NOT NULL DEFAULT 1,
                        PRIMARY KEY  (id)
                )  ;
CREATE INDEX wp_wpforms_payments_form_id ON wp_wpforms_payments (form_id);
CREATE INDEX wp_wpforms_payments_status ON wp_wpforms_payments (status);
CREATE INDEX wp_wpforms_payments_total_amount ON wp_wpforms_payments (total_amount);
CREATE INDEX wp_wpforms_payments_type ON wp_wpforms_payments (type);
CREATE INDEX wp_wpforms_payments_transaction_id ON wp_wpforms_payments (transaction_id);
CREATE INDEX wp_wpforms_payments_customer_id ON wp_wpforms_payments (customer_id);
CREATE INDEX wp_wpforms_payments_subscription_id ON wp_wpforms_payments (subscription_id);
CREATE INDEX wp_wpforms_payments_subscription_status ON wp_wpforms_payments (subscription_status);
CREATE INDEX wp_wpforms_payments_title ON wp_wpforms_payments (title);
----> ERREUR:  le type « tinysmallint » n'existe pas
LINE 20:    is_published tinysmallint NOT NULL DEFAULT 1,

I think this issue is existing because of the content of private $stringReplacements block of code: In fact, the PHP line below:

        'int(1)'                => 'smallint',

is before the line below:

        'tinyint(1)'    => 'smallint',

So when conversion is made, tinyint(1) matches in fact the first line which is about int(1), and that's why the converted result is wrong and become tinysmallint. An easy fix would probably be to reverse the order of PHP lines in private $stringReplacements block of code, but something more robust would be probably needed...

hahnn commented 8 months ago

Another one:

---------------------
[1701090258.5932] Error running :
CREATE TABLE `wp_yoast_migrations` (
`id` int(11) UNSIGNED auto_increment NOT NULL,
`version` varchar(191),
 PRIMARY KEY (`id`))  DEFAULT CHARSET=utf8;
---- converted to ----
CREATE TABLE wp_yoast_migrations (
id serial NOT NULL,
version varchar(191),
 PRIMARY KEY (id))  DEFAULT CHARSET=utf8;;
----> ERREUR:  erreur de syntaxe sur ou près de « DEFAULT »
LINE 4:  PRIMARY KEY (id))  DEFAULT CHARSET=utf8;;

DEFAULT CHARSET=utf8 should not be there in the converted result.

mattbucci commented 6 months ago

I should have some time to look at this later this week and will incorporate tests for the cases you've brought up, thanks.

I think I'll just add a space before int(1) so that it only matches " int(1)"

mattbucci commented 6 months ago

@hahnn I believe I've fixed all the reported issues in this PR and added tests for each of them

mattbucci commented 6 months ago

Fixed a couple more bugs, but this still isn't ready to merge. It seems like this PR changes the way the sequences get generated in a way that the code doesn't expect.

Here's the list of currently broken sequences when installing

Expected: wp_options_seq
Created: wp_options_option_id_seq

Expected: wp_users_seq
Created: wp_users_ID_seq

Expected: wp_sitemeta_seq
Created: wp_sitemeta_meta_id_seq

Expected: wp_terms_seq
Created: wp_terms_term_id_seq

Expected: wp_term_taxonomy_seq
Created: wp_term_taxonomy_term_taxonomy_id_seq

Expected: wp_posts_seq
Created: wp_posts_ID_seq
mattbucci commented 6 months ago

It's not possible to change the name of sequences created by use of bigserial and serial, I don't want to move away from using those types so that requires changing the emulation of mysqli_insert_id found here: https://github.com/PostgreSQL-For-Wordpress/postgresql-for-wordpress/blob/94259e408b5989b3bd664aaeb2ea44459d319e06/pg4wp/driver_pgsql.php#L1060

and the drop table rewriter here: https://github.com/PostgreSQL-For-Wordpress/postgresql-for-wordpress/blob/94259e408b5989b3bd664aaeb2ea44459d319e06/pg4wp/rewriters/DropTableSQLRewriter.php#L12

My current thought is to select the list of sequences from postgres and then get the one that matches our table

mattbucci commented 6 months ago

Interesting idea,

We could append RETURNING to insert statements. As in INSERT INTO my_table (column1, column2) VALUES ('value1', 'value2') RETURNING id;

We could then cache that ID instead of having to query it later. We can use

SELECT a.attname AS column_name
FROM   pg_index i
JOIN   pg_attribute a ON a.attnum = ANY(i.indkey)
WHERE  i.indrelid = 'my_table'::regclass
AND    i.indisprimary;

to select the primary key before doing insertion and this can be cached