PostgreSQL-For-Wordpress / postgresql-for-wordpress

A maintained fork of https://wordpress.org/plugins/postgresql-for-wordpress/
GNU General Public License v2.0
209 stars 68 forks source link

Testing latest version of source code and numeric types are not converted any more #109

Open hahnn opened 4 months ago

hahnn commented 4 months ago

Hello,

I had a little bit of time on this sunday and decided to start testing all the changes implemented in the last version of the source code (for v3.4).

It seems to me there are some kind of regressions if i don't do any mistake, and that's related to the transformation of all numeric types.

I take this example:

---------------------
[1710079211.6086] Error running :
CREATE TABLE IF NOT EXISTS wp_wpforms_logs (
                        id BIGINT(20) NOT NULL AUTO_INCREMENT,
                        title VARCHAR(255) NOT NULL,
                        message LONGTEXT NOT NULL,
                        types VARCHAR(255) NOT NULL,
                        create_at DATETIME NOT NULL,
                        form_id BIGINT(20),
                        entry_id BIGINT(20),
                        user_id BIGINT(20),
                        PRIMARY KEY (id)
                ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci
---- converted to ----
CREATE TABLE IF NOT EXISTS wp_wpforms_logs (
                        id bigserial,
                        title VARCHAR(255) NOT NULL,
                        message text NOT NULL,
                        types VARCHAR(255) NOT NULL,
                        create_at timestamp NOT NULL,
                        form_id BIGINT(20),
                        entry_id BIGINT(20),
                        user_id BIGINT(20),
                        PRIMARY KEY (id)
                );
----> ERREUR:  erreur de syntaxe sur ou près de « ( »
LINE 7:    form_id BIGINT(20),

The BIGINT(20) has not been replaced by simply BIGINT in the create table above.

As far as I can see, in CreateTableSQLRewriter.php source code file, in public function rewrite(): string, there is a call to:

$sql = $this->rewrite_numeric_type($sql);

And my understanding is that this function is supposed to convert all numeric types as PostgreSQL known types.

However, it seems to me this function will analyze the CREATE TABLE statements to replace ONLY the numeric types with an AUTO-INCREMENT keyword. So by the way, all standard numeric types are not handled.

I think that's why in the resulting conversion, the line:

id BIGINT(20) NOT NULL AUTO_INCREMENT,

has been correctly converted to (however the NOT NULL clause disappeared whereas it should remain):

id bigserial,

But the lines:

                        form_id BIGINT(20),
                        entry_id BIGINT(20),
                        user_id BIGINT(20),

have not been converted at all and remain exactly the same, whereas they should have been converted to:

                        form_id bigint,
                        entry_id bigint,
                        user_id bigint,

So, the rewrite_numeric_type($sql) function should be able to take care of additional numeric type conversions, and not only the auto_increment ones.

Maybe adding the code below just before the return $sql; line at the end of the function would convert all what is not converted:

        // bigint
        $pattern = '/bigint(\(\d+\))?([ ]*NOT NULL)?[ ]*/i';
        preg_match($pattern, $sql, $matches);
        if($matches) {
            $sql = preg_replace($pattern, 'bigint', $sql);
        }

        // int
        $pattern = '/int(\(\d+\))?([ ]*NOT NULL)?[ ]*/i';
        preg_match($pattern, $sql, $matches);
        if($matches) {
            $sql = preg_replace($pattern, 'integer', $sql);
        }

        // smallint
        $pattern = '/smallint(\(\d+\))?([ ]*NOT NULL)?[ ]*/i';
        preg_match($pattern, $sql, $matches);
        if($matches) {
            $sql = preg_replace($pattern, 'smallint', $sql);
        }

        // Handle for numeric and decimal -- being replaced with serial
        $numeric_patterns = ['/numeric(\(\d+\))?([ ]*NOT NULL)?[ ]*/i', '/decimal(\(\d+\))?([ ]*NOT NULL)?[ ]*/i'];
        foreach($numeric_patterns as $pattern) {
            preg_match($pattern, $sql, $matches);
            if($matches) {
                $sql = preg_replace($pattern, 'integer', $sql);
            }
        }

However the addition of this code above to the function is incorrect because it will convert again what has been converted before. For example, what I mean is that when the line:

form_id BIGINT(20),

Is converted to:

form_id bigint,

then it is converted again to:

form_id biginteger,

just because the word bigint contains the word int as well...

One last thing is that the tinyint case is not treated in this function. I think there should be this piece of code somewhere:

        // tinyint
        $pattern = '/ tinyint(\(\d+\))/i';
        preg_match($pattern, $sql, $matches);
        if($matches) {
            $sql = preg_replace($pattern, ' smallint', $sql);
        }

And probably remove also this line at the very begining of the source code file (which I've commented out here below):

        // For flash-album-gallery plugin
        // ' tinyint'           => ' smallint'
    ];

What do you think?

hahnn commented 4 months ago

In addition, I see there is an issue with this piece of code in the same rewrite_numeric_type($sql) function:

        // Handle for numeric and decimal -- being replaced with serial
        $numeric_patterns = ['/numeric(\(\d+\))?([ ]*NOT NULL)?[ ]*auto_increment/i', '/decimal(\(\d+\))?([ ]*NOT NULL)?[ ]*auto_increment/i'];
        foreach($numeric_patterns as $pattern) {
            preg_match($pattern, $sql, $matches);
            if($matches) {
                $sql = preg_replace($pattern, 'serial', $sql);
            }
        }

Indeed, decimal or numeric are numeric data types that can be used for floating point numbers, or for integer numbers.

The possible syntax for numeric or decimal is:

That means the pattern which is matched when searching for them must not be limited to only something like numeric((\d+)) or decimal((\d+)), because it can potentially contains 2 numbers between parenthesis separated by a colon, and not only one number, and I'm not sure it's the case here and it's processed correctly (I've personnal issues with regexes :-D).

By the way, this means that when you get for example:

numeric(10)

It can be converted to simply:

integer

Or even bigint if the number between parenthesis is higher than what can handle an integer.

But when you get:

    numeric(10, 5)
    double(12, 10)

Then they have to be converted as:

    float
    double (for large numbers)

And on the case of floating point numbers, they cannot even be serial data types (like smallserial, serial and bigserial), so they cannot be AUTO_INCREMENT.

Finally, there is another possible option: because PostgreSQL understand numeric() and decimal() data types, maybe the best option would be to simply not convert them and keep them as they are when they contain two numbers in their definitions, but continue to convert them as smallint, integer, int4, bigint data types when they contain only one number in their definitions...

mattbucci commented 4 months ago

I guess I don't quite understand this last comment. auto increment numeric must be an integer so we can safely convert to serial or bigserial, we can ignore float types because this regex only matches auto increments.

However you are correct that we need to handle the second parameter https://regex101.com/r/uVTDpv/1

However per the postgres docs "Integers can be considered to have a scale of zero" so once again having 2 parameters implies a float value which will not happen for auto increments.

Regarding the original issue, even non incrementing instances should be handled by https://github.com/PostgreSQL-For-Wordpress/postgresql-for-wordpress/blob/c7e3fbd50bb5363c84b39a611f65b9c699c7f63f/pg4wp/rewriters/CreateTableSQLRewriter.php#L102 The regex should look like this https://regex101.com/r/tSHixS/1

The only issue i see here is that I think we're missing an "i" for case insensitive matching on the preg_replace /(" . $numeric_types_imploded . ")\(\d+\)/ should be /(" . $numeric_types_imploded . ")\(\d+\)/i

additionally it sounds like we need a stringreplacement for any types which don't exist like mediumint before any of this runs.

hahnn commented 4 months ago

About the mediumint, that's what I've exposed here: https://github.com/PostgreSQL-For-Wordpress/postgresql-for-wordpress/issues/78 about the Relevanssi plugin.