EnterpriseDB / mysql_fdw

PostgreSQL foreign data wrapper for MySQL

Data missing from FDW table #205

Closed mrAndersen closed 2 years ago

mrAndersen commented 3 years ago

Hello there, I have a very strange problem with the mysql_fdw PostgreSQL extension. Some data is missing on the Postgres side.

MySQL side table:

create table database.document
(
    id int auto_increment
        primary key,
    partner int null,
    title varchar(255) null,
    text longtext null,
    constraint FK_D8698A76312B3E16
        foreign key (partner) references database.partner (id)
)
collate=utf8_unicode_ci;

create index IDX_D8698A76312B3E16
    on database.document (partner);

Foreign table generated on the Postgres side with mysql_fdw:

create foreign table old.document
(
    id integer not null,
    partner integer,
    title varchar(255),
    text text
)
server old_server
options (dbname 'database', table_name 'document');

alter foreign table old.document owner to postgres;

After running select * from old.document, I get only 5 records, although the original table on the MySQL side contains at least 60 records.

MySQL version is 5.7.31, Postgres version is 11.8.

What am I doing wrong?

surajkharage19 commented 3 years ago

Hi,

I am not able to reproduce this issue on the latest HEAD (8912bbb4e2ca53e8fe72c2a4b4b76dc8ef778f09); I tried the steps below:

Environment details:

OS: CentOS 7
PG version: 11.8
MySQL version: 5.6.49

MySQL side:

-- Created the partner table so that we can create the foreign key.
create table partner(id int primary key);

-- Inserted values 1 to 10 like below in partner table
insert into partner values(1);  

mysql> select count(*) from partner;
+----------+
| count(*) |
+----------+
|       10 |
+----------+

-- Created the document table in the same way.
create table document (
    id int auto_increment primary key,
    partner int null,
    title varchar(255) null,
    text longtext null,
    constraint FK_D8698A76312B3E16 foreign key (partner) references partner (id))
collate=utf8_unicode_ci;

-- Created index
create index IDX_D8698A76312B3E16 on document (partner);

-- Inserted a random number of rows into the document table like below:
insert into document(partner, title, text) values(1,'abc', 'abc');
mysql> select count(*) from document;
+----------+
| count(*) |
+----------+
|       13 |
+----------+
1 row in set (0.02 sec)

Postgres side:

-- Created the foreign table as provided by you.    
postgres@101109=#CREATE FOREIGN TABLE mysql_document(id integer not null, partner integer, title varchar(255), text text) server mysql_svr options (dbname 'suraj', table_name 'document');
CREATE FOREIGN TABLE

postgres@101109=#alter foreign table mysql_document owner to postgres;
ALTER FOREIGN TABLE

postgres@101109=#select count(*) from mysql_document ;
 count 
-------
    13
(1 row)

postgres@101109=#select * from mysql_document ;
 id | partner | title | text 
----+---------+-------+------
  1 |       1 | abc   | abc
  2 |       1 | abc   | abc
  3 |       2 | abc   | abc
  4 |      10 | abc   | abc
  6 |       8 | abc   | abc
  7 |       6 | abc   | abc
  8 |       4 | abc   | abc
  9 |       4 | abc   | abc
 10 |       4 | abc   | abc
 11 |       4 | abc   | abc
 12 |       7 | abc   | abc
 13 |       9 | abc   | abc
 14 |       5 | abc   | abc
(13 rows)

postgres@101109=# \l+ postgres
                                                                List of databases
   Name   | Owner  | Encoding |   Collate   |    Ctype    | Access privileges |  Size   | Tablespace |                Description                 
----------+--------+----------+-------------+-------------+-------------------+---------+------------+--------------------------------------------
 postgres | centos | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                   | 7821 kB | pg_default | default administrative connection database
(1 row)

Can you please share below additional information to troubleshoot this further:

1. Exact OS version?
2. What was the mysql_fdw HEAD commit on which you tested this scenario?
3. If possible, share some sample data from the problematic rows.
4. In the last step, I can see you alter the foreign table owner to postgres; what was the earlier owner of that table? Is postgres a superuser?
5. It might be a collation issue; can you please share the collation details from both databases?

mrAndersen commented 3 years ago

@surajkharage19

  1. The Postgres server is running in Docker, from the default Docker image, on an Arch Linux host (5.8.3-6-tkg-pds). The MySQL server is running on Ubuntu 18.04 (4.15.0-47-generic).

  2. I can't tell the exact HEAD; pg_available_extensions says this is version 1.1.

  3. Unfortunately I can't share row samples, but the rows contain some HTML, not too complex, around 100 KB max. Problematic and non-problematic rows contain similar HTML; I can't say they differ in any obvious way.

  4. postgres is a superuser; I alter the table owner to this user to be able to run selects, since as far as I know they won't work otherwise.

  5. Strangely,

    SELECT
    default_character_set_name,
    default_collation_name
    FROM information_schema.schemata
    WHERE schema_name = 'database';

    on the MySQL side shows latin1 / latin1_swedish_ci, which is strange, but

SHOW TABLE STATUS LIKE '%document%';

shows utf8_unicode_ci. Maybe the problem is here. On the Postgres side,

SELECT datcollate AS collation
FROM pg_database
WHERE datname = current_database();

shows en_US.utf8
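For completeness, the column-level collation can differ from both the schema default and the table default. A quick way to check it (a sketch, run on the MySQL side; schema and table names taken from the DDL above):

```sql
-- Show per-column charset/collation for the document table.
SELECT column_name, character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = 'database'
  AND table_name = 'document';
```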

Up.

I changed the MySQL-side database-wide collation to utf8 and re-exported; nothing changed.

mrAndersen commented 3 years ago

Up.

mysql_fdw was installed from this deb package: https://packages.debian.org/buster/postgresql-11-mysql-fdw

mrAndersen commented 3 years ago

Up.

This can be closed... I checked the psql CLI client and it returns the correct number of rows. The problem is in my IDE database client (PhpStorm's default plugin), and in addition the PDO driver inside PHP is not returning data correctly either, so this is definitely not a mysql_fdw problem. Thanks for the help!

mrAndersen commented 3 years ago

Up.

The story continues: if I run select id, title from old I get all rows from all SQL clients (psql too), but if I run select id, title, text from old I get only 6 rows, so maybe this is a mysql_fdw issue after all. The problem is clearly in the text field contents...

surajkharage19 commented 3 years ago

Thanks @mrAndersen for the update on this. Just to clarify one thing: in your last update you mentioned that "if i do select id, title, text from old, i am getting only 6 rows". Does that mean you are getting the incorrect result in the psql client as well, or was that a different database client?

mrAndersen commented 3 years ago

@surajkharage19 yes, I get incorrect results in every client if I add the text column containing HTML to the select statement. I will try to remove sensitive information from some of the problematic rows and post them here so you can investigate.

mrAndersen commented 3 years ago

Up.

I've found the problematic row(s); after removing them I get more results. After removing the first I get 7 additional results, and after removing the next one I get 9 more, which means the problem is clearly in the contents of those rows. If I can somehow help analyze that data, just say how.
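One way to spot candidate rows without sharing their contents (a sketch, run on the MySQL side; table and column names taken from the DDL above):

```sql
-- Rank rows by the size of the `text` value, to see whether
-- size correlates with the rows that go missing through the FDW.
SELECT id, LENGTH(`text`) AS bytes
FROM document
ORDER BY bytes DESC
LIMIT 10;
```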

surajkharage19 commented 3 years ago

Thanks for the update. It would be really helpful if you could share some data from the problematic rows (with your sensitive data removed) so I can reproduce this at my end. I could not reproduce it with normal HTML data.

mrAndersen commented 3 years ago

https://file.io/SuYv6HE7blmW

@surajkharage19 I've removed the sensitive data and rechecked: the row with this data in it still doesn't appear in select statements.

surajkharage19 commented 3 years ago

I am getting this error while accessing the link: "404 Page not found". Can you please confirm it again from your end?

mrAndersen commented 3 years ago

Reuploaded: https://file.io/eZlM8SFFnvU8

surajkharage19 commented 3 years ago

Thanks @mrAndersen for sharing the data. I am able to reproduce the issue at my end. Further, I observed that this is happening because MAXDATALEN is defined in mysql_fdw.h as 65536, which is not sufficient for such long data. If I increase the MAXDATALEN value to something like the below, then I am able to see that data in the psql client.

#define MAXDATALEN (1024 * 128)

Similar issues were reported earlier as well: https://github.com/EnterpriseDB/mysql_fdw/issues/181 https://github.com/EnterpriseDB/mysql_fdw/issues/111

If you have a test server where you compile the mysql_fdw sources, can you try increasing MAXDATALEN at your end and see if that resolves the issue? Meanwhile, we are also looking into increasing the MAXDATALEN size (maybe allocating it dynamically).

mrAndersen commented 3 years ago

Increasing MAXDATALEN did work; I compiled on PostgreSQL 12.

The allocation code in mysql_bind_result surely must be rewritten to use dynamic allocation, if only to reduce memory consumption, never mind this severe bug.

surajkharage19 commented 3 years ago

Thanks, @mrAndersen, for confirming that increasing the MAXDATALEN value resolves this issue. Sure, we will look into the problematic code and try to make the allocations dynamic.

NordenF commented 2 years ago

Hi! I am facing the same issue.

In my MySQL table, one of the rows has a very long field of mediumtext type, and I cannot retrieve even this single row with a query through mysql_fdw.

I tried to cut the field down within the query using MySQL string functions like left(mytextfield, 50), but even that doesn't work; the query returns no data. If it had worked, I could have reassembled the field on the PostgreSQL side by concatenating the pieces.

Is there any workaround for this problem other than rebuilding mysql_fdw from source?

surajkharage19 commented 2 years ago

Hi @NordenF,

We have fixed this issue in commit 4b1b14bebfafb760d63f569a712f7f7589643f75, and the fix will be available in the next release. For now, I don't see any workaround apart from rebuilding from source.

surajkharage19 commented 2 years ago

Hi @NordenF,

We have not received further updates from you on this and hope the issue is resolved. We are closing the case for now; if the issue persists, you can reopen it anytime.