EnterpriseDB / mysql_fdw

PostgreSQL foreign data wrapper for MySQL

Incomplete data transfer when pulling over a large table #111

Status: Open · opened by munro 8 years ago

munro commented 8 years ago
create table foo as select * from mysql_remote.foo;

Ran this, and it returned successfully, but the pull was incomplete: it pulled down 1,488,082 rows out of the original 7,958,064. For context, this table stores message data, around 3 KiB in each column. MySQL says the table is 32.6 GiB (not sure if that includes indexes).

Here's the PG table size (screenshot attached: screen shot 2016-06-23 at 10 43 06 am).

ahsanhadi commented 8 years ago

Can you please share your test case, including your DDL and perhaps some sample data, to help us reproduce the issue? Can you also do a select count(*) on both the remote and local tables after you have run the following statement?

create table foo as select * from mysql_remote.foo;
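
For reference, the two checks would presumably look like this (assuming the foreign table is mysql_remote.foo and the local copy is foo, as in the statement above):

select count(*) from mysql_remote.foo;  -- rows visible through the FDW
select count(*) from foo;               -- rows actually copied locally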

munro commented 8 years ago

@ahsanhadi Sure! Yeah, it's really on me to make a reproducible failing test, isn't it? 😝 It may very well be specific to the MySQL version I'm using.

ahsanhadi commented 8 years ago

@munro Yes, if you can share a sample test case, it will help us narrow down and investigate the issue...

munro commented 8 years ago

Here's the structure of the MySQL table [1]; it's running on AWS RDS MySQL (InnoDB, 5.6.21). I know this still leaves you in the dark. My next step to try to reproduce this is to make a table with lots of rows and a longtext field containing lots of data (similar to the statistics in that screenshot), and see whether mysql_fdw successfully pulls it all in; a sketch of such a reproducer follows the DDL below. The sheer number of rows and the amount of data in the longtext are what I assume to be causing it to fail.

[1]

CREATE TABLE `foo` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `created_on` datetime NOT NULL,
  `updated_on` datetime NOT NULL,
  `uuid` char(32) NOT NULL,
  `foo_type_id` int(11) NOT NULL,
  `from_email` varchar(255) NOT NULL,
  `subject` varchar(255) NOT NULL,
  `raw_email` varchar(200) NOT NULL,
  `plaintext` longtext NOT NULL,
  `status` int(11) NOT NULL,
  `test_group` varchar(25) DEFAULT NULL,
  `unique_id` varchar(255) NOT NULL,
  `from_name` varchar(70) NOT NULL,
  `group_id` int(11),
  `message_id` varchar(150),
  `campaign_id` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uuid` (`uuid`),
  UNIQUE KEY `message_id` (`message_id`)
) ENGINE=InnoDB AUTO_INCREMENT=11370857 DEFAULT CHARSET=utf8;
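
For illustration, a synthetic reproducer along those lines might look like this on the MySQL side (the table name foo_repro and the 100000-character padding are made up; REPEAT() just inflates the longtext column, and the last INSERT ... SELECT can be re-run to keep doubling the row count):

CREATE TABLE `foo_repro` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `plaintext` longtext NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- seed one oversized row, then double the table repeatedly
INSERT INTO `foo_repro` (`plaintext`) VALUES (REPEAT('x', 100000));
INSERT INTO `foo_repro` (`plaintext`) SELECT `plaintext` FROM `foo_repro`;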

wwindcloud commented 7 years ago

I have had this same problem lately and found out that if a column in MySQL contains a huge text value, mysql_fdw simply ignores that row and the rows following it. After some tracing, the problem can be remedied by increasing MAXDATALEN in mysql_fdw.h to a larger value. Obviously, this is not a nice solution; I wonder whether it could be solved generally by dynamically increasing MAXDATALEN and checking whether MYSQL_DATA_TRUNCATED is returned.
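
To illustrate the idea (this is not mysql_fdw's code, just a rough sketch of the underlying MySQL C API pattern with a made-up helper name): when mysql_stmt_fetch() reports MYSQL_DATA_TRUNCATED, the affected columns can be re-read into larger buffers with mysql_stmt_fetch_column() rather than discarded.

#include <stdlib.h>
#include <mysql.h>

/*
 * Hypothetical helper, not taken from mysql_fdw: fetch one row and, if any
 * column came back truncated, enlarge its buffer and re-fetch just that
 * column instead of silently losing the row.  Assumes `binds` was registered
 * with mysql_stmt_bind_result() and that each MYSQL_BIND has its `length`
 * and `error` pointers set.
 */
static int
fetch_row_handling_truncation(MYSQL_STMT *stmt, MYSQL_BIND *binds,
                              unsigned int ncols)
{
    int rc = mysql_stmt_fetch(stmt);

    if (rc != MYSQL_DATA_TRUNCATED)
        return rc;                      /* 0, MYSQL_NO_DATA, or an error */

    for (unsigned int i = 0; i < ncols; i++)
    {
        /* the per-column error flag marks the truncated columns */
        if (binds[i].error == NULL || !*binds[i].error)
            continue;

        /* *length holds the full size the server wanted to send */
        unsigned long needed = *binds[i].length;
        char *buf = realloc(binds[i].buffer, needed);

        if (buf == NULL)
            return -1;                  /* out of memory */

        binds[i].buffer = buf;
        binds[i].buffer_length = needed;

        /* re-read only this column into the enlarged buffer */
        if (mysql_stmt_fetch_column(stmt, &binds[i], i, 0) != 0)
            return -1;
    }

    return 0;                           /* row is now complete */
}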

eMerzh commented 6 years ago

Yes, I got the same issue with mediumtext, where rows get dropped. It's really annoying behavior because it's silent, so I retrieved 6,000 rows instead of 80,000. Is there any way to at least replace the dropped row with an error or something?

lifeboy commented 5 years ago

Ah! I thought I was alone with this problem, but it seems not :-)
wwindcloud, how were you able to trace this?

wwindcloud commented 5 years ago

> Ah! I thought I was alone with this problem, but it seems not :-) wwindcloud, how were you able to trace this?

Yeah, it's really annoying. I fixed it by increasing MAXDATALEN; not nice, but workable. I dug into the source code and noticed a comment somewhere in mysqlIterateForeignScan in mysql_fdw.c: it recognizes MYSQL_DATA_TRUNCATED but does not really handle it properly. It says the error will be reported when the MYSQL_REPORT_DATA_TRUNCATION option is set, but I'm not sure how that works.
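
For what it's worth, MYSQL_REPORT_DATA_TRUNCATION appears to be a client option set through mysql_options(); a minimal sketch of how it would be enabled (hypothetical helper name, not mysql_fdw code):

#include <stdbool.h>
#include <mysql.h>

/*
 * Hypothetical snippet, not mysql_fdw source: enable truncation reporting on
 * a connection handle.  mysql_options() is normally called after mysql_init()
 * and before mysql_real_connect(); with this flag set, mysql_stmt_fetch()
 * returns MYSQL_DATA_TRUNCATED and sets the per-column `error` flags instead
 * of hiding the truncation.
 */
static void
enable_truncation_reporting(MYSQL *conn)
{
    bool report_truncation = true;      /* my_bool on older client libraries */

    mysql_options(conn, MYSQL_REPORT_DATA_TRUNCATION, &report_truncation);
}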

Hopefully, someone with knowledge of both the MySQL and PostgreSQL libraries will submit a pull request for this.