EnterpriseDB / mysql_fdw

PostgreSQL foreign data wrapper for MySQL
Other
530 stars 162 forks source link

Running in PL/pgSQL, WHERE clause push-down stops after 5 queries #118

Open jefft opened 7 years ago

jefft commented 7 years ago

Hi,

_TL;DR: mysqlfdw queries can become dramatically slower within PL/pgSQL FOR loops, due to the query optimizer eliminating the WHERE clause push-down.

I'm trying to import MySQL data into Postgres 9.5, using PL/pgSQL. The MySQL tables are large, and so the WHERE clause push-down is crucial -- otherwise select * from bigtable where id=123 effectively becomes select * from bigtable, which takes forever.

Unfortunately it seems the WHERE clause push-down is not happening after 5 iterations of a PL/pgSQL FOR loop.

For instance, say I have a MySQL table:

MariaDB [testdb]> select * from users ;
+------+--------+--------+
| id   | name   | photo  |
+------+--------+--------+
|    1 | Bob    | xxxx   |
|    2 | Fred   | yyyyy  |
|    3 | Sally  | zzzz   |
|    4 | Mary   | qzqzqz |
|    5 | Jimmy  | abcabc |
|    6 | User6  | 12345  |
|    7 | User7  | 12346  |
|    8 | User8  | 12347  |
|    9 | User9  | 12348  |
|   10 | User10 | 12349  |
+------+--------+--------+
10 rows in set (0.00 sec)

This is accessible from a Postgres database:

testdb=# \d
              List of relations
┌────────┬───────┬───────────────┬──────────┐
│ Schema │ Name  │     Type      │  Owner   │
├────────┼───────┼───────────────┼──────────┤
│ public │ users │ foreign table │ postgres │
└────────┴───────┴───────────────┴──────────┘
(1 row)

testdb=# select * from users;
INFO:  Successfully connected to MySQL database testdb at server 127.0.0.1 via TCP/IP with cipher <none> (server version: 5.5.5-10.0.27-MariaDB-0ubuntu0.16.04.1, protocol version: 10) 
┌────┬────────┬────────────────┐
│ id │  name  │     photo      │
├────┼────────┼────────────────┤
│  1 │ Bob    │ \x78787878     │
│  2 │ Fred   │ \x7979797979   │
│  3 │ Sally  │ \x7a7a7a7a     │
│  4 │ Mary   │ \x717a717a717a │
│  5 │ Jimmy  │ \x616263616263 │
│  6 │ User6  │ \x3132333435   │
│  7 │ User7  │ \x3132333436   │
│  8 │ User8  │ \x3132333437   │
│  9 │ User9  │ \x3132333438   │
│ 10 │ User10 │ \x3132333439   │
└────┴────────┴────────────────┘
(10 rows)

The WHERE clause push-down for plain SQL (not PL/pgSQL) queries work fine. For instance:

$ cat testqueries.sql 
SELECT name FROM users WHERE id=1;
SELECT name FROM users WHERE id=2;
SELECT name FROM users WHERE id=3;
SELECT name FROM users WHERE id=4;
SELECT name FROM users WHERE id=5;
SELECT name FROM users WHERE id=6;
SELECT name FROM users WHERE id=7;
SELECT name FROM users WHERE id=8;
SELECT name FROM users WHERE id=9;
SELECT name FROM users WHERE id=10;
$ psql -tAq testdb < testqueries.sql 
INFO:  Successfully connected to MySQL database testdb at server 127.0.0.1 via TCP/IP with cipher <none> (server version: 5.5.5-10.0.27-MariaDB-0ubuntu0.16.04.1, protocol version: 10) 
Bob
Fred
Sally
Mary
Jimmy
User6
User7
User8
User9
User10

I can tell the WHERE clause is passing through by watching /var/log/mysql/mysql.log:

94 Prepare  SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 1))
94 Execute  SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 1))
 ...
94 Prepare  SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 2))
94 Execute  SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 2))
...
94 Prepare  SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 3))
94 Execute  SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 3))
...
94 Prepare  SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 10))
94 Execute  SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 10))

But say I run an equivalent loop in PL/pgSQL:

$ cat slowloop.sql 
create or replace function test() returns void as $$
DECLARE
    n varchar;
BEGIN
        FOR x IN 1..10 LOOP
                select name into n from users where id=x;
                raise notice 'Found user %', n;
        end loop;
        return;
END
$$ LANGUAGE plpgsql;
SELECT test();
$ psql -tAq testdb < slowloop.sql 
INFO:  Successfully connected to MySQL database testdb at server 127.0.0.1 via TCP/IP with cipher <none> (server version: 5.5.5-10.0.27-MariaDB-0ubuntu0.16.04.1, protocol version: 10) 
CONTEXT:  SQL statement "select name        from users where id=x"
PL/pgSQL function test() line 6 at SQL statement
NOTICE:  Found user Bob
NOTICE:  Found user Fred
NOTICE:  Found user Sally
NOTICE:  Found user Mary
NOTICE:  Found user Jimmy
NOTICE:  Found user User6
NOTICE:  Found user User7
NOTICE:  Found user User8
NOTICE:  Found user User9
NOTICE:  Found user User10

The /var/log/mysql/mysql.log queries are (in full):

       95 Query SET sql_mode='ANSI_QUOTES'
       95 Query SET time_zone = '+00:00'
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Prepare   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 1))
       95 Execute   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 1))
       95 Close stmt    
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Query SET time_zone = '+00:00'
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Prepare   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 2))
       95 Execute   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 2))
       95 Close stmt    
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Query SET time_zone = '+00:00'
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Prepare   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 3))
       95 Execute   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 3))
       95 Close stmt    
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Query SET time_zone = '+00:00'
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Prepare   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 4))
       95 Execute   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 4))
       95 Close stmt    
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Query SET time_zone = '+00:00'
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Prepare   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 5))
       95 Execute   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 5))
       95 Close stmt    
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Query SET time_zone = '+00:00'
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Prepare   SELECT `id`, `name` FROM `testdb`.`users`
       95 Execute   SELECT `id`, `name` FROM `testdb`.`users`
       95 Close stmt    
       95 Query SET time_zone = '+00:00'
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Prepare   SELECT `id`, `name` FROM `testdb`.`users`
       95 Execute   SELECT `id`, `name` FROM `testdb`.`users`
       95 Close stmt    
       95 Query SET time_zone = '+00:00'
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Prepare   SELECT `id`, `name` FROM `testdb`.`users`
       95 Execute   SELECT `id`, `name` FROM `testdb`.`users`
       95 Close stmt    
       95 Query SET time_zone = '+00:00'
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Prepare   SELECT `id`, `name` FROM `testdb`.`users`
       95 Execute   SELECT `id`, `name` FROM `testdb`.`users`
       95 Close stmt    
       95 Query SET time_zone = '+00:00'
       95 Query SET sql_mode='ANSI_QUOTES'
       95 Prepare   SELECT `id`, `name` FROM `testdb`.`users`
       95 Execute   SELECT `id`, `name` FROM `testdb`.`users`
       95 Close stmt    
       95 Quit  

Notice that the WHERE clause disappears after 5 queries.

Searching the web, I found other references to PL/pgSQL slowing down after 5 queries:

https://www.postgresql.org/message-id/CALgmQD9ebXKPgbK5jjbaJ9kCWtOywjjpv-X9+eN0hn5gfGJOjQ@mail.gmail.com

I don't quite understand the replies - something about query plan optimizers - but wrapping the select in an EXECUTE does indeed make it fast again:

$ cat fastloop.sql 
create or replace function test() returns void as $$
DECLARE
    n varchar;
BEGIN
        FOR x IN 1..10 LOOP
                EXECUTE 'select name from users where id=$1' INTO n USING x;
                raise notice 'Found user %', n;
        end loop;
        return;
END
$$ LANGUAGE plpgsql;
SELECT test();

If this is a query optimizer problem, can anything be done at the mysql_fdw layer to tell the optimizer to stop messing with it? I've tried use_remote_estimate 'true' but it made no difference.

Thanks, Jeff

PS: the script to create the MySQL testdb database:

$ cat testdb_mysql.sql 
DROP DATABASE if exists testdb ;
CREATE DATABASE testdb;
USE testdb;

CREATE TABLE users (
  id integer,
  name text,
  photo blob
);

INSERT INTO `users` VALUES (1,'Bob','xxxx');
INSERT INTO `users` VALUES (2,'Fred','yyyyy');
INSERT INTO `users` VALUES (3,'Sally','zzzz');
INSERT INTO `users` VALUES (4,'Mary','qzqzqz');
INSERT INTO `users` VALUES (5,'Jimmy','abcabc');
INSERT INTO `users` VALUES (6,'User6','12345');
INSERT INTO `users` VALUES (7,'User7','12346');
INSERT INTO `users` VALUES (8,'User8','12347');
INSERT INTO `users` VALUES (9,'User9','12348');
INSERT INTO `users` VALUES (10,'User10','12349');

and to create a linked Postgres testdb database:

CREATE DATABASE testdb;
GRANT usage ON SCHEMA PUBLIC TO jturner;
\c testdb;
CREATE EXTENSION mysql_fdw;
CREATE SERVER mysql_server FOREIGN DATA WRAPPER mysql_fdw OPTIONS (host '127.0.0.1', port '3306', use_remote_estimate 'true');
CREATE USER MAPPING FOR postgres SERVER mysql_server OPTIONS (username 'root', password 'secret');
CREATE USER MAPPING FOR jturner SERVER mysql_server OPTIONS (username 'root', password 'secret');
GRANT USAGE ON FOREIGN SERVER mysql_server to jturner;
IMPORT FOREIGN SCHEMA testdb FROM SERVER mysql_server INTO public;
select * from users;
ahsanhadi commented 7 years ago

Hi Jeff,

Thanks for sharing the test cases, the where clause should get pushed down for every iteration. We will try and investigate this issue asap and revert back on this thread...

-- Ahsan

waTeim commented 7 years ago

Let me add to this. I have noted the same behavior on Postgres 9.6. It is not restricted to for loops, but simple invocations also cause this. For instance, the following function returns quickly 5 times, but on the 6th time and subsequently runs slowly.

Example 1

CREATE OR REPLACE function ess.get_mysql_max_prid(_min_prid bigint) RETURNS bigint AS $$
DECLARE
  max_prid bigint;
BEGIN
  IF _min_prid IS NOT NULL THEN
    RAISE NOTICE 'prid is %',_min_prid;
    max_prid := max(prid) FROM mysql_extservice_all WHERE prid >= _min_prid;
  ELSE
    RAISE NOTICE 'min_prid is null';
    SELECT max(prid) FROM mysql_extservice_all INTO max_prid;
  END IF;
  RETURN max_prid;
END;
$$ language 'plpgsql';

Push Down works

Immediately after creation, the following all return in less that 1 second

ess=> select * from ess.get_mysql_max_prid(6328025);
NOTICE:  prid is 6328025
 get_mysql_max_prid 
--------------------
            6348494
(1 row)

ess=> select * from ess.get_mysql_max_prid(6328025);
NOTICE:  prid is 6328025
 get_mysql_max_prid 
--------------------
            6348495
(1 row)

ess=> select * from ess.get_mysql_max_prid(6328025);
NOTICE:  prid is 6328025
 get_mysql_max_prid 
--------------------
            6348498
(1 row)

ess=> select * from ess.get_mysql_max_prid(6328025);
NOTICE:  prid is 6328025
 get_mysql_max_prid 
--------------------
            6348500
(1 row)

ess=> select * from ess.get_mysql_max_prid(6328025);
NOTICE:  prid is 6328025
 get_mysql_max_prid 
--------------------
            6348509
(1 row)

Problematic

Eventually, returns, but takes approximately 40 seconds. All subsequent invocations also take this long. Re-defining the stored procedure resets the behavior to working 5 times quickly.

ess=> select * from ess.get_mysql_max_prid(6328025);
NOTICE:  prid is 6328025
--------------------
            6348513
(1 row)
jmealo commented 7 years ago

@waTeim @jefft Giant kudos for figuring out this behavior! 🎆

I just ended up using materialized views and only issuing simple SELECT statements with mysql_fdw because it was so buggy with pushing quals down causing queries to sporadically fail (JOINs were very problematic).

This is a huge issue and needs to be fixed. I pity the fool using mysql_fdw in production without catching this in development!

ahsanhadi commented 7 years ago

This is an important issue, we will try and fix this on priority...

On Sun, Jan 8, 2017 at 4:30 AM, Jeffrey notifications@github.com wrote:

@waTeim https://github.com/waTeim @jefft https://github.com/jefft Giant kudos for figuring out this behavior! 🎆

I just ended up using MATERIALIZED VIEWS and only issuing simple SELECT statements with mysql_fdw because it was so buggy with pushing quals down causing queries to sporadically fail.

This is a huge issue and needs to be fixed. I pity the fool using mysql_fdw in production without catching this in development!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/EnterpriseDB/mysql_fdw/issues/118#issuecomment-271117676, or mute the thread https://github.com/notifications/unsubscribe-auth/AHmqpLtKg_H5E9u7hxDCA_HX9_jxIdgfks5rQCAfgaJpZM4KEly9 .

-- Ahsan Hadi Snr Director Product Development EnterpriseDB Corporation The Enterprise Postgres Company

Phone: +92-51-8358874 Mobile: +92-333-5162114

Website: www.enterprisedb.com EnterpriseDB Blog: http://blogs.enterprisedb.com/ Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message.

ahsanhadi commented 7 years ago

Hi, We are finally getting around to looking at this issue. It seems that the planner stops pushing down the where clause after a certain number of iterations. The same behaviour is reproducible with postgres_fdw using a postgres foreign server. We will continue to look at this problem and keep your posted...

ibrarahmad commented 7 years ago

@jefft Push down support after 5 calls is supported now, can you please check.

jmealo commented 7 years ago

@jefft: Can you check or close this issue? I reviewed the fix and it looks like the test cases cover the reported issue.

jefft commented 7 years ago

@ibrarahmad, @jmealo, using latest from git, instead of slow queries I now get outright breakage after the fifth query:

postgres@jturner-desktop:~$ psql -tAq testdb < slowloop.sql 
NOTICE:  Found user Bob
NOTICE:  Found user Fred
NOTICE:  Found user Sally
NOTICE:  Found user Mary
NOTICE:  Found user Jimmy
ERROR:  failed to execute the MySQL query: 
You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '?))' at line 1
CONTEXT:  SQL statement "select name        from users where id=x"
PL/pgSQL function test() line 6 at SQL statement
postgres@jturner-desktop:~$ 

In mysql.log I see:

170504 11:41:45     2 Connect   root@localhost as anonymous on testdb
            2 Query SET sql_mode='ANSI_QUOTES'
            2 Query EXPLAIN SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 1))
            2 Query SET time_zone = '+00:00'
            2 Query SET sql_mode='ANSI_QUOTES'
            2 Prepare   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 1))
            2 Execute   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 1))
            2 Close stmt    
            2 Query SET sql_mode='ANSI_QUOTES'
            2 Query EXPLAIN SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 2))
            2 Query SET time_zone = '+00:00'
            2 Query SET sql_mode='ANSI_QUOTES'
            2 Prepare   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 2))
            2 Execute   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 2))
            2 Close stmt    
            2 Query SET sql_mode='ANSI_QUOTES'
            2 Query EXPLAIN SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 3))
            2 Query SET time_zone = '+00:00'
            2 Query SET sql_mode='ANSI_QUOTES'
            2 Prepare   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 3))
            2 Execute   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 3))
            2 Close stmt    
            2 Query SET sql_mode='ANSI_QUOTES'
            2 Query EXPLAIN SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 4))
            2 Query SET time_zone = '+00:00'
            2 Query SET sql_mode='ANSI_QUOTES'
            2 Prepare   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 4))
            2 Execute   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 4))
            2 Close stmt    
            2 Query SET sql_mode='ANSI_QUOTES'
            2 Query EXPLAIN SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 5))
            2 Query SET time_zone = '+00:00'
            2 Query SET sql_mode='ANSI_QUOTES'
            2 Prepare   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 5))
            2 Execute   SELECT `name` FROM `testdb`.`users` WHERE ((`id` = 5))
            2 Close stmt    
            2 Query SET sql_mode='ANSI_QUOTES'
            2 Query EXPLAIN SELECT `name` FROM `testdb`.`users` WHERE ((`id` = ?))
            2 Quit  

Here is the full series of commands for replicating, tested in a clean Ubuntu 16.04.2 (full console output in transcript.txt):

apt update
apt install -y postgresql-9.5 mariadb-server mariadb-client libmysqlclient-dev postgresql-server-dev-9.5 make gcc
perl -i -pe 's/^#general_log/general_log/g' /etc/mysql/mariadb.conf.d/50-server.cnf; systemctl restart mysql  # turn on mysql.log

# Change 'root' auth from socket to password so it works from psql. 
echo "grant all privileges on *.* to 'root'@'localhost' identified by 'secret'" | mysql
cat <<'EOF' > ~/.my.cnf
[mysql]
user=root
password=secret
EOF

mkdir -p ~/src/github.com/EnterpriseDB
git clone https://github.com/EnterpriseDB/mysql_fdw.git ~/src/github.com/EnterpriseDB/mysql_fdw    # At commit 86ad3af
cd src/github.com/EnterpriseDB/mysql_fdw/
sudo make USE_PGXS=1 install

cat <<'EOF' > /tmp/testdb_mysql.sql
DROP DATABASE if exists testdb ;
CREATE DATABASE testdb;
USE testdb;

CREATE TABLE users (
  id integer,
  name text,
  photo blob
);

INSERT INTO `users` VALUES (1,'Bob','xxxx');
INSERT INTO `users` VALUES (2,'Fred','yyyyy');
INSERT INTO `users` VALUES (3,'Sally','zzzz');
INSERT INTO `users` VALUES (4,'Mary','qzqzqz');
INSERT INTO `users` VALUES (5,'Jimmy','abcabc');
INSERT INTO `users` VALUES (6,'User6','12345');
INSERT INTO `users` VALUES (7,'User7','12346');
INSERT INTO `users` VALUES (8,'User8','12347');
INSERT INTO `users` VALUES (9,'User9','12348');
INSERT INTO `users` VALUES (10,'User10','12349');
EOF

cat <<'EOF' > /tmp/testdb_psql.sql
CREATE DATABASE testdb;
GRANT usage ON SCHEMA PUBLIC TO postgres;
\c testdb;
CREATE EXTENSION mysql_fdw;
CREATE SERVER mysql_server FOREIGN DATA WRAPPER mysql_fdw OPTIONS (host '127.0.0.1', port '3306', use_remote_estimate 'true');
CREATE USER MAPPING FOR postgres SERVER mysql_server OPTIONS (username 'root', password 'secret');
GRANT USAGE ON FOREIGN SERVER mysql_server to postgres;
IMPORT FOREIGN SCHEMA testdb FROM SERVER mysql_server INTO public;
select * from users;
EOF
su - postgres -c psql < /tmp/testdb_psql.sql

cat <<'EOF' > /tmp/slowloop.sql
create or replace function test() returns void as $$
DECLARE
    n varchar;
BEGIN
        FOR x IN 1..10 LOOP
                select name into n from users where id=x;
                raise notice 'Found user %', n;
        end loop;
        return;
END
$$ LANGUAGE plpgsql;
SELECT test();
EOF
su - postgres -c 'psql testdb' < /tmp/slowloop.sql
MondayLiu commented 3 years ago

Hi @ahsanhadi @jefft , Has this issue been fixed? I'm the fool using mysql_fdw in production without catching this issue in development!

I use postgres12.6 & mysql_fdw 2.5.5

This is my table definition: database diagram

--PG
CREATE TABLE DDTINTO_MASTER  (
  AC_DATE   int         NOT NULL DEFAULT 0,
  AC_NO   varchar(12)   NOT NULL DEFAULT '',
  CCY    varchar(3)    NOT NULL DEFAULT '',
  QUALIFIER  char(1)       NOT NULL DEFAULT '',
  DEP    decimal(16, 2)      NOT NULL DEFAULT 0.00
  ....
  TS TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT    CURRENT_TIMESTAMP
) PARTITION BY RANGE (AC_DATE); 

CREATE TABLE ddtinto_201909 PARTITION OF DDTINTO_MASTER FOR VALUES FROM (10000101) TO (20191001);
CREATE TABLE ddtinto_201910 PARTITION OF DDTINTO_MASTER FOR VALUES FROM (20191001) TO (20191101);
...

CREATE INDEX ddtinto_pk on DDTINTO_MASTER  (ac_no, ccy, qualifier, ac_date);

CREATE VIEW DDTINTO AS
SELECT * FROM DDTINTO_CURRENT
UNION ALL
SELECT * FROM DDTINTO_MASTER;

--MySQL
create view view_ddtinto_current
as
select * from ddtinto
where    ac_date >= 20210401;

Sometimes , FDW can push all where clauses to MySQL, sometimes can only push part of the where caluses. Here is the latter's explain plan, Remote query lack 2 where clauses: ac_no and ccy

2021-06-04 00:22:10.311 CST [2695] LOG:  duration: 928091.882 ms  execute S_8: SELECT    TRIM(AC_NO) AC_NO
                    , TRIM(CCY)   CCY
                    , AC_DATE
                    , DEP
                    , QUALIFIER
            FROM      DDTINTO
            WHERE     QUALIFIER IN ('A', 'C')
                and AC_NO = $1
                and CCY = $2
                and AC_DATE >= $3
                and AC_DATE <= $4
            ORDER BY   AC_DATE DESC
            limit $5
2021-06-04 00:22:10.311 CST [2695] DETAIL:  parameters: $1 = '889005309160', $2 = 'HKD', $3 = '20210306', $4 = '20210604', $5 = '180'
2021-06-04 00:22:10.312 CST [2695] LOG:  duration: 928091.872 ms  plan:
    Query Text: SELECT    TRIM(AC_NO) AC_NO
                    , TRIM(CCY)   CCY
                    , AC_DATE
                    , DEP
                    , QUALIFIER
            FROM      DDTINTO
            WHERE     QUALIFIER IN ('A', 'C')
                and AC_NO = $1
                and CCY = $2
                and AC_DATE >= $3
                and AC_DATE <= $4
            ORDER BY   AC_DATE DESC
            limit $5
    Limit  (cost=1273.68..1273.94 rows=102 width=94) (actual time=928091.796..928091.825 rows=88 loops=1)
      Output: (btrim((ddtinto_current.ac_no)::text)), (btrim((ddtinto_current.ccy)::text)), ddtinto_current.ac_date, ddtinto_current.dep, ddtinto_current.qualifier
      Buffers: shared hit=64
      ->  Sort  (cost=1273.68..1276.23 rows=1020 width=94) (actual time=928091.794..928091.806 rows=88 loops=1)
            Output: (btrim((ddtinto_current.ac_no)::text)), (btrim((ddtinto_current.ccy)::text)), ddtinto_current.ac_date, ddtinto_current.dep, ddtinto_current.qualifier
            Sort Key: ddtinto_current.ac_date DESC
            Sort Method: quicksort  Memory: 31kB
            Buffers: shared hit=64
            ->  Result  (cost=25.00..1222.71 rows=1020 width=94) (actual time=231239.514..928091.626 rows=88 loops=1)
                  Output: btrim((ddtinto_current.ac_no)::text), btrim((ddtinto_current.ccy)::text), ddtinto_current.ac_date, ddtinto_current.dep, ddtinto_current.qualifier
                  Buffers: shared hit=64
                  ->  Append  (cost=25.00..1207.41 rows=1020 width=87) (actual time=231239.509..928091.454 rows=88 loops=1)
                        Buffers: shared hit=64
                        Subplans Removed: 18
                        ->  Foreign Scan on public.ddtinto_current  (cost=25.00..1025.00 rows=1000 width=88) (actual time=231239.507..928091.234 rows=33 loops=1)
                              Output: ddtinto_current.ac_no, ddtinto_current.ccy, ddtinto_current.ac_date, ddtinto_current.dep, ddtinto_current.qualifier
                              Filter: (((ddtinto_current.ac_no)::text = ($1)::text) AND ((ddtinto_current.ccy)::text = ($2)::text))
                              Rows Removed by Filter: 34667775
                              Remote server startup cost: 25
                              Remote query: SELECT `ac_date`, `ac_no`, `ccy`, `qualifier`, `dep` FROM `ocbs387`.`view_ddtinto_current` WHERE (`qualifier`  IN ('A', 'C')) AND ((`ac_date` >= ?)) AND ((`ac_date` <= ?))
                        ->  Index Scan using ddtinto_202103_ac_no_ccy_qualifier_ac_date_idx on public.ddtinto_202103  (cost=0.56..9.10 rows=1 width=26) (actual time=0.051..0.089 rows=25 loops=1)
                              Output: ddtinto_202103.ac_no, ddtinto_202103.ccy, ddtinto_202103.ac_date, ddtinto_202103.dep, ddtinto_202103.qualifier
                              Index Cond: (((ddtinto_202103.ac_no)::text = ($1)::text) AND ((ddtinto_202103.ccy)::text = ($2)::text) AND (ddtinto_202103.ac_date >= $3) AND (ddtinto_202103.ac_date <= $4))
                              Filter: (ddtinto_202103.qualifier = ANY ('{A,C}'::bpchar[]))
                              Buffers: shared hit=29
                        ->  Index Scan using ddtinto_202104_ac_no_ccy_qualifier_ac_date_idx on public.ddtinto_202104  (cost=0.56..9.08 rows=1 width=26) (actual time=0.022..0.063 rows=30 loops=1)
                              Output: ddtinto_202104.ac_no, ddtinto_202104.ccy, ddtinto_202104.ac_date, ddtinto_202104.dep, ddtinto_202104.qualifier
                              Index Cond: (((ddtinto_202104.ac_no)::text = ($1)::text) AND ((ddtinto_202104.ccy)::text = ($2)::text) AND (ddtinto_202104.ac_date >= $3) AND (ddtinto_202104.ac_date <= $4))
                              Filter: (ddtinto_202104.qualifier = ANY ('{A,C}'::bpchar[]))
                              Rows Removed by Filter: 1
                              Buffers: shared hit=35
surajkharage19 commented 3 years ago

Hi @MondayLiu,

The original issue reported above seems to be fixed however, I suspect some other issue by looking at your test case. I could not reproduce this exact issue at my end. It would be good if you share exact DDLs from PG and MySQL with sample data from PG and MySQL to reproduce the issue.

Moreover, can you please confirm on below points:

Thanks and Regards, Suraj Kharage

MondayLiu commented 3 years ago

Hi @surajkharage19 , Thanks for response! After I reproduce in development! I'll new another issue. So far, only the following replies can be made:

  1. The probability of 97/1020 leads to partial where clause pushdown.
    Here is complete where clause pushdown:

    2021-06-04 00:06:28.015 CST [2617] LOG:  duration: 741.972 ms  plan:
    Query Text: SELECT    TRIM(AC_NO) AC_NO
                    , TRIM(CCY)   CCY
                    , AC_DATE
                    , DEP
                    , QUALIFIER
            FROM      DDTINTO
            WHERE     QUALIFIER IN ('A', 'C')
                and AC_NO = $1
                and CCY = $2
                and AC_DATE >= $3
                and AC_DATE <= $4
            ORDER BY   AC_DATE DESC
            limit $5
    Limit  (cost=1366.36..1366.81 rows=180 width=93) (actual time=741.891..741.921 rows=89 loops=1)
      Output: (btrim((ddtinto_current.ac_no)::text)), (btrim((ddtinto_current.ccy)::text)), ddtinto_current.ac_date, ddtinto_current.dep, ddtinto_current.qualifier
      Buffers: shared hit=22 read=43 dirtied=38
      ->  Sort  (cost=1366.36..1369.02 rows=1065 width=93) (actual time=741.889..741.902 rows=89 loops=1)
            Output: (btrim((ddtinto_current.ac_no)::text)), (btrim((ddtinto_current.ccy)::text)), ddtinto_current.ac_date, ddtinto_current.dep, ddtinto_current.qualifier
            Sort Key: ddtinto_current.ac_date DESC
            Sort Method: quicksort  Memory: 31kB
            Buffers: shared hit=22 read=43 dirtied=38
            ->  Result  (cost=25.00..1321.14 rows=1065 width=93) (actual time=45.109..741.696 rows=89 loops=1)
                  Output: btrim((ddtinto_current.ac_no)::text), btrim((ddtinto_current.ccy)::text), ddtinto_current.ac_date, ddtinto_current.dep, ddtinto_current.qualifier
                  Buffers: shared hit=22 read=43 dirtied=38
                  ->  Append  (cost=25.00..1305.17 rows=1065 width=84) (actual time=45.104..741.483 rows=89 loops=1)
                        Buffers: shared hit=22 read=43 dirtied=38
                        ->  Foreign Scan on public.ddtinto_current  (cost=25.00..1025.00 rows=1000 width=88) (actual time=45.102..45.129 rows=33 loops=1)
                              Output: ddtinto_current.ac_no, ddtinto_current.ccy, ddtinto_current.ac_date, ddtinto_current.dep, ddtinto_current.qualifier
                              Remote server startup cost: 25
                              Remote query: SELECT `ac_date`, `ac_no`, `ccy`, `qualifier`, `dep` FROM `ocbs387`.`view_ddtinto_current` WHERE (`qualifier`  IN ('A', 'C')) AND ((`ac_date` >= 20210306)) AND ((`ac_date` <= 20210604)) AND ((`ac_no` = '883005072877')) AND ((`ccy` = 'HKD'))
                        ->  Index Scan using ddtinto_202103_ac_no_ccy_qualifier_ac_date_idx on public.ddtinto_202103  (cost=0.56..129.39 rows=31 width=26) (actual time=8.999..356.128 rows=26 loops=1)
                              Output: ddtinto_202103.ac_no, ddtinto_202103.ccy, ddtinto_202103.ac_date, ddtinto_202103.dep, ddtinto_202103.qualifier
                              Index Cond: (((ddtinto_202103.ac_no)::text = '883005072877'::text) AND ((ddtinto_202103.ccy)::text = 'HKD'::text) AND (ddtinto_202103.ac_date >= 20210306) AND (ddtinto_202103.ac_date <= 20210604))
                              Filter: (ddtinto_202103.qualifier = ANY ('{A,C}'::bpchar[]))
                              Buffers: shared hit=5 read=25 dirtied=23
                        ->  Index Scan using ddtinto_202104_ac_no_ccy_qualifier_ac_date_idx on public.ddtinto_202104  (cost=0.56..145.46 rows=34 width=26) (actual time=30.783..340.156 rows=30 loops=1)
                              Output: ddtinto_202104.ac_no, ddtinto_202104.ccy, ddtinto_202104.ac_date, ddtinto_202104.dep, ddtinto_202104.qualifier
                              Index Cond: (((ddtinto_202104.ac_no)::text = '883005072877'::text) AND ((ddtinto_202104.ccy)::text = 'HKD'::text) AND (ddtinto_202104.ac_date >= 20210306) AND (ddtinto_202104.ac_date <= 20210604))
                              Filter: (ddtinto_202104.qualifier = ANY ('{A,C}'::bpchar[]))
                              Rows Removed by Filter: 1
                              Buffers: shared hit=17 read=18 dirtied=15
  2. 500 milliseconds SQL.

    2021-06-04 00:02:40.887 CST [2581] LOG:  duration: 258338.636 ms
    2021-06-04 00:06:28.015 CST [2581] LOG:  duration: 741.972 ms
    2021-06-04 00:06:41.625 CST [2581] LOG:  duration: 502.321 ms
    2021-06-04 00:07:11.666 CST [2581] LOG:  duration: 527.322 ms
    2021-06-04 00:07:21.340 CST [2564] LOG:  duration: 233966.558 ms
    2021-06-04 00:08:24.679 CST [2620] LOG:  duration: 592.067 ms
    2021-06-04 00:08:24.679 CST [2946] LOG:  duration: 525.514 ms
    2021-06-04 00:08:45.064 CST [2618] LOG:  duration: 500.426 ms
    2021-06-04 00:08:47.226 CST [2620] LOG:  duration: 533.847 ms
    2021-06-04 00:08:48.200 CST [2618] LOG:  duration: 847.036 ms
    2021-06-04 00:09:05.815 CST [2946] LOG:  duration: 604.529 ms
    2021-06-04 00:09:05.815 CST [2945] LOG:  duration: 531.357 ms
    2021-06-04 00:09:13.763 CST [2947] LOG:  duration: 511.709 ms
    2021-06-04 00:09:17.470 CST [2620] LOG:  duration: 583.923 ms
    2021-06-04 00:09:17.470 CST [2947] LOG:  duration: 527.431 ms
    2021-06-04 00:09:59.000 CST [2621] LOG:  duration: 514.290 ms
    2021-06-04 00:10:03.128 CST [2946] LOG:  duration: 544.570 ms
    2021-06-04 00:10:44.414 CST [2635] LOG:  duration: 515.128 ms 
    2021-06-04 00:10:51.638 CST [2636] LOG:  duration: 530.766 ms
    2021-06-04 00:11:05.345 CST [2636] LOG:  duration: 689.052 ms
    2021-06-04 00:11:05.345 CST [2996] LOG:  duration: 621.094 ms
    2021-06-04 00:11:27.960 CST [6838] LOG:  duration: 550.326 ms
    2021-06-04 00:11:44.759 CST [6908] LOG:  duration: 547.756 ms
    2021-06-04 00:11:49.326 CST [6871] LOG:  duration: 836.809 ms
    2021-06-04 00:11:49.326 CST [2983] LOG:  duration: 787.258 ms
    2021-06-04 00:11:49.792 CST [6908] LOG:  duration: 670.352 ms
    2021-06-04 00:11:49.793 CST [6838] LOG:  duration: 601.234 ms
    2021-06-04 00:12:07.626 CST [2949] LOG:  duration: 1065.761 ms
    2021-06-04 00:12:51.691 CST [6908] LOG:  duration: 575.303 ms
    2021-06-04 00:13:22.716 CST [2996] LOG:  duration: 831.634 ms
    2021-06-04 00:13:25.306 CST [6944] LOG:  duration: 549.983 ms
    2021-06-04 00:13:34.876 CST [6944] LOG:  duration: 941.696 ms
    ...
    ...
    2021-06-04 00:22:10.311 CST [2695] LOG:  duration: 928091.882 ms
    2021-06-04 00:23:04.557 CST [7526] LOG:  duration: 507.980 ms 
    2021-06-04 00:23:16.398 CST [7562] LOG:  duration: 525.549 ms
    2021-06-04 00:23:16.523 CST [7510] LOG:  duration: 592.123 ms
    2021-06-04 00:23:23.602 CST [7527] LOG:  duration: 505.949 ms
    2021-06-04 00:23:31.135 CST [7510] LOG:  duration: 671.336 ms 
    2021-06-04 00:23:31.136 CST [7578] LOG:  duration: 594.377 ms
    2021-06-04 00:23:42.087 CST [7649] LOG:  duration: 904.054 ms
    2021-06-04 00:23:49.156 CST [7563] LOG:  duration: 527.979 ms
    2021-06-04 00:23:50.266 CST [2580] LOG:  duration: 1013959.478 ms
    2021-06-04 00:24:24.573 CST [7526] LOG:  duration: 645.630 ms
    2021-06-04 00:24:24.573 CST [7563] LOG:  duration: 711.659 ms
    2021-06-04 00:24:34.395 CST [7563] LOG:  duration: 637.527 ms
    2021-06-04 00:24:34.396 CST [7611] LOG:  duration: 536.465 ms
    2021-06-04 00:24:36.734 CST [7650] LOG:  duration: 678.372 ms
    2021-06-04 00:24:47.554 CST [7611] LOG:  duration: 813.259 ms
    2021-06-04 00:24:47.554 CST [7735] LOG:  duration: 733.719 ms
    2021-06-04 00:25:06.778 CST [7649] LOG:  duration: 529.749 ms
    2021-06-04 00:25:06.779 CST [7701] LOG:  duration: 503.374 ms
    2021-06-04 00:25:24.519 CST [7650] LOG:  duration: 540.833 ms
    2021-06-04 00:25:24.519 CST [7649] LOG:  duration: 614.708 ms
    2021-06-04 00:25:30.435 CST [7650] LOG:  duration: 503.289 ms
    2021-06-04 00:25:32.233 CST [7578] LOG:  duration: 593.404 ms
    2021-06-04 00:25:32.233 CST [7701] LOG:  duration: 530.180 ms
    2021-06-04 00:25:44.033 CST [7578] LOG:  duration: 831.246 ms
    2021-06-04 00:25:44.033 CST [7735] LOG:  duration: 910.863 ms
    2021-06-04 00:26:00.344 CST [7735] LOG:  duration: 656.323 ms
    2021-06-04 00:26:00.344 CST [7824] LOG:  duration: 604.716 ms
    2021-06-04 00:26:03.642 CST [7702] LOG:  duration: 572.389 ms
    2021-06-04 00:26:03.642 CST [7858] LOG:  duration: 634.322 ms
    2021-06-04 00:26:06.218 CST [7824] LOG:  duration: 515.326 ms
    2021-06-04 00:26:10.056 CST [7701] LOG:  duration: 503.993 ms
    2021-06-04 00:26:14.506 CST [7701] LOG:  duration: 536.509 ms
    2021-06-04 00:26:46.010 CST [7858] LOG:  duration: 541.532 ms
    2021-06-04 00:26:57.437 CST [7824] LOG:  duration: 569.747 ms
    2021-06-04 00:26:57.934 CST [2942] LOG:  duration: 1186707.760 ms
    2021-06-04 00:27:01.335 CST [7751] LOG:  duration: 1141.671 ms
    2021-06-04 00:27:01.335 CST [7702] LOG:  duration: 886.833 ms
    2021-06-04 00:27:01.335 CST [7751] LOG:  duration: 1141.649 ms
    2021-06-04 00:27:12.155 CST [2564] LOG:  duration: 1178133.430 ms
    2021-06-04 00:27:12.636 CST [7736] LOG:  duration: 509.778 ms
    2021-06-04 00:27:12.636 CST [7860] LOG:  duration: 613.868 ms
    ...
    ...
  3. PG holds historical data and MySQL holds current data.There are 80 million records in total. Two-thirds are in PG.

  4. All 5 where caluse in SQL are required.

Thanks and Regards, Monday Liu

surajkharage19 commented 3 years ago

Thanks for sharing the information.

This issue seems occurring randomly, not following any pattern. There might be chances that the planner is doing some optimization based on parameter values. I would appreciate it if you could provide the exact test case to reproduce this issue and that will help us with further troubleshooting. Also, are you observing the same behavior when you execute the above query (with exact values) on psql?

MondayLiu commented 3 years ago

Hi, @surajkharage19. I start a issue.

Monday