EnterpriseDB / mysql_fdw

PostgreSQL foreign data wrapper for MySQL
Other
531 stars 162 forks source link

Strange problems with Nested Loop Left Join #102

Open ledstellar opened 8 years ago

ledstellar commented 8 years ago

We use mysql_fdw built from latest trunk with PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 5.3.0, 64 bit and MariaDB Server 10.1.13-MariaDB. All are on the same server Arch Linux 4.4.5-1-ARCH.

I ended up with two almost the same queries. Here is the first one:

SELECT  rp.id, l.time, rp.date_create, rp.id_y_catalog
FROM replica.reference_price rp
LEFT JOIN o_log.reference_price_log l ON l.row=rp.id
    AND l.column='y_id' AND l.value=rp.id_y_catalog::text
WHERE 
--  rp.date_create BETWEEN '2016-01-01' AND '2016-06-30 23:59:59' AND
    rp.id in (518419,519906,519960,519921,519922,520535)

In this query _o_log.reference_pricelog is an external mysql_fdw table. The query returns the following result:

id time date_create id_y_catalog
518419 2016-01-20 05:07:14 2016-01-15 15:49:01+06 13107473
519906 2016-02-24 19:41:33 2016-01-29 12:07:39+06 13107611
519921 2016-02-24 19:53:21 2016-01-29 12:42:52+06 13078968
519922 2016-02-24 20:05:40 2016-01-29 12:44:26+06 13078730
519960 2016-02-24 20:10:53 2016-01-29 13:35:06+06 13093686
520535 2016-02-03 12:43:57 2016-02-03 12:43:32+06 13079112

And when uncommented condition, i.e.:

SELECT  rp.id, l.time, rp.date_create, rp.id_y_catalog
FROM replica.reference_price rp
LEFT JOIN o_log.reference_price_log l ON l.row=rp.id
    AND l.column='y_id' AND l.value=rp.id_y_catalog::TEXT
WHERE
    rp.date_create BETWEEN '2016-01-01 00:00:00' AND '2016-06-30 12:42:52+06' AND
    rp.id in (518419,519906,519960,519921,519922,520535)

it suddenly returns another result:

id time date_create id_y_catalog
518419 2016-01-20 05:07:14 2016-01-15 15:49:01+06 13107473
519906 2016-01-29 12:07:39+06 13107611
519921 2016-01-29 12:42:52+06 13078968
519922 2016-01-29 12:44:26+06 13078730
519960 2016-01-29 13:35:06+06 13093686
520535 2016-02-03 12:43:32+06 13079112

As you can note condition shouldn't change anything as all records meet condition requirements. However in the latter query only first record has been selected from left-joined table _o_log.reference_pricelog.

Here are query plans. The first one:

Hash Right Join  (cost=58.09..1060.09 rows=6 width=24) (actual time=1441.265..1449.603 rows=6 loops=1)
  Hash Cond: ((l."row" = rp.id) AND (l.value = (rp.id_y_catalog)::text))
  ->  Foreign Scan on reference_price_log l  (cost=10.00..1010.00 rows=1000 width=44) (actual time=0.149..1330.263 rows=608015 loops=1)
  ->  Hash  (cost=48.07..48.07 rows=6 width=16) (actual time=0.132..0.132 rows=6 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Index Scan using reference_price_pkey on reference_price rp  (cost=0.08..48.07 rows=6 width=16) (actual time=0.055..0.114 rows=6 loops=1)
              Index Cond: (id = ANY ('{518419,519906,519960,519921,519922,520535}'::integer[]))
Planning time: 0.362 ms
Execution time: 4909.458 ms

and for the second query:

Nested Loop Left Join  (cost=10.08..1063.07 rows=1 width=24) (actual time=1353.848..1362.032 rows=6 loops=1)
  Join Filter: ((l."row" = rp.id) AND (l.value = (rp.id_y_catalog)::text))
  Rows Removed by Join Filter: 608014
  ->  Index Scan using reference_price_pkey on reference_price rp  (cost=0.08..48.07 rows=1 width=16) (actual time=0.044..0.125 rows=6 loops=1)
        Index Cond: (id = ANY ('{518419,519906,519960,519921,519922,520535}'::integer[]))
        Filter: ((date_create >= '2016-01-01 00:00:00+06'::timestamp with time zone) AND (date_create <= '2016-06-30 12:42:52+06'::timestamp with time zone))
  ->  Foreign Scan on reference_price_log l  (cost=10.00..1010.00 rows=1000 width=44) (actual time=0.018..211.585 rows=101336 loops=6)
Planning time: 0.457 ms
Execution time: 5191.408 ms

Note that Hash Right Join was changed to Nested Loop Left Join. And I found that's the only difference between these two queries.

I also tried to move the whole table from MySQL to local PostgreSQL. The result of the query using that moved table was exactly the same as the first one in this example.

ahsanhadi commented 8 years ago

I would like a clarify a few points form your bug report

1- You said that you were getting the same result after moving the mysql table to pg...so in this case the FDW wasn't being used and you were simple joining two PG tables?

2- Can you please demonstrate the behaviour with just using PG tables not using the mysql_fdw

3- You are using MariaDB for this test case not MySQL DB?

ledstellar commented 8 years ago

First of all thank you for the quick response. (1) Yes, exactly. I did it mostly to assure that there's not a problem of my PostgreSQL installation itself. (2) Here is that query plan:

Hash Right Join  (cost=48.08..136160.77 rows=1 width=24) (actual time=1323.659..1376.439 rows=6 loops=1)
  Hash Cond: ((l."row" = rp.id) AND (l.value = (rp.id_y_catalog)::text))
  ->  Seq Scan on reference_price_log l  (cost=0.00..134906.32 rows=603185 width=32) (actual time=0.054..1287.278 rows=608015 loops=1)
        Filter: (("column")::text = 'yandex_id'::text)
        Rows Removed by Filter: 9259512
  ->  Hash  (cost=48.07..48.07 rows=1 width=16) (actual time=0.074..0.074 rows=6 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Index Scan using reference_price_pkey on reference_price rp  (cost=0.08..48.07 rows=1 width=16) (actual time=0.026..0.065 rows=6 loops=1)
              Index Cond: (id = ANY ('{518419,519906,519960,519921,519922,520535}'::integer[]))
              Filter: ((date_create >= '2016-01-01 00:00:00+06'::timestamp with time zone) AND (date_create <= '2016-06-30 12:42:52+06'::timestamp with time zone))
Planning time: 0.382 ms
Execution time: 1376.514 ms

(3) Yes, we using MariaDB. Unfortunately we haven't any MySQL installation here.

tpetry commented 8 years ago

Have got a quite the same problem, but can't share the query (inhouse application). The filtering of records produces strange results, very often rows are filtered away which perfectly match the condition, as in the example of @ledstellar. It's even the same behaviour that only ONE row is kept in the result set.

I even noticed sometimes that an equality check on integer (some_integer_column = value) will fail, but converting it to a range scan (some_integer_column > value - 1 AND some_integer_column < value + 1) with the additions and subtractions not done in sql but hardcoded in the query returns the correct results.

There's definitely a problem with foreign data wrapper row filtering.

munro commented 8 years ago

I'm experiencing the same issue with joining a 2+ MySQL tables to a local table.

select * from public.meow m
join mysql_foreign_table.foo f on f.id = m.foo_id /* joins all rows correctly */
join mysql_foreign_table.bar b on b.id = f.bar_id /* bug--only joins one row */
where m.meow = TRUE

If I switch the last join to a left join, it shows the exact same behavior has @ledstellar where only one row is joined.

Further strangeness, this actually does sometimes work. Here's a query that has a good query plan [1], and then when I change just one filter in the where, it joins incorrectly with a bad query plan [2]. It looks like the issue is when it does a Join Filter.

I found a quick fix by setting set enable_nestloop = off;, it makes forces the queries to always do a Hash Join, avoiding the buggy Join Filter.

[1] good https://explain.depesz.com/s/KZlx [2] bad https://explain.depesz.com/s/5JF