Percona-Lab / clickhousedb_fdw

PostgreSQL's Foreign Data Wrapper For ClickHouse
Other
201 stars 24 forks source link

Different results for same postgres and clickhouse queries #21

Open josefinaestevez opened 5 years ago

josefinaestevez commented 5 years ago

I've noticed that some queries shows different results in postgres and clickhouse when, if I'm not wrong, they should be the same.

A simple example:

Postgres

SELECT COUNT(DISTINCT relation_id) FROM clickhouse_table WHERE relation_id IN (SELECT relation_id FROM clickhouse_table WHERE date_from > '2019-02-01');
 count 
-------
   116
(1 row)

Clickhouse

SELECT COUNT(DISTINCT relation_id) FROM table WHERE relation_id IN (SELECT relation_id FROM table WHERE date_from > '2019-02-01');
┌─uniqExact(report_id)─┐
│                  677 │
└──────────────────────┘

Same inconsistency if I remove the WHERE clause. Do you have any idea what could be happening? Thanks a lot!

ibrarahmad commented 5 years ago

Let me take a look at that

On Mon, Jul 15, 2019 at 5:35 PM Josefina Estevez notifications@github.com wrote:

Reopened #21 https://github.com/Percona-Lab/clickhousedb_fdw/issues/21.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Percona-Lab/clickhousedb_fdw/issues/21?email_source=notifications&email_token=AA3NF4S4SROTQSINZBHNPG3P7RVJPA5CNFSM4IBQNFMKYY3PNVWWK3TUL52HS4DFWZEXG43VMVCXMZLOORHG65DJMZUWGYLUNFXW5KTDN5WW2ZLOORPWSZGOSP5ZM2A#event-2482738792, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3NF4RRYJTZX3AJHDN64XDP7RVJPANCNFSM4IBQNFMA .

-- Ibrar Ahmed

josefinaestevez commented 5 years ago

Thanks for taking a look! Another example:

Postgres

SELECT DISTINCT(country_id), COUNT(1) as cant FROM clickhouse_table GROUP BY country_id ORDER BY cant;
 country_id |  cant   
------------+---------
          7 | 1742709
          6 | 1755293
(2 rows)

Clickhouse

SELECT DISTINCT(country_id), COUNT(1) as cant FROM table GROUP BY country_id ORDER BY cant;
┌─country_id─┬────cant─┐
│         26 │   70000 │
│          6 │ 5933100 │
│          7 │ 5933100 │
└────────────┴─────────┘
ibrarahmad commented 5 years ago

The tuples do not store in an ordered way in the database. When we are applying the "order by clause" in PostgreSQL so it is possible to have a different result. Please restrict your query with WHERE clause.

On Wed, Jul 17, 2019 at 4:34 PM Josefina Estevez notifications@github.com wrote:

Thanks for taking a look! Another example: Postgres

SELECT DISTINCT(country_id), COUNT(1) as cant FROM clickhouse_table GROUP BY country_id ORDER BY cant;

country_id | cant ------------+---------

      7 | 1742709

      6 | 1755293

(2 rows)

Clickhouse

SELECT DISTINCT(country_id), COUNT(1) as cant FROM table GROUP BY country_id ORDER BY cant;

┌─country_id─┬────cant─┐

│ 26 │ 70000 │

│ 6 │ 5933100 │

│ 7 │ 5933100 │

└────────────┴─────────┘

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Percona-Lab/clickhousedb_fdw/issues/21?email_source=notifications&email_token=AA3NF4U5ENW7H47GFLZL2C3P737V7A5CNFSM4IBQNFMKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2D42BY#issuecomment-512216327, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3NF4X55U4JIN4UGQHVJQDP737V7ANCNFSM4IBQNFMA .

-- Ibrar Ahmed

Kale-Grabovski commented 4 years ago

Hello. I've got the same issue seems. As mentioned, the results are different at pg and clickhouse.

CREATE FOREIGN TABLE dwh_loads(
  country_id int,
  ip text,
  created_at_date date
) SERVER clickhouse_svr;

pg=# select count(*), created_at_date
from dwh_loads 
where created_at_date > '2019-08-20' 
group by created_at_date;

  count  | created_at_date 
---------+-----------------
 2307442 | 2019-08-22
 4583210 | 2019-08-21
   30920 | 2019-08-23
   39461 | 2019-08-24
(4 rows)

pg=# select count(*), created_at_date 
from dwh_loads 
where created_at_date > '2019-08-10'
group by created_at_date;

  count  | created_at_date 
---------+-----------------
 3670016 | 2019-08-16
 1686864 | 2019-08-11
 1638122 | 2019-08-18
(3 rows)

# clickhouse
SELECT 
    count(*), 
    created_at_date
FROM dwh_loads
WHERE created_at_date > '2019-08-10'
GROUP BY created_at_date
ORDER BY created_at_date ASC

┌─count()─┬─created_at_date─┐
│ 1987751 │      2019-08-11 │
│ 2225018 │      2019-08-12 │
│ 4098923 │      2019-08-13 │
│ 7453111 │      2019-08-14 │
│ 6787449 │      2019-08-15 │
│ 6396884 │      2019-08-16 │
│ 5642157 │      2019-08-17 │
│ 5485166 │      2019-08-18 │
│ 4949855 │      2019-08-19 │
│ 4968256 │      2019-08-20 │
│ 4583210 │      2019-08-21 │
│ 4781908 │      2019-08-22 │
│ 4449216 │      2019-08-23 │
│ 4911274 │      2019-08-24 │
│ 1945331 │      2019-08-25 │
└─────────┴─────────┘

More of that, sometimes I see postgres down when trying to run one of above queries from psql:

server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed.

and all my queries to db in about 1 minute end up with this error: pq: the database system is in recovery mode

ibrarahmad commented 4 years ago

On Sun, Aug 25, 2019 at 1:04 PM Kale notifications@github.com wrote:

Hello. I've got the same issue seems. As mentioned, the results are different at pg and clickhouse.

I am looking at the issue, specifically the crash.

CREATE FOREIGN TABLE dwh_loads(

country_id int,

ip text,

created_at_date date

) SERVER clickhouse_svr;

pg=# select count(*), created_at_date

from dwh_loads

where created_at_date > '2019-08-20'

group by created_at_date;

count | created_at_date

---------+-----------------

2307442 | 2019-08-22

4583210 | 2019-08-21

30920 | 2019-08-23

39461 | 2019-08-24

(4 rows)

pg=# select count(*), created_at_date

from dwh_loads

where created_at_date > '2019-08-10'

group by created_at_date;

count | created_at_date

---------+-----------------

3670016 | 2019-08-16

1686864 | 2019-08-11

1638122 | 2019-08-18

(3 rows)

clickhouse

SELECT

count(*),

created_at_date

FROM dwh_loads

WHERE created_at_date > '2019-08-10'

GROUP BY created_at_date

ORDER BY created_at_date ASC

┌─count()─┬─created_at_date─┐

│ 1987751 │ 2019-08-11 │

│ 2225018 │ 2019-08-12 │

│ 4098923 │ 2019-08-13 │

│ 7453111 │ 2019-08-14 │

│ 6787449 │ 2019-08-15 │

│ 6396884 │ 2019-08-16 │

│ 5642157 │ 2019-08-17 │

│ 5485166 │ 2019-08-18 │

│ 4949855 │ 2019-08-19 │

│ 4968256 │ 2019-08-20 │

│ 4583210 │ 2019-08-21 │

│ 4781908 │ 2019-08-22 │

│ 4449216 │ 2019-08-23 │

│ 4911274 │ 2019-08-24 │

│ 1945331 │ 2019-08-25 │

└─────────┴─────────┘

More of that, sometimes I see postgres down when trying to run one of above queries from psql:

server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Percona-Lab/clickhousedb_fdw/issues/21?email_source=notifications&email_token=AA3NF4TPVFNKWRKX3YTWCODQGI4KPA5CNFSM4IBQNFMKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5COULQ#issuecomment-524610094, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3NF4U7CCQAE44WKKG3B3DQGI4KPANCNFSM4IBQNFMA .

-- Ibrar Ahmed