Percona-Lab / clickhousedb_fdw

PostgreSQL's Foreign Data Wrapper For ClickHouse
Other
201 stars 24 forks source link

Inconsistent results in count queries #20

Open eclbg opened 5 years ago

eclbg commented 5 years ago

The following two queries yield different results while I understand they should be equivalent. The second one yields the correct result.

SELECT COUNT(1) FROM (SELECT DISTINCT report_id FROM clickhouse_reportusages) t;
 count
-------
  2000
SELECT COUNT(DISTINCT report_id) FROM clickhouse_reportusages;
 count
-------
  2956

Their respective plans are:

EXPLAIN VERBOSE SELECT COUNT(1) FROM (SELECT DISTINCT report_id FROM clickhouse_reportusages) t;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Aggregate  (cost=1.51..1.52 rows=1 width=8)
   Output: count(1)
   ->  Unique  (cost=1.00..1.50 rows=1 width=4)
         Output: clickhouse_reportusages.report_id
         ->  Foreign Scan on public.clickhouse_reportusages  (cost=1.00..-1.00 rows=1000 width=4)
               Output: clickhouse_reportusages.report_id
               Remote SQL: SELECT report_id FROM "default".reportusages ORDER BY report_id ASC
(7 rows)
EXPLAIN VERBOSE SELECT COUNT(DISTINCT report_id) FROM clickhouse_reportusages;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Foreign Scan  (cost=1.00..-1.00 rows=1000 width=8)
   Output: (count(DISTINCT report_id))
   Relations: Aggregate on (clickhouse_reportusages)
   Remote SQL: SELECT count(DISTINCT report_id) FROM "default".reportusages
(4 rows)
alanrigele commented 5 years ago

Similar experiences have been seen with the latest code.

select count(node) from calculated_kpis provides different results for each query even though the underlying data is not changed.

select count(*) from (select node from calculated_kpis) a provides the correct result