Infinidat / infi.clickhouse_fdw

A PostgreSQL foreign data wrapper for ClickHouse
BSD 3-Clause "New" or "Revised" License
54 stars 4 forks source link

Incomplete Query Results Returned #2

Open nsscan opened 6 years ago

nsscan commented 6 years ago

A foreign table in postgresql does not receive the full query result set from clickhouse when there are rows consisting of the empty string.

example table configured in clickhouse

describe tab1_ch;

DESCRIBE TABLE tab1_ch

┌─name──────────┬─type───┬─default_type─┬─default_expression─┐
│ insdate       │ Date   │ DEFAULT      │ toDate(now())      │
│ clickhouse_id │ Int64  │              │                    │
│ c             │ String │ DEFAULT      │ \'\'               │
│ a             │ String │              │                    │
│ d             │ String │              │                    │
│ D             │ String │              │                    │
└───────────────┴────────┴──────────────┴────────────────────┘

6 rows in set. Elapsed: 0.002 sec. 

:) select * from tab1_ch;

SELECT *
FROM tab1_ch 

┌────insdate─┬─clickhouse_id─┬─c─┬─a─┬─d─┬─D─┐
│ 2017-12-25 │            15 │   │ 1 │   │   │
└────────────┴───────────────┴───┴───┴───┴───┘
┌────insdate─┬─clickhouse_id─┬─c─┬─a─┬─d─┬─D─┐
│ 2017-12-25 │            16 │   │   │ 1 │   │
└────────────┴───────────────┴───┴───┴───┴───┘
┌────insdate─┬─clickhouse_id─┬─c─┬─a─┬─d─┬─D─┐
│ 2017-12-25 │            17 │   │   │   │ 1 │
└────────────┴───────────────┴───┴───┴───┴───┘

3 rows in set. Elapsed: 0.003 sec. 

Querying this table to select all the 'd' values

select d from tab1_ch;

SELECT d
FROM tab1_ch 

┌─d─┐
│   │
└───┘
┌─d─┐
│ 1 │
└───┘
┌─d─┐
│   │
└───┘

3 rows in set. Elapsed: 0.003 sec. 

This table is also configures in PostgreSQL as a foreign table

\d+ tab1_ch
                                   Foreign table "public.tab1_ch"
    Column     |       Type        | Modifiers | FDW Options | Storage  | Stats target | Description 
---------------+-------------------+-----------+-------------+----------+--------------+-------------
 insdate       | date              |           |             | plain    |              | 
 clickhouse_id | bigint            |           |             | plain    |              | 
 c             | character varying |           |             | extended |              | 
 a             | character varying |           |             | extended |              | 
 d             | character varying |           |             | extended |              | 
 D             | character varying |           |             | extended |              | 
Server: clickhouse_server
FDW Options: (db_name 'default', db_url 'http://localhost:8123/', table_name 'tab1_ch')

but now the same select returns an incomplete result set when run in PostgreSQL

select d from tab1_ch;
NOTICE:  SELECT `d`
FROM `tab1_ch`
WHERE 1
 d 
---
 1
(1 row)

The expected results are 3 rows, 2 with empty strings and 1 with value '1' - i.e. the same results as clickhouse returns.

only if we add the key column is the correct number of rows returned

select clickhouse_id,d from tab1_ch;
NOTICE:  SELECT `clickhouse_id`, `d`
FROM `tab1_ch`
WHERE 1
 clickhouse_id | d 
---------------+---
            15 | 
            16 | 1
            17 | 
(3 rows)

The Foreign Data Wrapper seems to be incorrectly translating and executing the query or incorrectly retrieving the result set from clickhouse by filtering result rows which are entirely empty strings.

The problem is identical is we use the other columns, 'c' or 'a'

select c from tab1_ch;
NOTICE:  SELECT `c`
FROM `tab1_ch`
WHERE 1
 c 
---
(0 rows)

select c is worse as it returns 0 rows rather than 3 rows with empty strings.

# select a from tab1_ch;
NOTICE:  SELECT `a`
FROM `tab1_ch`
WHERE 1
 a 
---
 1
(1 row)

If a 2nd column is added, the issue seems to go away

select c,"D" from tab1_ch where "D" <> '1';
NOTICE:  SELECT `c`, `D`
FROM `tab1_ch`
WHERE D != '1'
 c | D 
---+---
   | 
   | 
(2 rows)
ishirav commented 6 years ago

Good catch... It's actually a bug in the underlying infi.clickhouse_orm module. It ignores zero-length lines when parsing the results from ClickHouse. So if you select only a single string column, and the column has no value, you get back an empty line (which is ignored). If you select two columns, there's a tab character between their values so the line is no longer empty.