apla / node-clickhouse

Yandex ClickHouse driver for nodejs
MIT License
217 stars 50 forks source link

Importing TSV to ClickHouse return ERROR: garbage after Nullable(Int64) #60

Closed calebeaires closed 4 years ago

calebeaires commented 4 years ago

This code process a simple TSV and get broken when instead of String data type it is used Int.

the create table schema

 CREATE TABLE bucket_8.migrar
              (customerNumber Nullable(Int16),customerName Nullable(String),contactLastName Nullable(String),contactFirstName Nullable(String),contactName Nullable(String),phone Nullable(String),addressLine1 Nullable(String),addressLine2 Nullable(String),address Nullable(String),city Nullable(String),state Nullable(String),postalCode Nullable(String),country Nullable(String),salesRepEmployeeNumber Nullable(Int16),creditLimit Nullable(Decimal(12,4)),casa Nullable(Int64)) ENGINE = MergeTree
              ORDER BY tuple()

code to do the magic


 private async bringToClickHouse(taskData, destination) {
        const tsvStream = createReadStream(destination);
        const clickhouseStream = click.query(
            `INSERT INTO default.customers` {
                    format: 'TSV',
                    queryOptions: {
                        input_format_tsv_empty_as_default: 1,
                        input_format_null_as_default: 1,
                        input_format_skip_unknown_fields: 1,
                    },
                });

        tsvStream.pipe(clickhouseStream);

        return new Promise((resolve, reject) => {
            tsvStream.on('error', err => {
                reject(err);
            });
            tsvStream.on('finish', res => {
                resolve(res);
            });
        });
    }

the TSV

103 Atelier graphique   Schmitt Carine  Carine  Schmitt 40.32.2555  54, rue Royale  NULL    NULL    Nantes  NULL    44000   France  1370    21000.00    NULL
112 Signal Gift Stores  King    Jean    Jean King   7025551838  8489 Strong St. NULL    NULL    Las Vegas   NV  83030   USA 1166    71800.00    NULL
114 Australian Collectors, Co.  Ferguson    Peter   Peter Ferguson  03 9520 4555    636 St Kilda Road   Level 3 636 St Kilda Road - Level 3 Melbourne   Victoria    3004    Australia   1611    117300.00   NULL
119 La Rochelle Gifts   Labrune Janine  Janine  Labrune 40.67.8555  67, rue des Cinquante Otages    NULL    NULL    Nantes  NULL    44000   France  1370    118200.00   NULL
121 Baane Mini Imports  Bergulfsen  Jonas   Jonas  Bergulfsen   07-98 9555  Erling Skakkes gate 78  NULL    NULL    Stavern NULL    4110    Norway  1504    81700.00    NULL
124 Mini Gifts Distributors Ltd.    Nelson  Susan   Susan Nelson    4155551450  5677 Strong St. NULL    NULL    San Rafael  CA  97562   USA 1165    210500.00   NULL
125 Havel & Zbyszek Co  Piestrzeniewicz Zbyszek     Zbyszek  Piestrzeniewicz    (26) 642-7555   ul. Filtrowa 68 NULL    NULL    Warszawa    NULL    01-012  Poland  NULL    0.00    NULL
128 Blauer See Auto, Co.    Keitel  Roland  Roland Keitel   +49 69 66 90 2555   Lyonerstr. 34   NULL    NULL    Frankfurt   NULL    60528   Germany 1504    59700.00    NULL

the helpful but not so loved error

  var fields = new Error (e.toString ('utf8'));
              ^
Error: Cannot parse input: expected \n before: NULL\n112\tSignal Gift Stores\tKing\tJean\tJean King\t7025551838\t8489 Strong St.\tNULL\tNULL\tLas Vegas\tNV\t83030\tUSA\t1166\t71800.00\tNULL\n114\tAustralian Collectors, Co.\tFe: (at row 1)Row 1:Column 0,   name: customerNumber,         type: Nullable(Int16),          parsed text: "103"Column 1,   name: customerName,           type: Nullable(String),         parsed text: "Atelier graphique"Column 2,   name: contactLastName,        type: Nullable(String),         parsed text: "Schmitt"Column 3,   name: contactFirstName,       type: Nullable(String),         parsed text: "Carine "Column 4,   name: contactName,            type: Nullable(String),         parsed text: "Carine  Schmitt"Column 5,   name: phone,                  type: Nullable(String),         parsed text: "40.32.2555"Column 6,   name: addressLine1,           type: Nullable(String),         parsed text: "54, rue Royale"Column 7,   name: addressLine2,           type: Nullable(String),         parsed text: "NULL"Column 8,   name: address,                type: Nullable(String),         parsed text: "NULL"Column 9,   name: city,                   type: Nullable(String),         parsed text: "Nantes"Column 10,  name: state,                  type: Nullable(String),         parsed text: "NULL"Column 11,  name: postalCode,             type: Nullable(String),         parsed text: "44000"Column 12,  name: country,                type: Nullable(String),         parsed text: "France"Column 13,  name: salesRepEmployeeNumber, type: Nullable(Int16),          parsed text: "1370"Column 14,  name: creditLimit,            type: Nullable(Decimal(12, 4)), parsed text: "21000.00"Column 15,  name: casa,                   type: Nullable(Int64),          parsed text: <EMPTY>ERROR: garbage after Nullable(Int64): "NULL<LINE FEED>112<TAB>S" (version 20.3.7.46 (official build))
nezed commented 4 years ago

This error is produced by ClickHouse itself. Looks like you have different number of columns in your table (17) and in your data (16). Also you're using wrong NULL representation which must be replaced with \N. (Something like Null character).

I suggest to:

You can also insert you data using native http interface to be sure that your data and query is valid

cat data.tsv | curl -v \
    http://localhost:8123/ \
    --data-urlencode 'query=INSERT INTO "table" FORMAT TabSeparated' \
    --data-binary @-
nezed commented 4 years ago

Here is the fix for command above. Sorry for wrong hint.

cat data.tsv | \
  cat <(echo 'INSERT INTO "table" FORMAT TabSeparated') - | \
  curl -v http://localhost:8123/ --data-binary @-
calebeaires commented 4 years ago

It works. thanks