crate / crate

CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.
https://cratedb.com/product
Apache License 2.0
4.04k stars 554 forks source link

Can't import objects with COPY FROM from CSV-file #10961

Closed proddata closed 2 years ago

proddata commented 3 years ago

CrateDB version: 4.3.4

Environment description: CrateDB Cloud

Variants tested

obj
"{""x"":1}"
obj
"{"x":1}"
obj
"'{""x"":1}'"
obj
"{"x" = 1}"
obj
"{""x"" = 1}"

Problem description:

Can't import object columns with CSV-Import (COPY FROM)

Steps to reproduce:

  1. Create Table
    CREATE TABLE csv (obj OBJECT);
  2. Create CSV
    obj
    "{""x"":1}"
  3. Import from CSV
    COPY csv FROM '/home/user/test.csv' RETURN SUMMARY;

--> object mapping for [obj] tried to parse field [obj] as object, but found a concrete value

Workaround Import as text and have an auto-generated OBJECT-column

CREATE TABLE csv (txt TEXT, obj GENERATED ALWAYS AS txt::OBJECT);

Todo

mfussenegger commented 3 years ago

I think this relates a bit to https://github.com/crate/crate/issues/10186 in that currently COPY FROM is pretty much a "raw import" operation which doesn't attempt type casts.

jeeminso commented 2 years ago
$ cat /tmp/t.csv 
obj
"{""x"":1}"
"{""x"":2}"

cr> CREATE TABLE csv (obj OBJECT);
CREATE OK, 1 row affected  (0.182 sec)
cr> COPY csv FROM '/tmp/t.csv' RETURN SUMMARY;
+---------------------------------------------------------+-------------------+---------------+-------------+--------+
| node                                                    | uri               | success_count | error_count | errors |
+---------------------------------------------------------+-------------------+---------------+-------------+--------+
| {"id": "m02LrGfbSVimenoB06iAJw", "name": "Diedamskopf"} | file:///tmp/t.csv |             2 |           0 | {}     |
+---------------------------------------------------------+-------------------+---------------+-------------+--------+
COPY 1 row in set (0.036 sec)
cr> select * from csv;
+----------+
| obj      |
+----------+
| {"x": 2} |
| {"x": 1} |
+----------+
SELECT 2 rows in set (0.015 sec)

Verified working, closing.

Hello @proddata, I just have a question on the variants in the description. Other than the first one, they all fail, could I get more details on them?

proddata commented 2 years ago

@jeeminso That was mostly me trying different variants. The default escaping character for PostgreSQL is a double quote:

from https://www.postgresql.org/docs/14/sql-copy.html ESCAPE - Specifies the character that should appear before a data character that matches the QUOTE value. The default is the same as the QUOTE value (so that the quoting character is doubled if it appears in the data). This must be a single one-byte character. This option is allowed only when using CSV format.

So this might be another feature request to support the QUOTE and ESCAPE options

I just wonder if

obj
{""x"":1}

or

obj
{"x":1}

should also work?

jeeminso commented 2 years ago

@proddata Thank you. Looks like only {"x":1} works.

$ cat /tmp/t.csv 
obj
{"x":1}
{""x"":2}

+---------------------------------------------------------+-------------------+---------------+-------------+-------------------------------------------------------------------------------------------+
| node                                                    | uri               | success_count | error_count | errors                                                                                    |
+---------------------------------------------------------+-------------------+---------------+-------------+-------------------------------------------------------------------------------------------+
| {"id": "m02LrGfbSVimenoB06iAJw", "name": "Diedamskopf"} | file:///tmp/t.csv |             1 |           1 | {"Cannot cast value `{\"\"x\"\":2}` to type `object`": {"count": 1, "line_numbers": [2]}} |
+---------------------------------------------------------+-------------------+---------------+-------------+-------------------------------------------------------------------------------------------+
COPY 1 row in set (0.021 sec)
cr> COPY csv FROM '/tmp/t.csv' RETURN SUMMARY;