youngsofun closed this 1 year ago
cc @sundy-li @Xuanwo @BohuTANG
Maybe it's better to add a doc in the RFC directory; that makes it easier to comment on.
Copy into table1 from stage1 with transform c1, c2 + 1
vs
Copy into table1 from (select c1, c2 + 1 from stage1)
I prefer the latter one.
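Both syntaxes describe the same thing: a per-row projection applied to each staged row before it is written to table1. A minimal Python sketch, assuming integer source columns; the names `transform_rows` and `projection` are hypothetical and not part of Databend.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, int]

# Hypothetical model of `... with transform c1, c2 + 1` and of
# `... from (select c1, c2 + 1 from stage1)`: every source row is
# mapped independently to exactly one target row.
def transform_rows(rows: Iterable[Row], exprs: List[Callable[[Row], int]]) -> List[Tuple[int, ...]]:
    return [tuple(expr(row) for expr in exprs) for row in rows]

# The projection list `c1, c2 + 1`, one function per output column.
projection = [lambda r: r[0], lambda r: r[1] + 1]

if __name__ == "__main__":
    staged = [(1, 10), (2, 20)]
    print(transform_rows(staged, projection))  # [(1, 11), (2, 21)]
```

Because each output row depends only on its own input row, the two syntaxes differ in ergonomics (where a schema and on_error fit), not in what is computed.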
The latter has no place for a data schema, and it makes on_error harder to implement.
I prefer a verbose data schema: schema inference has many limitations; it is only convenient for playing with data, not good for production.
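To make the "schema inference depends on the sampled file" limitation concrete, here is a minimal Python sketch. The inference rule (int8 if every sampled value is an integer in range, otherwise string) is an assumption for illustration, not Databend's algorithm; `infer_type` and `cast_column` are hypothetical helpers.

```python
from typing import List

def infer_type(sample: List[str]) -> str:
    # Naive inference (assumed rule): int8 if every sampled value parses
    # as an integer in [-128, 127], otherwise fall back to string.
    try:
        if all(-128 <= int(v) <= 127 for v in sample):
            return "int8"
    except ValueError:
        pass
    return "string"

def cast_column(values: List[str], col_type: str) -> List[object]:
    # Deserialize one column under the given type; raises ValueError on bad data.
    if col_type == "int8":
        out = [int(v) for v in values]
        if any(not -128 <= x <= 127 for x in out):
            raise ValueError("int8 overflow")
        return out
    return list(values)

if __name__ == "__main__":
    files = {"a.csv": ["1", "2"], "b.csv": ["3", "oops"]}
    # The inferred type changes with the file that happens to be sampled.
    print(infer_type(files["a.csv"]), infer_type(files["b.csv"]))  # int8 string
    inferred = infer_type(files["a.csv"])
    for name, col in files.items():
        try:
            cast_column(col, inferred)
            print(name, "ok")
        except ValueError as e:
            print(name, "failed:", e)
```

With a declared schema the column type is fixed by the user up front, so the outcome no longer depends on which file was inspected first.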
Or:
Copy into table1 from (select c1, c2 + 1 from stage1 with schema (c1 int8, c2 string))
The good point is that it extends select from stage too, but it is not as friendly as with transform when it comes to using on_error directly and checking/recording the files.
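One way to read "use on_error directly, and check/record the files" is a skip-whole-file policy: the transform runs per row, and if any row of a file fails, the file is skipped and recorded. A minimal Python sketch of that assumed policy; `CopyResult` and `copy_skip_file` are hypothetical names, not Databend APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class CopyResult:
    rows: List[Tuple] = field(default_factory=list)
    loaded_files: List[str] = field(default_factory=list)
    skipped_files: Dict[str, str] = field(default_factory=dict)  # file -> error

def copy_skip_file(files: Dict[str, List[Tuple]],
                   transform: Callable[[Tuple], Tuple]) -> CopyResult:
    # Assumed on_error = skip-file semantics: a file is loaded all-or-nothing.
    result = CopyResult()
    for name, rows in files.items():
        try:
            out = [transform(r) for r in rows]
        except (ValueError, TypeError) as e:
            result.skipped_files[name] = str(e)  # record why the file was skipped
            continue
        result.rows.extend(out)
        result.loaded_files.append(name)  # record successfully loaded files
    return result

if __name__ == "__main__":
    staged = {"f1.csv": [("1", "10")], "f2.csv": [("2", "x")]}
    # Transform `c1, c2 + 1`, parsing the raw fields as integers.
    res = copy_skip_file(staged, lambda r: (int(r[0]), int(r[1]) + 1))
    print(res.rows, res.loaded_files, list(res.skipped_files))
    # [(1, 11)] ['f1.csv'] ['f2.csv']
```

Keeping the transform inside COPY means the loader still knows which file each row came from, which is what makes per-file skipping and recording straightforward.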
I'd prefer not to do that, because the REPLACE/MERGE data source may also use the transform statement, which is complex for us. It is more natural to use the select from stage statement, and supporting only parquet is good enough for now.
MERGE INTO target_table FROM (SELECT c1, c1+1 from stage) CASE WHEN ...
Summary
Goal
Solve https://github.com/datafuselabs/databend/issues/10173: on_error in COPY.
Proposal
Simple form (can only be used in COPY, which can do schema inference):
transform is the term used in ETL, so it is easy to understand.
Standard form:
When we need a schema:
Schema inference depends on the file chosen for inference, but that file's data may be bad.
Positional references like $1, $2 are a disaster when there are a lot of columns.
The schema inference that is always safe is all columns string. With a schema, the string can instead be deserialized directly into the destination column, which is not only convenient but also much more efficient (deserializing while reading is faster than reading into UTF-8 strings and then casting to some map). Compare "with transform t.c1, t.c2 + 1 from t(c1 int8)" with "with transform t.c1::int8 from t(c1 int8)".
The schema inference that is always safe is all columns variant. Even if we risk inferring, a column with object as value can be a map or a variant, and string can map to many types, as in CSV/TSV, e.g. "timestamp"/"decimal"...
We can provide syntactic sugar for this.
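The difference between a declared schema and the "all columns string" fallback can be sketched in Python: with a schema, each raw field is converted to its target type (and gets a name) while reading; without one, fields stay strings addressed as $1, $2, ... and every typed use needs an explicit cast. The schema representation and helper names below are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical declared schema in the spirit of `t(c1 int8, c2 string)`.
Schema = List[Tuple[str, Callable[[str], object]]]

def parse_int8(raw: str) -> int:
    value = int(raw)
    if not -128 <= value <= 127:
        raise ValueError(f"{raw!r} is out of int8 range")
    return value

SCHEMA: Schema = [("c1", parse_int8), ("c2", str)]

def read_with_schema(record: Sequence[str], schema: Schema) -> Dict[str, object]:
    # Fields are deserialized into their target type while reading.
    if len(record) != len(schema):
        raise ValueError("column count mismatch")
    return {name: parse(raw) for (name, parse), raw in zip(schema, record)}

def read_all_strings(record: Sequence[str]) -> Dict[str, str]:
    # "Always safe" inference: every field stays a string, named positionally.
    return {f"${i}": raw for i, raw in enumerate(record, start=1)}

if __name__ == "__main__":
    rec = ["7", "hello"]
    print(read_with_schema(rec, SCHEMA))  # {'c1': 7, 'c2': 'hello'}
    print(read_all_strings(rec))          # {'$1': '7', '$2': 'hello'}
    # Without a schema, typed use needs an explicit cast, like `$1::int8`.
    print(parse_int8(read_all_strings(rec)["$1"]) + 1)  # 8
```

The sketch only shows the semantics; the efficiency argument (parse once during the read instead of materializing strings and casting later) is about avoiding that second conversion step.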
Note:
ON_ERROR skips by file (not supported yet).
The expr after transform cannot contain aggregate functions.
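The no-aggregates rule follows from the per-row nature of the transform: an aggregate collapses many rows into one, which does not fit a row-to-row mapping. A minimal Python sketch of such a check; the aggregate list and the regex-based call detection are simplifying assumptions (a real implementation would inspect the parsed expression tree, and this token check can misfire on string literals).

```python
import re
from typing import List

# Assumed set of aggregate names, for illustration only.
AGGREGATES = {"count", "sum", "avg", "min", "max"}
_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")

def check_transform_exprs(exprs: List[str]) -> None:
    # Reject any transform expression that calls an aggregate function,
    # since a transform maps each input row to exactly one output row.
    for expr in exprs:
        for fn in _CALL.findall(expr):
            if fn.lower() in AGGREGATES:
                raise ValueError(f"aggregate {fn}() not allowed in transform: {expr!r}")

if __name__ == "__main__":
    check_transform_exprs(["c1", "c2 + 1"])  # scalar expressions pass silently
    try:
        check_transform_exprs(["count(c1)"])
    except ValueError as e:
        print(e)
```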