databendlabs / databend

๐——๐—ฎ๐˜๐—ฎ, ๐—”๐—ป๐—ฎ๐—น๐˜†๐˜๐—ถ๐—ฐ๐˜€ & ๐—”๐—œ. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.88k stars 751 forks source link

Feature: TransformCastSchema need check load_can_auto_cast_to during construction #16790

Open sundy-li opened 2 weeks ago

sundy-li commented 2 weeks ago

Summary

Description for this feature.

youngsofun commented 2 weeks ago

load_can_auto_cast_to is much more strict, only used for load, not suitable for here.

we may introduce another help function like https://github.com/databendlabs/databend/pull/16072 this closed pr was intended to be used in copy only too (replaced by load_can_auto_cast_to later)

but @andylokandy said

็†่ฎบไธŠ ไปปไฝ•็ฑปๅž‹้ƒฝๅฏไปฅไบ’่ฝฌ
ๆ‰€ไปฅไน‹ๅ‰ๆฒกๆœ‰ can_cast_to ๏ผŒcast ไธ่ฟ‡ๅŽปๆ˜ฏๅ€ผ็š„้—ฎ้ข˜ไธๆ˜ฏ็ฑปๅž‹้—ฎ้ข˜

I think since currently our cast (run_cast) do report error on some (src, dst) pairs. may be it is ok to add a can_cast_to which is consistent with run_cast to report error earlier when building pipeline, and check it when constructing TransformCastSchema

sundy-li commented 2 weeks ago

But run_cast will check the cast types and throw

  _ => Err(ErrorCode::BadArguments(format!(
                "unable to cast type `{src_type}` to type `{dest_type}`"
            ))
            .set_span(span)),
andylokandy commented 2 weeks ago

We can say that a value is not able to be cast to another type, e.g. CAST('a' AS INT), but we can not say that all String can not be cast to Int. The former will throw at evaluation, and the latter will throw at type checking.

sundy-li commented 2 weeks ago

The former will throw at evaluation, and the latter will throw at type checking.

The latter will not throw errors at type checking, let's add it.

It will always return Expr::Cast

fn check_cast<Index: ColumnIndex> ... {

 Ok(Expr::Cast {
            span,
            is_try,
            expr: Box::new(expr),
            dest_type: wrapped_dest_type,
        })
}
sundy-li commented 2 weeks ago

A simple case, the error is thrown during sql pipeline execution.

๐Ÿณ :) create table a (t decimal(15,2));
processed in (0.018 sec)

๐Ÿณ :) insert into a select * from b;
error: APIError: ResponseError with 1006: fail to auto cast column t (Variant NULL) to column t (Decimal(15, 2) NULL)
unable to cast type `Variant` to type `Decimal(15, 2)`, during run expr: `CAST(t AS Decimal(15, 2) NULL)`
andylokandy commented 2 weeks ago

The former will throw at evaluation, and the latter will throw at type checking.

I mean, if we agree that all types can convert between each other, i.e. We can say that a value is not able to be cast to another type, e.g. CAST('a' AS INT), but we can not say that all String can not be cast to Int.๏ผŒ we should not throw at type checking.