Open rad-pat opened 1 week ago
The ambiguous column error you're encountering when using the COPY INTO
command with case-sensitive columns (c1
and C1
) is likely due to the logic at line 301 in bind_context.rs
. This logic checks if there are multiple tables in the metadata that are not sources of an index, and if so, it raises a SemanticError
indicating that the table of the internal column is ambiguous:
if tables.len() > 1 {
return Err(ErrorCode::SemanticError(format!(
"The table of the internal column `{}` is ambiguous",
column_binding.internal_column.column_name()
)));
}
This error can occur if the column names are not being correctly distinguished based on case sensitivity during the binding process [1].
To address this issue, you may need to ensure that the normalize_identifier
function correctly handles case sensitivity for both quoted and unquoted identifiers. The normalize_identifier
function is used to normalize identifiers based on the case sensitivity settings defined in the NameResolutionContext
:
pub fn normalize_identifier(ident: &Identifier, context: &NameResolutionContext) -> Identifier {
if (ident.is_quoted() && context.quoted_ident_case_sensitive)
|| (!ident.is_quoted() && context.unquoted_ident_case_sensitive)
{
ident.clone()
} else {
Identifier::from_name(ident.span, ident.name.to_lowercase())
}
}
You should verify that the NameResolutionContext
is configured correctly for your use case, ensuring that quoted identifiers are treated as case-sensitive and unquoted identifiers are handled according to your requirements [2].
If the issue persists, you might need to review the logic in the COPY INTO
command's processing of Parquet files to ensure that it correctly handles case-sensitive column names. The ParquetCopySource
struct and its methods (event
, process
, and async_process
) are responsible for processing Parquet files during the COPY INTO
operation [3][4][5].
By ensuring that the case sensitivity settings and the identifier normalization logic are correctly implemented, you should be able to resolve the ambiguous column error.
To continue talking to Dosu, mention @dosu.
Search before asking
Version
v1.2.635-nightly
What's Wrong?
When trying to load data via COPY INTO from a parquet file with case sensitive columns, an ambiguous column error is reported. Looks to be from https://github.com/datafuselabs/databend/blob/2c41faf63b5136ab22a2ab3ac972c9752dc4a7bf/src/query/sql/src/planner/binder/bind_context.rs#L301
The parquet file contains columns such as
c1
andC1
which exist within the target table.How to Reproduce?
Sorry not easy for me to write a script to replicate, but easy to describe
c1
&C1
create table t_sensitive ("c1" int, "C1" int);
Are you willing to submit PR?