Open youngsofun opened 1 day ago
cc @sundy-li @everpcpc @wubx @Xuanwo
Chose Solution 2 after discussing with @sundy-li.
To make it clearer, I propose:
COLUMN_MATCH_MODE
COLUMN_MATCH_MODE:
CASE_SENSITIVE: Match columns by name, case-sensitive.
CASE_INSENSITIVE: Match columns by name, case-insensitive.
POSITION: Match columns by position instead of name.
FORMAT_DEFAULT: Use the default matching behavior based on file format.
FILE_FORMAT:
CSV: Default POSITION.
Parquet/ORC/NDJson: Default CASE_INSENSITIVE.
Note: not every mode is supported for every format yet; we will implement them one by one.
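To make the proposed semantics concrete, here is a rough sketch of how the four modes could resolve table columns against file columns. It is not the actual implementation: `ColumnMatchMode`, `resolve_mode`, and `match_columns` are hypothetical names, and two behaviors are assumptions the proposal does not specify (an unmatched column maps to `None`, and the first file column wins when names collide after lowercasing).

```python
from enum import Enum
from typing import Dict, List, Optional


class ColumnMatchMode(Enum):
    CASE_SENSITIVE = "case_sensitive"
    CASE_INSENSITIVE = "case_insensitive"
    POSITION = "position"
    FORMAT_DEFAULT = "format_default"


# Per-format defaults as listed in the proposal above.
FORMAT_DEFAULTS: Dict[str, ColumnMatchMode] = {
    "csv": ColumnMatchMode.POSITION,
    "parquet": ColumnMatchMode.CASE_INSENSITIVE,
    "orc": ColumnMatchMode.CASE_INSENSITIVE,
    "ndjson": ColumnMatchMode.CASE_INSENSITIVE,
}


def resolve_mode(mode: ColumnMatchMode, file_format: str) -> ColumnMatchMode:
    """Replace FORMAT_DEFAULT with the concrete default for the file format."""
    if mode is ColumnMatchMode.FORMAT_DEFAULT:
        return FORMAT_DEFAULTS[file_format.lower()]
    return mode


def match_columns(
    table_columns: List[str],
    file_columns: List[str],
    mode: ColumnMatchMode,
    file_format: str,
) -> List[Optional[int]]:
    """For each table column, return the index of the matching file column.

    Assumption: an unmatched table column maps to None.
    """
    mode = resolve_mode(mode, file_format)

    if mode is ColumnMatchMode.POSITION:
        # The i-th table column reads the i-th file column, if it exists.
        return [i if i < len(file_columns) else None
                for i in range(len(table_columns))]

    # Name-based matching: normalize names only in the case-insensitive mode.
    if mode is ColumnMatchMode.CASE_INSENSITIVE:
        normalize = str.lower
    else:
        normalize = str

    index: Dict[str, int] = {}
    for i, name in enumerate(file_columns):
        # Assumption: on a collision (e.g. "ID" and "id"), the first one wins.
        index.setdefault(normalize(name), i)
    return [index.get(normalize(name)) for name in table_columns]
```

For example, with a table `(id, Name)` and a Parquet file whose fields are `(ID, Name)`, `CASE_SENSITIVE` would leave `id` unmatched, while `FORMAT_DEFAULT` (resolving to `CASE_INSENSITIVE` for Parquet) would match both.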
select, infer_schema: add a parameter column_name_to_lowercase=TRUE|FALSE, default FALSE.
We will remind users to use this option if related errors occur.
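As an illustration of what `column_name_to_lowercase` would change, here is a minimal sketch. The helper `apply_column_name_option` and the `(name, type)` tuple representation of a schema are hypothetical and only stand in for whatever the reader actually returns.

```python
from typing import List, Tuple

# A schema field as (name, type); this representation is an assumption.
Field = Tuple[str, str]


def apply_column_name_option(
    file_fields: List[Field],
    column_name_to_lowercase: bool = False,
) -> List[Field]:
    """Return the schema exposed to the user.

    With the proposed default (False), names are kept exactly as they are
    in the file; with True, they are lowercased.
    """
    if column_name_to_lowercase:
        return [(name.lower(), type_name) for name, type_name in file_fields]
    return list(file_fields)
```

With the default, a file field `UserId` stays `UserId`; passing `column_name_to_lowercase=True` yields `userid`, matching the lowercasing described in the Summary below.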
Summary
Currently, when reading a Parquet file, the file schema is modified so that all field names are converted to lowercase.
Solution 1
Parquet/NDJSON: add a format option case_sensitive.
cons:
Solution 2:
case_sensitive, false by default (for compatibility)
cons:
pros