Open taiyang-li opened 15 hours ago
Reason: ORC requires `offsets[i+1] == offsets[i]` when the i-th row is null in `MapVectorBatch`; otherwise the written ORC batch is not consistent with the CH column. The same holds for `ListVectorBatch`. But in CH, the nullable map column returned from the function `str_to_map` may look like this:
| nullmap | offsets | keys |
|---------|---------|------|
| 1 | 2 | [k1, k2] |
| 1 | 4 | [k3, k4] |
| 0 | 6 | [k5, k6] |
| 1 | 8 | [k7, k8] |
Solution: Recursively truncate non-empty nested data when the current row is null in a CH Map column before writing to ORC/Parquet.
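
The truncation can be illustrated with a minimal Python sketch. It is not the actual patch: `truncate_null_rows` and `null_rows_are_empty` are hypothetical helper names, the column is modelled as plain lists, and only one nesting level is handled (a recursive version would apply the same step to each nested map or array). It assumes that a `nullmap` value of 1 marks a null row and that `offsets[i]` is the cumulative end offset of row i, as in the table above.

```python
from typing import List, Sequence, Tuple


def truncate_null_rows(
    null_map: Sequence[int],
    offsets: Sequence[int],
    keys: Sequence[str],
) -> Tuple[List[int], List[str]]:
    """Drop nested entries that belong to null rows (one nesting level only).

    Assumptions: null_map[i] == 1 means row i is null, and offsets[i] is the
    cumulative end offset of row i in the nested `keys` data.
    """
    new_offsets: List[int] = []
    new_keys: List[str] = []
    start = 0
    for is_null, end in zip(null_map, offsets):
        if not is_null:
            # Keep the nested data of non-null rows unchanged.
            new_keys.extend(keys[start:end])
        # A null row contributes nothing, so its end offset equals the
        # previous row's end offset.
        new_offsets.append(len(new_keys))
        start = end
    return new_offsets, new_keys


def null_rows_are_empty(null_map: Sequence[int], offsets: Sequence[int]) -> bool:
    """Return True if every null row spans zero nested elements."""
    start = 0
    for is_null, end in zip(null_map, offsets):
        if is_null and end != start:
            return False
        start = end
    return True


# The column from the table above.
null_map = [1, 1, 0, 1]
offsets = [2, 4, 6, 8]
keys = ["k1", "k2", "k3", "k4", "k5", "k6", "k7", "k8"]

new_offsets, new_keys = truncate_null_rows(null_map, offsets, keys)
print(new_offsets)  # [0, 0, 2, 2]
print(new_keys)     # ['k5', 'k6']
```

After truncation every null row has zero length, which is the condition stated in the Reason section; the values column of the map would need the same treatment as `keys`.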
Description
First, execute the SQL below with native write enabled.
Then download the ORC file and view its content. We can see that the values of `mic_time` and `label_map['mic_time']` don't match.