zml1206 opened this issue 1 month ago.
@zhztheplayer @rui-mo Is this a Velox problem? Can it be solved natively, or should we fall back the scan when a column name contains Cyrillic characters?
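If the fallback route were taken, the check would amount to flagging column names that contain any non-ASCII character. The sketch below is purely illustrative: `has_non_ascii` and `columns_needing_fallback` are hypothetical helpers, not part of Gluten's API, and the real decision would live in Gluten's own validation logic.

```python
def has_non_ascii(name: str) -> bool:
    """Return True if the column name contains any character outside 7-bit ASCII."""
    return not name.isascii()


def columns_needing_fallback(column_names):
    """Hypothetical helper: list the column names whose scan could be
    routed to vanilla Spark instead of the native (Velox) reader."""
    return [name for name in column_names if has_non_ascii(name)]


if __name__ == "__main__":
    print(columns_needing_fallback(["id", "Тест", "国Ⅵ", "price_usd"]))
```

A blanket non-ASCII rule is deliberately conservative: it would also fall back names that the native reader might handle correctly.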
Hi @zml1206, are you saying a full Cyrillic name can be read correctly while a mixed name cannot?
No, only some Cyrillic letters fail to parse, for example "Т".
Could you please check the written file's content to determine whether the problem is on read or write?
Confirmed: the problem is on the read side.
Roman numerals are not handled correctly either, for example the column name 国Ⅵ.
Non-ASCII characters are not well supported by the Velox tokenizer. I opened https://github.com/facebookincubator/velox/issues/10796 to discuss adding support. Thanks.
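To illustrate the class of problem, here is a simplified sketch contrasting an identifier rule that accepts only ASCII letters, digits and `_` with a Unicode-aware one. Both functions are hypothetical and are NOT Velox's actual tokenizer code. A pure ASCII rule would reject every non-ASCII character, whereas this thread reports that only some Cyrillic letters fail. So the sketch shows the general failure mode, not Velox's exact behavior, which is discussed in the linked Velox issue.

```python
import string

# Hypothetical ASCII-only identifier alphabet (NOT taken from Velox).
ASCII_IDENT_CHARS = set(string.ascii_letters + string.digits + "_")


def tokenize_ascii_identifier(name: str) -> str:
    """Accept the name only if every character is an ASCII identifier char."""
    for pos, ch in enumerate(name):
        if ch not in ASCII_IDENT_CHARS:
            # Report the code point and its UTF-8 bytes: a multi-byte
            # character is exactly what an ASCII-only rule cannot handle.
            raise ValueError(
                f"unexpected character {ch!r} (U+{ord(ch):04X}, UTF-8 bytes "
                f"{ch.encode('utf-8').hex(' ')}) at position {pos}"
            )
    return name


def tokenize_unicode_identifier(name: str) -> str:
    """Unicode-aware variant: accept letters and numbers from any script plus '_'."""
    if name and all(ch.isalnum() or ch == "_" for ch in name):
        return name
    raise ValueError(f"invalid identifier {name!r}")


if __name__ == "__main__":
    for col in ["col_1", "Т", "国Ⅵ"]:
        try:
            print("ascii ok:", tokenize_ascii_identifier(col))
        except ValueError as err:
            print("ascii rule rejected:", err)
        print("unicode ok:", tokenize_unicode_identifier(col))
```

Python's `str.isalnum()` treats "Ⅵ" (ROMAN NUMERAL SIX) as numeric, so the Unicode-aware rule accepts 国Ⅵ while the ASCII rule rejects it at its first character.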
Backend: VL (Velox)
Bug description:
enable gluten
disable gluten
Spark version: None
Spark configurations: No response
System information: No response
Relevant logs: No response