Open joao-p-pereira opened 2 months ago
I can do this one
It seems this is not a bug here: we can pass the quote character directly to the Format struct and get the correct result. Suppose we have a CSV file like
and write a test like
use std::fs::File;
use arrow_csv::reader::Format;

#[test]
fn test_with_quote() {
    // Default format: the quote character is `"`.
    let mut file = File::open("test/data/quote.csv").unwrap();
    let (schema, _) = Format::default().infer_schema(&mut file, None).unwrap();
    println!("schema without passing the quote: {:?}", schema);

    // Same file, this time with `'` as the quote character.
    let mut file = File::open("test/data/quote.csv").unwrap();
    let (schema, _) = Format::default()
        .with_quote(b'\'')
        .infer_schema(&mut file, None)
        .unwrap();
    println!("schema after passing the quote: {:?}", schema);
}
We can pass the single quote to Format and get a different result, like
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When using ListingOptions to infer the schema of a ListingTableUrl, the resulting schema does not take into account the quote character defined in the format. As a result, every column whose values contain the quote character is inferred as Utf8.
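To illustrate why the quote character matters for inference, here is a minimal standard-library sketch. It is not the arrow-csv or DataFusion implementation: the `Inferred` enum and the `unquote` and `infer_field` helpers are hypothetical names made up for this example, and real CSV inference handles many more cases (escapes, nulls, dates, and so on).

```rust
/// Types this toy inference can produce (hypothetical, not arrow's `DataType`).
#[derive(Debug, PartialEq)]
enum Inferred {
    Int64,
    Float64,
    Utf8,
}

/// Strip one pair of surrounding `quote` characters from a raw field, if present.
fn unquote(field: &str, quote: u8) -> &str {
    let q = char::from(quote);
    field
        .strip_prefix(q)
        .and_then(|rest| rest.strip_suffix(q))
        .unwrap_or(field)
}

/// Infer a type for a single raw field, honoring the configured quote character.
fn infer_field(field: &str, quote: u8) -> Inferred {
    let value = unquote(field, quote);
    if value.parse::<i64>().is_ok() {
        Inferred::Int64
    } else if value.parse::<f64>().is_ok() {
        Inferred::Float64
    } else {
        // Anything that is not numeric falls back to a string type.
        Inferred::Utf8
    }
}
```

In this sketch, `infer_field("'1'", b'"')` yields `Utf8` because the single quotes are treated as data, while `infer_field("'1'", b'\'')` yields `Int64` once the configured quote is stripped first.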
Describe the solution you'd like
infer_schema should take the file format's quote character into account when inferring schemas, so that the inferred types are as specific as possible.
Describe alternatives you've considered
Additional context