Open matasello opened 5 months ago
@domoritz
I've run into the
json2parquet uses arrow::json::reader::infer_json_schema_from_seekable
It doesn't look like `collect_field_types_from_object` in arrow-rs's arrow-json does any kind of timestamp inference at all.
https://github.com/apache/arrow-rs/blob/master/arrow-json/src/reader/schema.rs#L88
arrow-rs's arrow-json has a low-level decoder that seems to have some support for coercing types to timestamps. However, I'm not sure how that would work, or whether adding timestamp detection to schema inference would need to be done in json2parquet or in arrow-rs. https://github.com/apache/arrow-rs/blob/master/arrow-json/src/reader/mod.rs
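To make the idea concrete, here is a rough, std-only sketch of what "timestamp detection during schema inference" could look like: treat a string column as a timestamp only if every sampled value has an RFC 3339 shape. Everything here (`InferredType`, `looks_like_rfc3339`, `infer_string_column`) is a hypothetical name made up for illustration. None of it is arrow-rs or json2parquet API, and it only checks the shape and basic ranges, not calendar validity.

```rust
/// Hypothetical inferred type for a string column (not an arrow-rs type).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum InferredType {
    Utf8,
    Timestamp,
}

/// Returns true if `s` has the shape
/// `YYYY-MM-DDTHH:MM:SS[.fraction](Z|+HH:MM|-HH:MM)`.
/// Assumption: only an uppercase `T` separator and `Z` suffix are accepted.
pub fn looks_like_rfc3339(s: &str) -> bool {
    let b = s.as_bytes();
    // Shortest accepted form is "YYYY-MM-DDTHH:MM:SSZ" (20 bytes).
    if b.len() < 20 {
        return false;
    }
    // Fixed-width prefix: separators at known positions, digits elsewhere.
    for (i, &c) in b[..19].iter().enumerate() {
        let ok = match i {
            4 | 7 => c == b'-',
            10 => c == b'T',
            13 | 16 => c == b':',
            _ => c.is_ascii_digit(),
        };
        if !ok {
            return false;
        }
    }
    // Basic range checks on the two-digit fields (digits verified above).
    let two = |i: usize| u32::from(b[i] - b'0') * 10 + u32::from(b[i + 1] - b'0');
    let (month, day, hour, minute, second) = (two(5), two(8), two(11), two(14), two(17));
    if !(1..=12).contains(&month)
        || !(1..=31).contains(&day)
        || hour > 23
        || minute > 59
        || second > 60
    {
        return false;
    }
    // Optional fractional seconds: '.' followed by at least one digit.
    let mut rest = &b[19..];
    if rest.first() == Some(&b'.') {
        let digits = rest[1..].iter().take_while(|c| c.is_ascii_digit()).count();
        if digits == 0 {
            return false;
        }
        rest = &rest[1 + digits..];
    }
    // Mandatory offset: "Z" or a signed "HH:MM".
    match rest {
        [b'Z'] => true,
        [b'+' | b'-', h1, h2, b':', m1, m2] => {
            [h1, h2, m1, m2].iter().all(|c| c.is_ascii_digit())
        }
        _ => false,
    }
}

/// Infers `Timestamp` only when there is at least one sample and every
/// sample looks like an RFC 3339 timestamp; otherwise falls back to `Utf8`.
pub fn infer_string_column<'a, I>(samples: I) -> InferredType
where
    I: IntoIterator<Item = &'a str>,
{
    let mut seen_any = false;
    for s in samples {
        if !looks_like_rfc3339(s) {
            return InferredType::Utf8;
        }
        seen_any = true;
    }
    if seen_any {
        InferredType::Timestamp
    } else {
        InferredType::Utf8
    }
}
```

Falling back to `Utf8` on the first non-matching value keeps the heuristic conservative, so a column with even one stray value like `n/a` stays a plain string. Whether a check like this belongs in json2parquet (as a post-processing step over the inferred schema) or in arrow-rs itself is exactly the open question.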
Hmm, thanks for looking into this. I won't have time to look into this deeply anytime soon but I'd be more than happy to review a pull request.
Not sure I am doing this right, but I am trying to convert a CSV containing some timestamps to a Parquet file.
Sample CSV
csv2parquet --header false --schema-file mt_status.json /dev/stdin mt_status.parquet
│ mt_status.parquet │ ts │ INT64 │ │ REQUIRED │ │ │ │ │ │ │
Any hints? Thanks