Open · agchang opened this issue 4 months ago
Hi @agchang, thanks for opening this issue. I think this feature very well may make sense to implement, and we would welcome your contribution if you decide to do so!
I'll write down a few of my thoughts because something like this will generally involve some tradeoffs:
Unlike in CSV where changing the number of columns between rows is invalid, JSON allows changes to the "schema" element-by-element. This can mean adding/removing a field between rows or even having entirely disjoint sets of fields.
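For example, one record might be `{"id": 1, "name": "a"}` while the next is `{"id": 2, "tags": ["x"]}` (a made-up pair for illustration): any inference has to decide whether `name` and `tags` both become nullable fields of one unified schema, or whether the mismatch is an error.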
There are roughly two ways to handle this: either infer and evolve the schema in a single pass while decoding (fields that go missing in later rows can be set to null, fields that are added can be ignored), or make a separate first pass over the input to infer a complete schema before decoding with the existing `*FromJSON()` functions.

If we want to go with the latter approach, my recommendation would be to focus on a dedicated implementation of the "first-pass" which infers an Arrow schema from JSON. We can then just use the output of this function as input to the existing ones:

```go
// (assumes arrow-go's arrow, arrow/array, and arrow/memory packages are imported)

func InferSchemaFromJSON(r io.Reader) (*arrow.Schema, error) { ... } // This needs to be implemented

func main() {
	jsonBlob := `{ ... }`
	schema, err := InferSchemaFromJSON(strings.NewReader(jsonBlob))
	if err != nil {
		log.Fatal(err)
	}
	// TableFromJSON lives in the arrow/array package.
	table, err := array.TableFromJSON(memory.DefaultAllocator, schema, []string{jsonBlob})
	if err != nil {
		log.Fatal(err)
	}
	defer table.Release()
	// do table stuff
}
```
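To make the idea above concrete, here is a minimal sketch of what such an `InferSchemaFromJSON` could look like. This is not existing library code: it assumes newline-delimited flat JSON objects and an arrow-go v18 import path, handles only a few scalar types, and marks every field nullable since a field may be absent from some records. A real implementation would also need to unify conflicting types and recurse into nested objects and lists.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"sort"
	"strings"

	"github.com/apache/arrow-go/v18/arrow"
)

// InferSchemaFromJSON is a hypothetical first pass: it decodes every record,
// notes a type per field name, and errors on conflicting types.
func InferSchemaFromJSON(r io.Reader) (*arrow.Schema, error) {
	dec := json.NewDecoder(r)
	fields := map[string]arrow.DataType{}
	for {
		var rec map[string]any
		if err := dec.Decode(&rec); err == io.EOF {
			break
		} else if err != nil {
			return nil, err
		}
		for name, v := range rec {
			dt := inferType(v)
			if existing, ok := fields[name]; ok && !arrow.TypeEqual(existing, dt) {
				return nil, fmt.Errorf("conflicting types for field %q", name)
			}
			fields[name] = dt
		}
	}
	names := make([]string, 0, len(fields))
	for name := range fields {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; keep the schema deterministic
	out := make([]arrow.Field, 0, len(names))
	for _, name := range names {
		// Nullable, because any field may be absent from some records.
		out = append(out, arrow.Field{Name: name, Type: fields[name], Nullable: true})
	}
	return arrow.NewSchema(out, nil), nil
}

func inferType(v any) arrow.DataType {
	switch v.(type) {
	case bool:
		return arrow.FixedWidthTypes.Boolean
	case float64: // encoding/json decodes all JSON numbers as float64
		return arrow.PrimitiveTypes.Float64
	default: // strings, nulls, and nested values all fall back to string here
		return arrow.BinaryTypes.String
	}
}

func main() {
	input := `{"id": 1, "name": "a"}
{"id": 2, "active": true}`
	schema, err := InferSchemaFromJSON(strings.NewReader(input))
	if err != nil {
		panic(err)
	}
	fmt.Println(schema)
}
```

Note that `id` comes out as float64 because `encoding/json` decodes every number that way; deciding when to narrow to an integer type is exactly the kind of tradeoff mentioned above.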
@agchang I made bodkin to address the schema generation issue.
Describe the enhancement requested
I am interested in support for schema inference in the `RecordFromJSON` and `TableFromJSON` functions, as these currently require an `arrow.Schema` up front. I can try to contribute this if people think it makes sense. I noticed that for CSV there is `NewInferringReader`, which just assumes the type of the first row (a short usage sketch follows below).

Component(s)
Go
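For reference, a minimal sketch of the CSV precedent mentioned above, assuming arrow-go v18 (the sample data is made up for illustration, but `csv.NewInferringReader` is the existing API):

```go
package main

import (
	"fmt"
	"strings"

	"github.com/apache/arrow-go/v18/arrow/csv"
)

func main() {
	data := "id,name,score\n1,a,0.5\n2,b,1.5\n"

	// NewInferringReader infers the column types from the data itself
	// instead of requiring an *arrow.Schema up front.
	rdr := csv.NewInferringReader(strings.NewReader(data), csv.WithHeader(true))
	defer rdr.Release()

	for rdr.Next() {
		rec := rdr.Record()
		fmt.Println(rec.Schema()) // e.g. id: int64, name: utf8, score: float64
		fmt.Println(rec)
	}
	if err := rdr.Err(); err != nil {
		panic(err)
	}
}
```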