apache / arrow-go

Official Go implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

[Go] Improved building of structs into arrow record #64

Open gmintoco opened 1 year ago

gmintoco commented 1 year ago

Describe the enhancement requested

Hi,

I recently posted about this on the mailing list, but I thought this might be a better place to discuss it. I am using the Go implementation of Arrow mostly to read and write Parquet and IPC files. I would often like to use the very helpful schema.NewSchemaFromStruct() from github.com/apache/arrow/go/v11/parquet/schema. Naturally, I would then like to build an Arrow record in my code using this schema, something like this:

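    import (
        "os"

        "github.com/apache/arrow/go/v11/arrow"
        "github.com/apache/arrow/go/v11/arrow/array"
        "github.com/apache/arrow/go/v11/arrow/memory"
        "github.com/apache/arrow/go/v11/parquet/metadata"
        "github.com/apache/arrow/go/v11/parquet/pqarrow"
        pqschema "github.com/apache/arrow/go/v11/parquet/schema"
    )

    // buildRecord turns a slice of Test values into one Arrow record.
    // Test is assumed to have an Id string field and a Values [][]float64 field.
    func buildRecord(input []Test) (arrow.Record, *arrow.Schema, error) {
        pool := memory.NewGoAllocator()

        parquetSchema, err := pqschema.NewSchemaFromStruct(Test{})
        if err != nil {
            return nil, nil, err
        }
        schema, err := pqarrow.FromParquet(parquetSchema, &pqarrow.ArrowReadProperties{}, metadata.KeyValueMetadata{})
        if err != nil {
            return nil, nil, err
        }
        pqschema.PrintSchema(parquetSchema.Root(), os.Stdout, 2)

        builder := array.NewRecordBuilder(pool, schema)
        defer builder.Release()

        // Column 0 is assumed to map to binary; use *array.StringBuilder if it maps to a UTF-8 string.
        ids := builder.Field(0).(*array.BinaryBuilder)
        list := builder.Field(1).(*array.ListBuilder)
        subList := list.ValueBuilder().(*array.ListBuilder)
        floats := subList.ValueBuilder().(*array.Float64Builder)

        for _, obj := range input {
            ids.Append([]byte(obj.Id))
            // Open each list slot with Append(true) before appending its children.
            list.Append(true)
            for _, values := range obj.Values {
                subList.Append(true)
                floats.AppendValues(values, nil)
            }
        }

        // The caller owns the record and must Release it when done.
        return builder.NewRecord(), schema, nil
    }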

This is fine for small structs, but for larger or more deeply nested ones, writing out all of the builder code by hand becomes very tedious. (If there is already a better way of doing this, or if I am approaching it wrong, I would love to know! I am quite new to Go :) )

I thought it would make sense to have a reflection-based builder that can build a record from a slice of structs. I took a stab at implementing something like this here: https://gist.github.com/gmintoco/3e65aa7b47ae37b0685db88b2755933f

My questions are:

  1. Is there a better way of doing this?
  2. Would a function like this make sense to add to the Go Arrow implementation? (I would be happy to try writing a PR if so.)

Looking forward to any feedback :)

Component(s)

Go

zeroshade commented 1 year ago

I think this is a fine idea, and it would be great to see it expanded into a full PR.