apache / arrow-java

Official Java implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0
4 stars 5 forks

[Java] Write list of maps with the vector API #132

Open gszadovszky opened 10 months ago

gszadovszky commented 10 months ago

Describe the usage question you have. Please include as many useful details as possible.

I'd like to store a list of maps. I'm not sure whether I'm doing it wrong or there is a bug in the implementation. Here is my example:

try (ListVector listVector = ListVector.empty("sourceVector", allocator)) {

      UnionListWriter listWriter = listVector.getWriter();

      listWriter.allocate();

      /*
       * [{"one" -> 1, "two" -> 2}, {"three" -> 3}, null, {"four" -> null, "five" -> 5}]
       */

      listWriter.setPosition(0);
      listWriter.startList();

      // First map: {"one" -> 1, "two" -> 2}
      listWriter.map().startMap(); // <-- exception is thrown from here
      listWriter.map().startEntry();
      listWriter.map().key().varChar().writeVarChar("one");
      listWriter.map().value().integer().writeInt(1);
      listWriter.map().endEntry();
      listWriter.map().startEntry();
      listWriter.map().key().varChar().writeVarChar("two");
      listWriter.map().value().integer().writeInt(2);
      listWriter.map().endEntry();
      listWriter.map().endMap();

      // Second map: {"three" -> 3}
      listWriter.map().startMap();
      listWriter.map().startEntry();
      listWriter.map().key().varChar().writeVarChar("three");
      listWriter.map().value().integer().writeInt(3);
      listWriter.map().endEntry();
      listWriter.map().endMap();

      // Third element: a null map
      listWriter.writeNull();

      // Fourth map: {"four" -> null, "five" -> 5}
      listWriter.map().startMap();
      listWriter.map().startEntry();
      listWriter.map().key().varChar().writeVarChar("four");
      listWriter.map().value().integer().writeNull();
      listWriter.map().endEntry();
      listWriter.map().startEntry();
      listWriter.map().key().varChar().writeVarChar("five");
      listWriter.map().value().integer().writeInt(5);
      listWriter.map().endEntry();
      listWriter.map().endMap();

      listWriter.endList();
}

When I execute this I'm getting the following exception:

java.lang.UnsupportedOperationException: Cannot get simple type for type MAP
    at org.apache.arrow.vector/org.apache.arrow.vector.types.Types$MinorType.getType(Types.java:807)
    at org.apache.arrow.vector/org.apache.arrow.vector.complex.impl.PromotableWriter.getWriter(PromotableWriter.java:275)
    at org.apache.arrow.vector/org.apache.arrow.vector.complex.impl.AbstractPromotableFieldWriter.getWriter(AbstractPromotableFieldWriter.java:80)
    at org.apache.arrow.vector/org.apache.arrow.vector.complex.impl.AbstractPromotableFieldWriter.startMap(AbstractPromotableFieldWriter.java:114)
    at org.apache.arrow.vector/org.apache.arrow.vector.complex.impl.PromotableWriter.startMap(PromotableWriter.java:53)

Component(s)

Java

davisusanibar commented 10 months ago

@gszadovszky, let me review this issue and come back with a few suggestions.

gszadovszky commented 10 months ago

Thanks in advance, @davisusanibar.

davisusanibar commented 10 months ago

@gszadovszky Would it be possible to use MapVector instead?

import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.MapVector;
import org.apache.arrow.vector.complex.impl.UnionMapWriter;

public class TestMeMapVector {

    public static void main(String[] args) {
        try (MapVector mapVector = MapVector.empty("map", new RootAllocator(), false)) {
            mapVector.allocateNew();

            UnionMapWriter mapWriter = mapVector.getWriter();
            mapWriter.allocate();

            mapWriter.startMap();

            mapWriter.startEntry();
            mapWriter.key().varChar().writeVarChar("one");
            mapWriter.value().integer().writeInt(1);
            mapWriter.endEntry();
            mapWriter.startEntry();
            mapWriter.key().varChar().writeVarChar("two");
            mapWriter.value().integer().writeInt(2);
            mapWriter.endEntry();
            mapWriter.startEntry();
            mapWriter.key().varChar().writeVarChar("three");
            mapWriter.value().integer().writeInt(3);
            mapWriter.endEntry();
            mapWriter.writeNull();
            mapWriter.startEntry();
            mapWriter.key().varChar().writeVarChar("four");
            mapWriter.value().integer().writeNull();
            mapWriter.endEntry();
            mapWriter.startEntry();
            mapWriter.key().varChar().writeVarChar("five");
            mapWriter.value().integer().writeInt(5);
            mapWriter.endEntry();

            mapWriter.endMap();

            mapWriter.setValueCount(1);

            System.out.println(mapVector);

            // [[{"key":"one","value":1},{"key":"two","value":2},{"key":"three","value":3},null,{"key":"four"},{"key":"five","value":5}]]
        }
    }
}

davisusanibar commented 10 months ago

In addition, I'll keep investigating whether any changes are needed so that a simple data type can be defined for a Map at the writer's current level of abstraction.

gszadovszky commented 10 months ago

Thanks a lot, @davisusanibar, for the example and the further investigation!

It seems I oversimplified my example. What I wanted to write is a vector containing lists of maps. For example:

[{"one" -> 1, "two" -> 2}, {"three" -> 3}, null, {"four" -> null, "five" -> 5}],
[{"six" -> 6}],
[null],
null,
[{"seven" -> 7}, {}, {"eight" -> null, "nine" -> 9}],
[]

gszadovszky commented 10 months ago

Any updates on this, @davisusanibar? Do you think this is something that should work, or am I trying to do something unsupported?

davisusanibar commented 10 months ago

Hi @lidavidm, what do you think about this kind of vector? Do you think it's unsupported?

lidavidm commented 10 months ago

I'm not familiar enough with the writer API to say whether you can do this off the top of my head. You can always build the vector by hand (or say, build the map vector via the writer, then manually wrap it in a list vector by constructing the offsets yourself).
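
To make the "construct the offsets yourself" idea concrete, here is a minimal sketch of the list-level bookkeeping for the six example rows given earlier in the thread. It is plain Java and deliberately uses no Arrow classes: `ListOfMapsLayout`, `Flattened`, `flatten`, `mapOf`, and `listOf` are hypothetical names made up for this illustration, and the maps are ordinary `java.util.Map` objects standing in for the slots of a map vector. The assumption is that, in a real vector, the computed offsets and validity flags would be copied into the list vector's offset and validity buffers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only: no Arrow classes are used, and every name here is hypothetical.
final class ListOfMapsLayout {

    /** Flattened form of a list-of-maps column. */
    static final class Flattened {
        // One slot per list element across all rows; a null slot stands for a null map.
        final List<Map<String, Integer>> childMaps = new ArrayList<>();
        // offsets[i] .. offsets[i + 1] is the range of child slots owned by row i.
        final int[] offsets;
        // validity[i] == false marks row i as a null list (distinct from an empty list).
        final boolean[] validity;

        Flattened(int rowCount) {
            offsets = new int[rowCount + 1];
            validity = new boolean[rowCount];
        }
    }

    /** Appends every row's elements to one child sequence and records where each row ends. */
    static Flattened flatten(List<List<Map<String, Integer>>> rows) {
        Flattened out = new Flattened(rows.size());
        for (int i = 0; i < rows.size(); i++) {
            List<Map<String, Integer>> row = rows.get(i);
            if (row != null) {
                out.validity[i] = true;
                out.childMaps.addAll(row);
            }
            // A null or empty row contributes no slots, so its end offset equals its start offset.
            out.offsets[i + 1] = out.childMaps.size();
        }
        return out;
    }

    /** Builds an insertion-ordered map from alternating String keys and Integer (or null) values. */
    static Map<String, Integer> mapOf(Object... keysAndValues) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            map.put((String) keysAndValues[i], (Integer) keysAndValues[i + 1]);
        }
        return map;
    }

    @SafeVarargs
    static List<Map<String, Integer>> listOf(Map<String, Integer>... maps) {
        return new ArrayList<>(Arrays.asList(maps));
    }

    /** The six rows from the example in this thread. */
    static List<List<Map<String, Integer>>> exampleRows() {
        List<List<Map<String, Integer>>> rows = new ArrayList<>();
        rows.add(listOf(mapOf("one", 1, "two", 2), mapOf("three", 3), null,
                mapOf("four", null, "five", 5)));
        rows.add(listOf(mapOf("six", 6)));
        rows.add(listOf((Map<String, Integer>) null)); // [null]: one element, which is a null map
        rows.add(null);                                // null: the list itself is null
        rows.add(listOf(mapOf("seven", 7), mapOf(), mapOf("eight", null, "nine", 9)));
        rows.add(listOf());                            // []: a valid but empty list
        return rows;
    }

    public static void main(String[] args) {
        Flattened f = flatten(exampleRows());
        System.out.println(Arrays.toString(f.offsets));  // [0, 4, 5, 6, 6, 9, 9]
        System.out.println(Arrays.toString(f.validity)); // [true, true, true, false, true, true]
        System.out.println(f.childMaps.size());          // 9
    }
}
```

The sketch keeps the three cases from the example apart: a null list (row 3) has a false validity flag, an empty list (row 5) is valid with equal start and end offsets, and a null map inside a list (rows 0 and 2) occupies a child slot of its own.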

gszadovszky commented 10 months ago

@lidavidm, thanks for your reply. I am building the vectors using the writers based on a Parquet schema. Everything works fine except this type of nested data. Since there is a UnionListWriter.map(), it seems it should be supported. There are even different implementations for the different overloaded versions of map.