apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

Java: What is the best way to concat two vectors? #35585

Open i10416 opened 1 year ago

i10416 commented 1 year ago

Describe the usage question you have. Please include as many useful details as possible.

Given two IntVectors, one: {1, 2, 3} and another: {null, 5, 6}, how can I concatenate them into a single vector, result: {1, 2, 3, null, 5, 6}?

Should I copy the elements one by one using copyFromSafe, or is there a more convenient method that achieves the same result?

If I do have to copy the elements one by one, how can I calculate the value count and null count of the result?

    val one: IntVector = ??? // suppose {1, 2, 3}
    val another: IntVector = ??? // suppose {null, 5, 6}

    val offset = one.getValueCount() + one.getNullCount() // 3
    val len = another.getValueCount() + another.getNullCount() // 3
    var i = 0
    while i < len do
        one.copyFromSafe(i, offset + i, another)
        // can I detect whether the copied value is null here, so that I can count non-null and null values separately?
        // for example
        // val isNull = one.nullAwareCopyFromSafe(i, offset + i, another)
        // if isNull then j += 1 else i += 1
        i += 1
    one.setValueCount(offset + i)
    // offset + i is 6, but another may contain nulls, so the value count could differ from the actual number of values.

Component(s)

Java

lidavidm commented 1 year ago

@davisusanibar do we have a recipe for this?

davisusanibar commented 1 year ago

Hi @i10416, sorry for joining late.

There is a VectorAppender class with some utility methods; the main problem is that it is package-private.

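As a workaround, put your own class inside the same package (org.apache.arrow.vector.util) so that it can access VectorAppender: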

    // Same package as the package-private VectorAppender, so this class can use it.
    package org.apache.arrow.vector.util;

    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.IntVector;
    import org.apache.arrow.vector.ValueVector;

    public class FieldVectorAppender {
      public static void main(String[] args) {
        try (
            BufferAllocator allocator = new RootAllocator();
            IntVector initialValues = new IntVector("initialValues", allocator);
            IntVector toAppend = new IntVector("toAppend", allocator);
        ) {
          initialValues.allocateNew(2);
          initialValues.set(0, 1);
          initialValues.set(1, 2);
          initialValues.setValueCount(2);
          System.out.println("Initial IntVector: " + initialValues);
          toAppend.allocateNew(4);
          // Slots 0 and 2 are never set, so they stay null.
          toAppend.set(1, 4);
          toAppend.set(3, 6);
          toAppend.setValueCount(4);
          System.out.println("IntVector to Append: " + toAppend);
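          // VectorAppender is a visitor: visiting toAppend appends its slots (nulls included) to initialValues.
          VectorAppender appenderUtil = new VectorAppender(initialValues);
          ValueVector resultOfVectorsAppended = toAppend.accept(appenderUtil, null);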
          System.out.println("IntVector Result: " + resultOfVectorsAppended);
        }
      }
    }

Output:

    Initial IntVector: [1, 2]
    IntVector to Append: [null, 4, null, 6]
    IntVector Result: [1, 2, null, 4, null, 6]
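On the counting part of the original question: in Arrow Java, `getValueCount()` counts every slot, null or not, and `getNullCount()` is computed from the validity buffer. An element-by-element `copyFromSafe` loop therefore only needs `offset = one.getValueCount()` and a final `one.setValueCount(offset + another.getValueCount())`; the null count follows automatically because `copyFromSafe` copies the validity bit along with the value. Below is a minimal, library-free sketch of that bookkeeping. It assumes a simplified vector (the hypothetical `SimpleIntVector`: an `int[]` plus a `boolean[]` validity array) rather than Arrow's real buffers, and its `copyFromSafe` only loosely mirrors the Arrow method of the same name.

```java
import java.util.Arrays;

// Hypothetical, simplified model of appending one nullable int vector to another.
// This is NOT Arrow's IntVector: it only mimics the bookkeeping (values + validity).
public class AppendModel {

  static final class SimpleIntVector {
    int[] values = new int[0];
    boolean[] valid = new boolean[0]; // false means the slot is null
    int valueCount = 0;               // counts every slot, null or not

    // Grows both arrays so that at least n slots exist (new slots start out null).
    void ensureCapacity(int n) {
      if (n > values.length) {
        int cap = Math.max(n, values.length * 2);
        values = Arrays.copyOf(values, cap);
        valid = Arrays.copyOf(valid, cap);
      }
    }

    // Appends one value; passing null appends a null slot.
    void add(Integer v) {
      ensureCapacity(valueCount + 1);
      valid[valueCount] = v != null;
      values[valueCount] = v != null ? v : 0;
      valueCount++;
    }

    // Copies slot fromIndex of `from` into slot thisIndex, validity flag included.
    // Loosely mirrors Arrow's copyFromSafe; like it, this does not touch valueCount.
    void copyFromSafe(int fromIndex, int thisIndex, SimpleIntVector from) {
      ensureCapacity(thisIndex + 1);
      valid[thisIndex] = from.valid[fromIndex];
      values[thisIndex] = from.values[fromIndex];
    }

    // Derived from the validity flags, never tracked by hand.
    int nullCount() {
      int n = 0;
      for (int i = 0; i < valueCount; i++) {
        if (!valid[i]) {
          n++;
        }
      }
      return n;
    }

    @Override
    public String toString() {
      StringBuilder sb = new StringBuilder("[");
      for (int i = 0; i < valueCount; i++) {
        if (i > 0) {
          sb.append(", ");
        }
        sb.append(valid[i] ? String.valueOf(values[i]) : "null");
      }
      return sb.append("]").toString();
    }
  }

  // Appends every slot of source to target, then sets target's value count once.
  static void append(SimpleIntVector target, SimpleIntVector source) {
    int offset = target.valueCount; // nulls are already included in valueCount
    for (int i = 0; i < source.valueCount; i++) {
      target.copyFromSafe(i, offset + i, source);
    }
    target.valueCount = offset + source.valueCount;
  }

  public static void main(String[] args) {
    SimpleIntVector one = new SimpleIntVector();
    one.add(1);
    one.add(2);
    one.add(3);
    SimpleIntVector another = new SimpleIntVector();
    another.add(null);
    another.add(5);
    another.add(6);

    append(one, another);
    // prints: [1, 2, 3, null, 5, 6] valueCount=6 nullCount=1
    System.out.println(one + " valueCount=" + one.valueCount + " nullCount=" + one.nullCount());
  }
}
```

Because the value count already includes null slots, the copy offset is just the target's value count, and nothing has to be counted inside the loop: the null count is read back from the validity flags afterwards.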