Open vibhatha opened 1 week ago
It seems that the validity buffer of the key is not properly written. It is all null.
@vibhatha Is this unrelated to this comment?
UPDATE:
I don't think it's related to them. I will create an issue and resolve it.
@llama90 this is a much older issue that I am trying to solve.
@lidavidm a question:
Field keyField =
    new Field("id", FieldType.notNullable(new ArrowType.Int(64, true)), Collections.emptyList());
Field valueField =
    new Field("value", FieldType.nullable(new ArrowType.Int(64, true)), Collections.emptyList());
Field structField =
    new Field("entry", FieldType.notNullable(ArrowType.Struct.INSTANCE), List.of(keyField, valueField));
Field mapIntToIntField =
    new Field("mapFieldIntToInt", FieldType.notNullable(new ArrowType.Map(false)), List.of(structField));
After debugging, this is what I think is happening. We have given the key field the name "id" and the value field the name "value". When we try to write to the vectors, the StructVector (within the MapVector) already has two children, i.e.
mapVector.getChildrenFromFields().get(0).getChildrenFromFields().get(0).getField() -> key: Int(64, true) not null
and
mapVector.getChildrenFromFields().get(0).getChildrenFromFields().get(1).getField() -> value: Int(64, true)
But when we go to write the data:
@Override
public BigIntWriter bigInt() {
  switch (mode) {
    case KEY:
      return entryWriter.bigInt(MapVector.KEY_NAME);
    case VALUE:
      return entryWriter.bigInt(MapVector.VALUE_NAME);
    default:
      return this;
  }
}
This is the regular check we have, and KEY_NAME and VALUE_NAME are hardcoded as "key" and "value" respectively. They are not updated by looking at the given struct. Thus, at write time, an additional vector named "id" is introduced, which means the "key" vector is not consumed. At least, this is what is happening at a high level. If I rename "id" to "key", the code works.
On the reading side, it has an incorrect schema. In the worst case, let's say we can get the schema from the vector itself. Then again, we have 2 idle vectors if users use different names. Shouldn't we update KEY_NAME and VALUE_NAME properly? Or am I misreading this?
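The mismatch described above can be modeled with a minimal sketch in plain Java that does not use Arrow at all. `ToyStruct` and `HardcodedNameDemo` are hypothetical stand-ins, not Arrow classes, and the sketch assumes that looking up a child by an unknown name silently adds a new child. Which side ends up holding the extra vector in Arrow itself may differ. The sketch only shows how a declared name and a hardcoded name can drift apart.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a struct vector: named children, each a list of longs. */
class ToyStruct {
    private final Map<String, List<Long>> children = new LinkedHashMap<>();

    ToyStruct(String... declaredNames) {
        for (String name : declaredNames) {
            children.put(name, new ArrayList<>());
        }
    }

    /** Assumption for this sketch: an unknown name silently adds a new child. */
    List<Long> childFor(String name) {
        return children.computeIfAbsent(name, n -> new ArrayList<>());
    }

    /** Child names in the order they were added. */
    List<String> childNames() {
        return new ArrayList<>(children.keySet());
    }

    /** Number of values written to the named child, or 0 if it does not exist. */
    int valueCount(String name) {
        List<Long> child = children.get(name);
        return child == null ? 0 : child.size();
    }
}

class HardcodedNameDemo {
    // Mirrors the idea of a hardcoded key name; the constant itself is hypothetical.
    static final String KEY_NAME = "key";

    /** Declares a struct with the given key name, then writes one key via KEY_NAME. */
    static ToyStruct writeOneKey(String declaredKeyName) {
        ToyStruct entries = new ToyStruct(declaredKeyName, "value");
        entries.childFor(KEY_NAME).add(1L);
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(writeOneKey("id").childNames());  // [id, value, key]
        System.out.println(writeOneKey("key").childNames()); // [key, value]
    }
}
```

With the declared name "id", the write lands in a third child and the declared key child stays empty. With the declared name "key", the write lands where it was intended.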
We should get it from the vector, yes. They are recommended to be "key" and "value", but it is not meant to be required.
So fix it? Or enforce the key/value usage?
Fix it; the spec explicitly says not to enforce key/value.
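One way to read "get it from the vector" is to resolve the key and value names by position from the struct's children, falling back to the recommended names only when no children exist yet. The sketch below is plain Java without Arrow. `EntryNames` and `resolve` are hypothetical names, and the key-first, value-second ordering is an assumption taken from the example schema in this thread, not a description of how Arrow implements the fix.

```java
import java.util.List;

/** Holds the child names a map entry writer should use for key and value. */
final class EntryNames {
    // Recommended (but not required) names, used only as a fallback.
    static final String DEFAULT_KEY_NAME = "key";
    static final String DEFAULT_VALUE_NAME = "value";

    final String keyName;
    final String valueName;

    private EntryNames(String keyName, String valueName) {
        this.keyName = keyName;
        this.valueName = valueName;
    }

    /**
     * Resolves names by position: first child is the key, second is the value.
     * An empty list means no children were declared, so the defaults apply.
     */
    static EntryNames resolve(List<String> structChildNames) {
        if (structChildNames.isEmpty()) {
            return new EntryNames(DEFAULT_KEY_NAME, DEFAULT_VALUE_NAME);
        }
        if (structChildNames.size() != 2) {
            throw new IllegalArgumentException(
                "expected exactly 2 children (key, value), got " + structChildNames.size());
        }
        return new EntryNames(structChildNames.get(0), structChildNames.get(1));
    }

    public static void main(String[] args) {
        EntryNames names = resolve(List.of("id", "value"));
        System.out.println(names.keyName + ", " + names.valueName); // id, value
    }
}
```

A writer that asks `EntryNames` for the names, instead of using constants, would write to whatever children the user declared, so a key named "id" would need no renaming.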
Describe the bug, including details regarding any error messages, version, and platform.
Referring to the issue filed on Stack Overflow: https://stackoverflow.com/questions/77878272/apache-arrow-not-all-nodes-and-buffers-were-consumed-error-when-writing-a-map
The following code would yield an error:
Error
Component(s)
Java