Closed loicmathieu closed 3 months ago
This only happens when rows are arrays, everything seems fine with objects:
With objects (OK):
Test file:
{id:1,name:"Strawberry"}
{id:2,name:"Blueberry"}
{id:3,name:"Raspberry"}
{id:4,name:"Blackberry"}
{id:5,name:"Cranberry"}
Iterating ...
{id=1, name=Strawberry}
{id=2, name=Blueberry}
{id=3, name=Raspberry}
{id=4, name=Blackberry}
{id=5, name=Cranberry}
With arrays (one per row) (KO):
Test file:
["Strawberry"]
["Blueberry"]
["Raspberry"]
["Blackberry"]
["Cranberry"]
Iterating ...
[Blueberry]
[Raspberry]
[Blackberry]
[Cranberry]
With arrays (all on a single row) (KO):
Test file:
["Strawberry", "Blueberry", "Raspberry", "Blackberry", "Cranberry"]
Iterating ...
<nothing>
Makes sense, state auto-detection wrt arrays is problematic as the first START_ARRAY can mean one of 2 things:
and so iterator has to choose one or the other. I think default is (1), meaning that for (2) behavior is incorrect.
In case (2), caller needs to create and initialize JsonParser
to point to the START_ARRAY of the first item (skipping the "wrapper" START_ARRAY).
This is not Ion specific, I think, and is something that cannot be reliably resolved due to inherent ambiguity (and since JsonParser
does not have look-ahead capability to check for nested START_ARRAYs, for example)
@cowtowncoder do you mean that the MappingIterator
implementation has an issue (for ION or more broadly) or that I use it the wrong way?
As your comment is about JsonParser
which I don't use directly.
I mean that there is a fundamental ambiguity wrt token stream abstraction and data-binding that MappingIterator
cannot solve, related to Java values for which serialization is Array.
So whereas code you use works perfectly well for something represented as Objects (JSON or Ion), it takes a guess on seeing JsonToken.START_ARRAY
, regarding how to consume that token. Basically seeing it, assumption is that you have an Array of Things (of whatever shape) -- but what you have is a Sequence/Stream of Things (of Array shape). So instead of seeing the first value (with first token JsonToken.START_ARRAY
) it sees wrapper Array.
... at least this is my hypothesis.
If so, to make things work, you will need to construct a JsonParser
-- in your case, IonParser
which is subtype (naming of JsonParser
is due to legacy reasons but it is shared base type for all formats) -- and advance it to that first JsonToken.START_ARRAY
and now construct MappingIterator
.
This way iterator will assume that parser instance is pointing to the first token of the first value to be iterated over, NOT wrapped within an Array.
I hope above helps: I realize it's not the most readable explanation but hopefully helps unlock the problem.
@cowtowncoder thanks for the explaination.
It works with the following changes:
// This file has 5 lines
File testFile = resourceToFile("test.ion");
ObjectMapper objectMapper = JacksonUtils.createIonObjectMapper();
JsonParser parser = objectMapper.createParser(new FileInputStream(testFile));
parser.nextToken();
MappingIterator<Object> mappingIterator = objectMapper.readerFor(DEFAULT_TYPE_REFERENCE).readValues(parser);
// the first line is missing
LongAdder count = new LongAdder();
mappingIterator.forEachRemaining(obj -> {
count.increment();
});
Assertions.assertEquals(5, count.sum());
As I understand it, there is nothing that can be done to make this works OOTB? So should I close this issue?
Correct: as far as I understand things, there is natural ambiguity that makes one of 2 alternative cases not work. Cases are:
[ { "id":1}, {"id":2} ]
)[1,2] [2,3] [4,5]
)and upon seeing the initial [
(or its equivalent in Avro etc) it is not possible to know (without look-ahead; but more fundamentally, without being able to figure out representation of the value type) which case we have.
So the tie-breaker is to assume it is always (1).
But this decision is only needed when JsonParser
is "unitialized", i.e. does not point to a token (so parser.nextToken()
is called).
If there is current token, it is assume it MUST be the first token of the first value.
So I agree this should be closed since it does not appear that Ion backend specifically has issues of its own.
Closing it, the detailed answer and reproducer / workaround example are enough to help someone else that have the same issue understand it and work around it.
Thanks a lot @cowtowncoder for the detailed explanation and your help.
You are wlecome @loicmathieu !
When reading an ION file using a MappingIterator, the first line is omitted.
When reading in a more traditional approach the object mapper correctly read all 5 lines
Reproducer project, there is a test in it:
mapping-iterator.zip