Closed marcospassos closed 5 years ago
Quick question here: is this related to how Apache Avro library generates schema, or can this be easily replicated with textual schema and/or Avro module's schema generator?
Just figuring out where to add the test: if it's related to apache lib, interop is fine; but if not I try to keep depenencies to it to minimum.
I will try to help with this issue over the weekend: thank you for all your work on Avro module & apologies for limited time I have to spend to help keep you unblocked.
Quick question here: is this related to how Apache Avro library generates the schema, or can this be easily replicated with textual schema and/or Avro module's schema generator?
I don't believe that is related to schema generation, but rather to the serialization logic that expects a non-empty list of field, mainly because I’m providing the schema in the test.
Thank you for your support. I'm glad to contribute to a library that helps us that much.
@marcospassos Ok so the test can be simplified to use (f.ex) embedded schema it sounds like.
Did not have time to dig into this today: it's at top of my list and (aside from adding comments which I prioritize high to try to help everyone), and should be able to at least get test added, have a look at exception tomorrow,
@marcospasson Ok yes, I can see the problem. Will see if I can easily fix this corner case.
Hmmh. So, end-of-input detection by reader notices we are out of content (which is true), but reader probably would need to synthesize START_OBJECT/END_OBJECT for special case of empty Object(s). Possibly/probably only affects root values, not nested ones.
This may be difficult thing to fix. The problem is that at root level decision needs to be made regarding whether more content is available or not, and that is determined by checking if more content is available. In this case there isn't, and usually that signals end-of-content. But since "empty" Record takes no space, in this instance expectation is that handling should sort of materialize empty structure (START_OBJECT, END_OBJECT) out of no content.
So: the idea would be to detect the case of Record with no fields being read, and if so, avoid indicating end-of-content. With that RecordReader
should be able to properly return START_OBJECT/END_OBJECT sequence as expected.
And just to test above: if I pass new byte[1]
, test passes: reader is able to get empty record decoded.
So the trick is just to figure out how to find this special case in RootReader
, either during its initialization, or, possibly, within nextToken()
if possible.
Hmmm, got it. How about checking the schema for emptiness?
That is sort of what I ended up doing, having a look at fields found, if any. I suspect some edges might still cause problems (maybe for "constant" values), but this case works now.
Well, it does for 2.10; master
has something else remaining that I need to look into (due to optimized field-name lookups).
Thank you!
Np -- thank you for reporting it. Let me know if you find other edge cases.
@cowtowncoder Could you please give me some guidance on how to fix this issue? I'd like to fix it in time for 2.10.
The unit test spots the issue: empty records fail on deserialization.