FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
314 stars 136 forks source link

Deserialization of "empty" Records as root values fails #177

Closed marcospassos closed 5 years ago

marcospassos commented 5 years ago

@cowtowncoder Could you please give me some guidance on how to fix this issue? I'd like to fix it in time for 2.10.

The unit test spots the issue: empty records fail on deserialization.

cowtowncoder commented 5 years ago

Quick question here: is this related to how Apache Avro library generates schema, or can this be easily replicated with textual schema and/or Avro module's schema generator?

Just figuring out where to add the test: if it's related to apache lib, interop is fine; but if not I try to keep depenencies to it to minimum.

I will try to help with this issue over the weekend: thank you for all your work on Avro module & apologies for limited time I have to spend to help keep you unblocked.

marcospassos commented 5 years ago

Quick question here: is this related to how Apache Avro library generates the schema, or can this be easily replicated with textual schema and/or Avro module's schema generator?

I don't believe that is related to schema generation, but rather to the serialization logic that expects a non-empty list of field, mainly because I’m providing the schema in the test.

Thank you for your support. I'm glad to contribute to a library that helps us that much.

cowtowncoder commented 5 years ago

@marcospassos Ok so the test can be simplified to use (f.ex) embedded schema it sounds like.

cowtowncoder commented 5 years ago

Did not have time to dig into this today: it's at top of my list and (aside from adding comments which I prioritize high to try to help everyone), and should be able to at least get test added, have a look at exception tomorrow,

cowtowncoder commented 5 years ago

@marcospasson Ok yes, I can see the problem. Will see if I can easily fix this corner case.

cowtowncoder commented 5 years ago

Hmmh. So, end-of-input detection by reader notices we are out of content (which is true), but reader probably would need to synthesize START_OBJECT/END_OBJECT for special case of empty Object(s). Possibly/probably only affects root values, not nested ones.

cowtowncoder commented 5 years ago

This may be difficult thing to fix. The problem is that at root level decision needs to be made regarding whether more content is available or not, and that is determined by checking if more content is available. In this case there isn't, and usually that signals end-of-content. But since "empty" Record takes no space, in this instance expectation is that handling should sort of materialize empty structure (START_OBJECT, END_OBJECT) out of no content.

So: the idea would be to detect the case of Record with no fields being read, and if so, avoid indicating end-of-content. With that RecordReader should be able to properly return START_OBJECT/END_OBJECT sequence as expected.

cowtowncoder commented 5 years ago

And just to test above: if I pass new byte[1], test passes: reader is able to get empty record decoded. So the trick is just to figure out how to find this special case in RootReader, either during its initialization, or, possibly, within nextToken() if possible.

marcospassos commented 5 years ago

Hmmm, got it. How about checking the schema for emptiness?

cowtowncoder commented 5 years ago

That is sort of what I ended up doing, having a look at fields found, if any. I suspect some edges might still cause problems (maybe for "constant" values), but this case works now.

Well, it does for 2.10; master has something else remaining that I need to look into (due to optimized field-name lookups).

marcospassos commented 5 years ago

Thank you!

cowtowncoder commented 5 years ago

Np -- thank you for reporting it. Let me know if you find other edge cases.