Open willsoto opened 2 years ago
I think the problem may be an Avro oddity: data encoded as a File requires a header that is otherwise not used (or allowed) at all. It would be good to support the "File" variant, and there may already be an issue filed for it, but no work has been done. It's a bit tricky with respect to the API, since Jackson has no concept of separating by input/output source type (the idea of a different encoding for File seems specifically peculiar and ... well, a bad idea, IMO).
Ah okay... given everything I found I thought this was well supported, especially because of this particular bit: AvroGenerator.Feature.AVRO_FILE_OUTPUT.
That particular feature is documented in JavaDoc and I found this as well: https://github.com/FasterXML/jackson-dataformats-binary/blob/169d2fbd4ec9f9f3d0aa155823e7c51de29237f6/avro/src/main/java/com/fasterxml/jackson/dataformat/avro/ser/RootContext.java#L107-L118
@willsoto Hmmh. I had actually forgotten about this being implemented. But had I read your example in detail, it would have been there.
I assume you have also tried disabling that to see what difference it makes? Is there a matching reader (deserialization side) setting to go with it? Apologies for asking questions I should know the answer to, but I figured you have been investigating this and have good context.
No worries! Appreciate you taking the time to help me out 😄
> I assume you have also tried disabling that to see what difference it makes?
If I understand the question, I initially just tried the examples pretty much copy+pasted from the documentation, so I didn't even know there was this AvroGenerator.Feature.AVRO_FILE_OUTPUT setting. It took quite a bit of searching to stumble upon it. In terms of example code, if you just remove the AvroFactory stuff, that is what I was trying initially.
> Is there matching reader (de-serialization side) setting to go with it?
Not sure, honestly. The way I've been testing is writing the file and then attempting to open it with avro-tools to prove it's valid and de-serializable.
Ok that makes sense.
Adding example files to a (new) unit test would be nice too. One challenge with Avro, though, is that without a file header it has zero metadata to detect valid data. This is unlike almost every other format; even protobuf has type tags etc. for some level of self-descriptiveness.
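To illustrate the point about the file header being the only metadata available for detection, here is a minimal sketch in plain Java (no Jackson or Avro dependency). It relies on one fact from the Avro specification: an object container file starts with the four bytes 'O', 'b', 'j', 0x01. The class and method names are hypothetical and not part of Jackson or avro-tools; a unit test could use a check like this to tell whether the file header was written at all.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Jackson or avro-tools.
final class AvroMagicCheck {
    // Per the Avro specification, an object container file begins with 'O', 'b', 'j', 0x01.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Returns true if the data starts with the container-file magic bytes.
    static boolean hasContainerMagic(byte[] data) {
        return data != null
                && data.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOfRange(data, 0, MAGIC.length), MAGIC);
    }

    public static void main(String[] args) {
        byte[] withHeader = {'O', 'b', 'j', 1, 0};
        byte[] headerless = {4, 'h', 'i'}; // arbitrary bytes with no file header
        System.out.println(hasContainerMagic(withHeader)); // prints "true"
        System.out.println(hasContainerMagic(headerless)); // prints "false"
    }
}
```

Note that a negative result says nothing about whether headerless bytes are valid Avro; without the schema there is no way to tell, which is exactly the challenge described above.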
I'll try and add a test case this weekend.
Does the code I provided at least seem like it should work? I am curious if we can minimize the reproduction even further.
Oh. The part that possibly (likely?) will not work is the use of writeValues() (and the SequenceWriter it creates) -- I suspect you cannot simply append root-level values in Avro, unlike in some other formats. So you may need to instead create a container (List) with a matching root-level Avro type to describe the full type. But then again... Avro is designed for data streams, so I am not 100% sure (it has been a while since I worked actively on this format module).
While documentation on writing Avro to a file is sparse, I have managed to piece some stuff together but I am still getting an error.
Here is some sample code:
When checking the resultant file using avro-tools, I get the following error:
According to some searching, the Invalid sync! error occurs when the file hasn't been stitched together properly, but it's unclear to me what I need to do in code to get that to happen. I've looked through most of the Avro tests in this repo and I cannot find one that actually writes to a file and then de-serializes from that file.
I am not sure if I have stumbled into an actual bug here or not, but I am happy to try and write a test case if this code does seem correct, since that would imply it's a bug.
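For context on what a sync check involves, here is a rough sketch in plain Java, based on the container file layout in the Avro specification: magic bytes, a metadata map, a 16-byte sync marker, then data blocks that each end with the same 16-byte marker. It is not how avro-tools or Jackson implement the check, and the class, method names, and sample bytes are all hypothetical. The sample "file" is structurally shaped like a container file but carries dummy metadata rather than a real schema.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Jackson or avro-tools.
final class AvroSyncCheck {
    // Reads one Avro long: variable-length, zigzag-encoded.
    static long readLong(DataInputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            if (shift > 63) throw new IOException("Malformed varint");
            b = in.readUnsignedByte();
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1); // zigzag decode
    }

    // Skips exactly n bytes, failing on truncated input.
    static void skip(DataInputStream in, long n) throws IOException {
        if (n < 0 || n > Integer.MAX_VALUE) throw new IOException("Bad length " + n);
        in.readFully(new byte[(int) n]);
    }

    // Returns the number of data blocks whose trailing sync marker matches the header's.
    static int countValidBlocks(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(file)));
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (!Arrays.equals(magic, new byte[] {'O', 'b', 'j', 1})) {
            throw new IOException("Not an Avro container file");
        }
        // File metadata is an Avro map: blocks of (key, value) pairs, ended by a zero count.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) { readLong(in); count = -count; } // negative count: byte size follows
            for (long i = 0; i < count; i++) {
                skip(in, readLong(in)); // key
                skip(in, readLong(in)); // value
            }
        }
        byte[] headerSync = new byte[16];
        in.readFully(headerSync);
        byte[] blockSync = new byte[16];
        int blocks = 0;
        while (true) {
            in.mark(1);
            if (in.read() < 0) break; // clean end of file
            in.reset();
            readLong(in);             // number of objects in this block
            skip(in, readLong(in));   // serialized objects
            in.readFully(blockSync);
            if (!Arrays.equals(headerSync, blockSync)) throw new IOException("Invalid sync!");
            blocks++;
        }
        return blocks;
    }

    static boolean isWellFormed(byte[] file) {
        try { countValidBlocks(file); return true; } catch (IOException e) { return false; }
    }

    // Builds a tiny container-shaped byte array; optionally corrupts the block's sync marker.
    static byte[] sampleFile(boolean corruptSync) {
        byte[] sync = new byte[16];
        Arrays.fill(sync, (byte) 7);
        ByteBuffer buf = ByteBuffer.allocate(47);
        buf.put(new byte[] {'O', 'b', 'j', 1});
        buf.put(new byte[] {2, 2, 'k', 2, 'v', 0}); // one dummy metadata entry, then end of map
        buf.put(sync);
        buf.put(new byte[] {2, 6, 1, 2, 3});        // one block: 1 object, 3 payload bytes
        if (corruptSync) sync[0] = 0;
        buf.put(sync);
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed(sampleFile(false))); // prints "true"
        System.out.println(isWellFormed(sampleFile(true)));  // prints "false"
    }
}
```

If this reading of the format is right, one plausible cause of the error is values being appended after the header without being wrapped in blocks that carry the marker, though that is a guess rather than a confirmed diagnosis.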
Thanks in advance.
Edit:
I've also tried the following:
In which case I get the following error at that line: