JamieMair closed this 1 year ago.
Codecov report: patch coverage is 98.79% and project coverage change is +0.04% :tada:
Comparison is base (548017c) 92.39% compared to head (d780f7e) 92.43%.
Thank you @JamieMair, well spotted. This does indeed seem to be a bug in ProtoBuf.jl. The current approach to encoding/decoding `map`s might be slightly more performant but, as you point out, the spec seems to be pretty adamant that `map`s should be as similar to repeated messages of pairs as possible (even though the two are not fully interchangeable in practice: repeated messages can have duplicated "keys" while `map`s presumably cannot).
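To make the equivalence concrete, here is a minimal Python sketch. It is not ProtoBuf.jl code, every helper name is hypothetical, and it assumes a `map<string, uint32>` field with non-negative values. Per the protobuf spec, such a map is written on the wire like a `repeated` entry message whose key is field 1 and value is field 2, so encoding the dict and encoding the list of pairs should give identical bytes.

```python
# Minimal sketch of the protobuf wire format for a map<string, uint32> field.
# All helper names are hypothetical; this is not ProtoBuf.jl's API.

def encode_varint(n: int) -> bytes:
    """Base-128 varint encoding for non-negative integers."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def encode_entry(key: str, value: int) -> bytes:
    """Body of one entry: key is field 1 (length-delimited), value is field 2 (varint)."""
    key_bytes = key.encode("utf-8")
    return (encode_tag(1, 2) + encode_varint(len(key_bytes)) + key_bytes
            + encode_tag(2, 0) + encode_varint(value))


def encode_repeated_pairs(field_number: int, pairs) -> bytes:
    """Repeated message field: one tag + length + body record per element."""
    out = bytearray()
    for key, value in pairs:
        body = encode_entry(key, value)
        out += encode_tag(field_number, 2) + encode_varint(len(body)) + body
    return bytes(out)


def encode_map(field_number: int, mapping: dict) -> bytes:
    """A map field is written with exactly the same bytes as repeated entries."""
    return encode_repeated_pairs(field_number, mapping.items())


as_map = encode_map(1, {"a": 1, "b": 2})
as_pairs = encode_repeated_pairs(1, [("a", 1), ("b", 2)])
print(as_map.hex())        # 0a050a016110010a050a01621002
print(as_map == as_pairs)  # True
```

The byte-for-byte comparison only works here because Python dicts keep insertion order; the spec does not guarantee any map ordering, so a real test is safer comparing decoded contents.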
As for this being a breaking change: I think this qualifies as a bug fix and should be released as a patch version. I'll check downstream users on JuliaHub and open issues if necessary, giving them some time to adjust.
Can I trouble you to write a test (or tests) checking that repeated messages and `map`s are interchangeable?
> the current approach to encoding/decoding `map`s might be slightly more performant
I agree, this additional overhead seems a bit redundant, but at least one can use arrays if that is an issue.
> Can I trouble you for writing a test/tests for repeated messages and `map`s being interchangeable?
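I can have a go when I next get some time to work on this. Just to check, would this involve something like:

- Create a `PairStruct` similar to `TestStruct` with a key and a value
- Add the encoding/decoding methods
- Encode an array of these structs, and then check whether this is equivalent to encoding a `Dict`
- Also check that the array of these structs can be decoded into a `Dict`

Is there something simpler I can do, or have I missed anything?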
> I can have a go when I next get some time to work on this. Just to check, would this involve something like
>
> - Create a `PairStruct` similar to `TestStruct` with a key and a value
> - Add the encoding/decoding methods
> - Encode an array of these structs, and then check whether this is equivalent to encoding a `Dict`
> - Also check that the array of these structs can be decoded into a `Dict`?
Basically, yeah. I think you should be able to write a proto file with a `PairStruct` message to get the methods generated for you, but the test would simply be a matter of encoding a `Vector` of `PairStruct` and successfully decoding it as a `Dict`, and vice versa.
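The round trip described above can be sketched in Python as follows. This is only an illustration under stated assumptions: a `map<string, uint32>` field with non-negative values, and hypothetical helpers (`encode_pairs`, `decode_pairs`, `decode_map`) standing in for the generated `PairStruct` methods; it is not ProtoBuf.jl's API. The key point is that one set of bytes can be read either as a list of pairs or as a dict.

```python
# Round-trip check: bytes written from a list of pairs decode as a dict, and
# bytes written from a dict decode as a list of pairs.
# Hypothetical helpers for a map<string, uint32> field; not ProtoBuf.jl's API.

def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_pairs(field_number: int, pairs) -> bytes:
    """Encode (key, value) pairs as a repeated entry message field."""
    out = bytearray()
    for key, value in pairs:
        k = key.encode("utf-8")
        # Entry body: key (field 1, tag 0x0A) then value (field 2, tag 0x10)
        body = b"\x0a" + encode_varint(len(k)) + k + b"\x10" + encode_varint(value)
        out += encode_varint((field_number << 3) | 2) + encode_varint(len(body)) + body
    return bytes(out)


def decode_entry(body: bytes):
    key, value = "", 0  # proto3 defaults if a field is missing
    pos = 0
    while pos < len(body):
        tag, pos = decode_varint(body, pos)
        if tag == 0x0A:  # field 1, length-delimited: the key
            n, pos = decode_varint(body, pos)
            key = body[pos:pos + n].decode("utf-8")
            pos += n
        elif tag == 0x10:  # field 2, varint: the value
            value, pos = decode_varint(body, pos)
        else:
            raise ValueError(f"unexpected tag {tag} in map entry")
    return key, value


def decode_pairs(buf: bytes, field_number: int):
    expected_tag = (field_number << 3) | 2
    pairs, pos = [], 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        if tag != expected_tag:
            raise ValueError(f"unexpected tag {tag}")
        n, pos = decode_varint(buf, pos)
        pairs.append(decode_entry(buf[pos:pos + n]))
        pos += n
    return pairs


def decode_map(buf: bytes, field_number: int) -> dict:
    # Same records as decode_pairs; dict() keeps the last value for a repeated key.
    return dict(decode_pairs(buf, field_number))


pairs = [("alpha", 1), ("beta", 300)]
assert decode_map(encode_pairs(1, pairs), 1) == dict(pairs)            # vector -> dict
assert decode_pairs(encode_pairs(1, dict(pairs).items()), 1) == pairs  # dict -> vector
```

Duplicate keys are where the two views differ: decoding as a list keeps every pair, while decoding as a dict keeps only the last value seen, which matches the protobuf rule for duplicate map keys.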
I'll try to review the rest of the PR soon. Again, thanks for your help!
Great! I'll get on that when I next have a little time. Thanks for the quick response!
@Drvi I have just found some time to add in the unit tests you asked for. Feel free to make any changes you want. Thanks for the support on this PR.
@JamieMair Thanks for the test! It made me realize that we were adding a length to `map` fields, i.e. we weren't treating them like repeated fields of messages, which don't use the packed representation either. Can you check my changes and try them in your application?
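A minimal Python sketch of the layout being described, assuming a `map<string, uint32>` field and hypothetical helper names (this is not ProtoBuf.jl's implementation): each map entry is written as its own tag, length, and body record, exactly like one element of a repeated message field, and no length prefix is written around the field as a whole. In protobuf, packed encoding is only defined for repeated scalar numeric fields.

```python
# Each map entry is a complete tag-length-value record; there is no outer
# length wrapping the whole field. Hypothetical helpers; not ProtoBuf.jl's API.

def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_entry(key: str, value: int) -> bytes:
    k = key.encode("utf-8")
    # key: field 1 (tag 0x0A, length-delimited); value: field 2 (tag 0x10, varint)
    return b"\x0a" + encode_varint(len(k)) + k + b"\x10" + encode_varint(value)


def encode_map(field_number: int, mapping: dict) -> bytes:
    out = bytearray()
    for key, value in mapping.items():
        body = encode_entry(key, value)
        # One complete record per entry: tag (wire type 2), length, body.
        out += encode_varint((field_number << 3) | 2) + encode_varint(len(body)) + body
    return bytes(out)


def split_records(buf: bytes):
    """Split top-level length-delimited fields into (field_number, payload) pairs."""
    records, pos = [], 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        if tag & 0x7 != 2:
            raise ValueError("expected a length-delimited field")
        n, pos = decode_varint(buf, pos)
        records.append((tag >> 3, buf[pos:pos + n]))
        pos += n
    return records


records = split_records(encode_map(1, {"a": 1, "b": 2, "c": 3}))
print(len(records))             # 3
print([f for f, _ in records])  # [1, 1, 1]
```

Because every record is self-delimiting, a reader can consume entries one at a time, and an extra length written in front of them is not part of this layout.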
The JET failures on nightly are unrelated to this PR.
> @JamieMair Can you check my changes and try them in your application?
I have just tested it in our application. I have a test which checks the generated bytes (see https://github.com/JamieMair/TensorBoardLogger.jl/blob/update-to-new-protobuf/test/test_hparams.jl), but it seems that the encoded length byte on the object is off by 2: the second byte of the array should be 91, but it is being encoded as 93. I have tried to track down the cause, but I can't find it. I suspect there may be a problem with `_encoded_size` being a bit too large for the `Dict` type. Maybe it is this line: https://github.com/JamieMair/ProtoBuf.jl/blob/a14c2e643f31bf4f0a5cf2475d37cbbdcf8c4745/src/codec/encoded_size.jl#L50, but I don't know for sure.
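One way to check a size computation like `_encoded_size` is to compute the size independently and compare it against the length of the actual encoded bytes. The Python sketch below does this for an assumed `map<string, uint32>` field; all helper names are hypothetical, and it is not a diagnosis of the linked line. Each entry contributes its own tag, length prefix, and body, so any term counted one time too many shows up directly as a mismatch.

```python
# Independent size computation for a map<string, uint32> field, cross-checked
# against the encoded bytes. Hypothetical helpers; not ProtoBuf.jl's _encoded_size.

def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def varint_size(n: int) -> int:
    """Bytes needed for a non-negative varint (7 payload bits per byte)."""
    return max(1, (n.bit_length() + 6) // 7)


def encode_map(field_number: int, mapping: dict) -> bytes:
    out = bytearray()
    for key, value in mapping.items():
        k = key.encode("utf-8")
        # Entry body: key (field 1, tag 0x0A) then value (field 2, tag 0x10).
        # This sketch always writes both fields, even default values.
        body = b"\x0a" + encode_varint(len(k)) + k + b"\x10" + encode_varint(value)
        out += encode_varint((field_number << 3) | 2) + encode_varint(len(body)) + body
    return bytes(out)


def entry_size(key: str, value: int) -> int:
    k = len(key.encode("utf-8"))
    # 1-byte tag + length prefix + key bytes, then 1-byte tag + varint value
    return 1 + varint_size(k) + k + 1 + varint_size(value)


def map_field_size(field_number: int, mapping: dict) -> int:
    tag_size = varint_size((field_number << 3) | 2)
    total = 0
    for key, value in mapping.items():
        body = entry_size(key, value)
        # Each entry pays for its own tag and length prefix; there is no outer length.
        total += tag_size + varint_size(body) + body
    return total


for m in ({}, {"a": 1}, {"k" * 200: 300}, {"x": 0, "y": 2**35}):
    assert map_field_size(1, m) == len(encode_map(1, m))
```

Checking the predicted size against `len(...)` of the real bytes for a few shapes (empty map, one-byte lengths, two-byte lengths) tends to localize this kind of off-by-N error quickly.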
Ah, good catch @JamieMair, I think I have a fix for that.
@JamieMair Can you try again?
@Drvi Yes, that's amazing. Everything works great on my end, and I've removed our workaround. Thanks for finishing this one off!
Fixes issues described in https://github.com/JuliaIO/ProtoBuf.jl/issues/233
Note that this would be a breaking change for any serialised messages that include dictionaries: previously stored messages will fail to deserialise. I am not 100% certain that these changes should be merged, but they are worth considering since the existing code does not seem to match the specification. However, changing this may affect downstream users who store data that includes `Dict` types.