JuliaIO / ProtoBuf.jl

Julia protobuf implementation
Other
205 stars 55 forks source link

Allow unpacked repeated primitives #224

Open arhik opened 1 year ago

arhik commented 1 year ago

When supporting an existing protocol which uses unpacked representation for encoding and decoding we need unpacked representation support. If it is strictly encoding and decoding at the other end it is not important. But before decoding if hash is used to verify the object received in remote session then we would get into issues. The hash of the encoded representation will be different and will be rejected in our use case. For now I had to make a fork with these changes and continue using it. It would be ideal if changes are in the upstream. I was assuming you guys must have a reason and didn’t bother to raise an issue. I hope you are convinced about the use-case. If I am the protocol designer and implementer I would definitely avoid unpacked representation but to adhere to existing protocol it would make sense to support it. If it is very rare scenario and it complexifies the repo maintainance I understand.

codecov[bot] commented 1 year ago

Codecov Report

Base: 89.04% // Head: 89.23% // Increases project coverage by +0.19% :tada:

Coverage data is based on head (783a0b1) compared to base (d4dbff5). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #224 +/- ## ========================================== + Coverage 89.04% 89.23% +0.19% ========================================== Files 25 25 Lines 2794 2797 +3 ========================================== + Hits 2488 2496 +8 + Misses 306 301 -5 ``` | [Impacted Files](https://codecov.io/gh/JuliaIO/ProtoBuf.jl/pull/224?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO) | Coverage Δ | | |---|---|---| | [src/codegen/encode\_methods.jl](https://codecov.io/gh/JuliaIO/ProtoBuf.jl/pull/224/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO#diff-c3JjL2NvZGVnZW4vZW5jb2RlX21ldGhvZHMuamw=) | `97.77% <100.00%> (+0.07%)` | :arrow_up: | | [src/codec/decode.jl](https://codecov.io/gh/JuliaIO/ProtoBuf.jl/pull/224/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO#diff-c3JjL2NvZGVjL2RlY29kZS5qbA==) | `86.91% <0.00%> (+0.46%)` | :arrow_up: | | [src/codegen/toplevel\_definitions.jl](https://codecov.io/gh/JuliaIO/ProtoBuf.jl/pull/224/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO#diff-c3JjL2NvZGVnZW4vdG9wbGV2ZWxfZGVmaW5pdGlvbnMuamw=) | `97.90% <0.00%> (+1.39%)` | :arrow_up: | | [src/codegen/CodeGenerators.jl](https://codecov.io/gh/JuliaIO/ProtoBuf.jl/pull/224/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO#diff-c3JjL2NvZGVnZW4vQ29kZUdlbmVyYXRvcnMuamw=) | `91.11% <0.00%> (+4.44%)` | :arrow_up: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

Drvi commented 1 year ago

Thank you for explanation!

So if my understanding is correct, you are relying on an implementation detail. ProtoBuf tries to be very flexible in the way you serialize your messages, e.g. it does not require you to process fields in order which is nice bc you can start creating your payload as your data arrives. It also means that that binary payload is not guaranteed to be stable, as well as its hash. This section of docs https://developers.google.com/protocol-buffers/docs/encoding#order seem to try to warn people against depending on the exact byte output of a serialized message. AFAIK, for all intended use cases, the packed representation is the superior one.

That being said I don't want to break your application, but I think we might want to make this strictly opt-in and hide it behind an options flag.

arhik commented 1 year ago

Thank you for heads up. Existing code has everything in place. I just uncommented the section where [packed=true/false] option is ignored. I believe this PR (existing commented code) is already respecting it as a option. works for me with this [packed=false] option set after these changes. Sorry if you are aware of it and I am just repeating what you are trying to mention.

But there are not tests around it. Is it ideal for you to have tests within this PR ? Let me know. I will give it a try.

Drvi commented 1 year ago

Yes, adding tests would be great!

Drvi commented 1 year ago

I think this might mess up _encoded_size so we'll probably need to add similar codegen tweak + tests for that.