danielgtaylor / python-betterproto

Clean, modern, Python 3.6+ code generator & library for Protobuf 3 and async gRPC
MIT License
1.5k stars 210 forks source link

to_json will miss field. #160

Open wildfire810 opened 3 years ago

wildfire810 commented 3 years ago

image json = program.to_json() '{"sdkVersion": "Python 1.0.0", "head": {"usingQRegList": [0, 1, 2], "usingCRegList": [0, 1]}}'

If I use to_dict,I need to use include_default_values. If I don't use include_default_values, it miss field too.

wildfire810 commented 3 years ago

Those fields also have content.

Gobot1234 commented 3 years ago

Can you show us what Program is and what version you are on?

upcFrost commented 3 years ago

can confirm, to_dict seem to ignore _serialized_on_wire

wildfire810 commented 3 years ago

betterproto[compiler]==1.2.5

// Circuit structure syntax = "proto3"; import "QOperation.proto";

message Program { string sdkVersion = 1; // SDK Version ‘Python 1.0.0’ Head head = 2; Body body = 3; }

message Head { repeated uint32 usingQRegList = 1; // Used quantum register list repeated uint32 usingCRegList = 2; // Used classic register list }

message Body { repeated CircuitLine circuit = 1; map<string, Procedure> procedureMap = 2; }

message CircuitLine { oneof op { FixedGate fixedGate = 1; RotationGate rotationGate = 2; CustomizedGate customizedGate = 3; CompositeGate compositeGate = 4; string procedureName = 5; Measure measure = 6; bool barrier = 7; } repeated uint32 qRegList = 8; // Operated quantum register number repeated double argumentValueList = 9; // Procedure parameter value list repeated int32 argumentIdList = 10; // Procedure parameter index list }

message Procedure { uint32 parameterCount = 1; repeated uint32 usingQRegList = 2; // Used quantum register list repeated CircuitLine circuit = 3; // Inside circuit }

nat-n commented 3 years ago

Thanks for reporting this, the next step would be to create a failing test case.

upcFrost commented 3 years ago

Probably something like this should do:

diff --git a/tests/test_features.py b/tests/test_features.py
index f548264..e040f1b 100644
--- a/tests/test_features.py
+++ b/tests/test_features.py
@@ -197,6 +197,14 @@ def test_optional_flag():
     assert Request().parse(b"").flag is None
     assert Request().parse(b"\n\x00").flag is False

+def test_to_dict_serialized_on_wire():
+    @dataclass
+    class TestMessage(betterproto.Message):
+        some_int: int = betterproto.int32_field(1)
+
+    assert TestMessage().to_json() == '{}'
+    assert TestMessage(some_int=0).to_json() == '{"some_int": 0}'
+

 def test_to_dict_default_values():
     @dataclass

It will fail on the second assert because (from to_dict cases):

upcFrost commented 3 years ago

I think _serialized_on_wire parameter should be moved from Message to a subclass of dataclass.Field, which should be created for every field, not just messages. In fact, even the doc for _serialized_on_wire says it's about fields.

If this kind of solution is ok (and I'm not missing some logic behind keeping this flag inside Message), I think I'll try to make a PR for this one.

Gobot1234 commented 3 years ago

@upcFrost You can just check if the field's raw value is PLACEHOLDER.

upcFrost commented 3 years ago

You can just check if the field's raw value is PLACEHOLDER

hm, should actually work, true. But then why do we even need the _serialized_on_wire flag, as every field is originally covered by the placeholder?

Gobot1234 commented 3 years ago

I guess it could be removed in favour of checking fields, but I'm not sure how that would impact performance.

upcFrost commented 3 years ago

Turned out it's not that simple with placeholders. Typical test case for the serialized_on_wire method is:

foo = Foo()
assert betterproto.serialized_on_wire(foo.bar) is False

Trying to compare foo.bar with a placeholder will fail as __getattribute__, called at foo.bar, lazily inits the bar value. So, imo it would be more straightforward (and maintainable) to subclass the dataclasses.Field, or to add additional meta to it (as _betterproto meta is frozen, and probably not without a reason). And tbh this lazy init makes things a bit weird and complicated, as trying to inspect the message in IDE automatically inits the field.

Small addition: according to the spec this behaviour is actually correct for proto3, except that the include_default_values should be turned on by default. It's mentioned on https://developers.google.com/protocol-buffers/docs/proto3#default and in the info box on https://developers.google.com/protocol-buffers/docs/reference/python-generated#singular-fields-proto3

Gobot1234 commented 3 years ago

This library does still support proto 2 is this also true for that version?

upcFrost commented 3 years ago

This library does still support proto 2 is this also true for that version?

No, for proto2 there should be a HasField method which does exactly what serialized_on_wire does, and for all fields, not only messages

Gobot1234 commented 3 years ago

Right it might be time for a custom field class PR then :)

upcFrost commented 3 years ago

I think field metadata should be easier to maintain and alter if needed than the custom field class. Like the one that's already there. Though I'm not sure if field meta should be frozen (as the one Betterproto already has) or not, should check the doc for this. Also it should be probably a bit faster in terms of performance

abc19899 commented 2 years ago

I meet with this problem too. missing field is very dangerous. maybe we can remove the include_default_values param for the to_dict and force set include_default_values = true? (only for pb3)