I have a suggestion that I can probably implement myself if you agree to do it. However, I would like to have your opinion on it before working on it.
Currently, messages redefine their getattribute function. This is used for two reasons:
It allows raising an AttributeError when a field that is part of a oneof but is not the active one is accessed
It allows replacing the Placeholder values by the default value when the field is accessed.
However, this mechanism has several important issues, and some of them cannot be fixed without switching to a different method:
Redefining getattribute is very slow. Each time a field of a message is accessed, this function is called. This is of course the case for the proto fields, but getattribute is also called when other attributes such as the ones containing the metadata are accessed. Plus, each call is slow since getattribute performs queries to dictionaries. In a small benchmark that I did, I found out that a very significant part of the time spent when decoding a message was spent in this function. This is also a problem when using the rust codec.
The lack of support of unset messages (as they are automatically replaced by empty messages) makes it impossible to distinguish between an empty message and no message at all. This should be possible since the protobuf documentation states that the presence of message fields should always be tracked, even when the message is not declared as optional (see https://protobuf.dev/programming-guides/field_presence/ ).
More generally, the behaviour of this mechanism is not intuitive at all in my opinion: reading the value of an attribute may change the value of the message (by adding an empty file). I believe that a simple read should not have this behaviour.
These issues are so important that they made someone I know prefer removing all these functionalities and deal with the placeholders by hands… which is far from being ideal.
I would rather do more of less the following:
For primitive fields (int, str, float, …), nothing much changes apart from the default value: it becomes 0, “”, 0.0, … instead of PLACEHOLDER
Messages fields are always declared as optional (which is coherent with the protobuf documentation). When no value is provided, None is used instead. None and an empty message are therefore different values, which is expected. A consequence is that empty messages are not created on the fly when accessing unset fields, but I think it is better to have this behaviour anyway.
The uniqueness of one of fields is not enforced during the construction of the object (since getattribute is no longer used). It becomes the responsibility of the user to construct a valid message. The message can still be validated before being sent, but not during its construction since this can be really expensive. This would be a bit similar to what Pydantic does: the validators are applied in the constructor, but they are not applied on field assignments by default due to the overhead of redefining setattribute.
The problem with these updates is that they would also require a modification of the rust codec (but it should be possible if handled correctly). Another small drawback is that it would break compatibility for the programs that rely on these functionalities. However, I think that as long as the change is properly documented, it would not be a lot of work to adapt the programs. Plus, it is the right time to make such a change as the v2 is not yet released.
Summary
Hello,
I have a suggestion that I can probably implement myself if you agree to do it. However, I would like to have your opinion on it before working on it.
Currently, messages redefine their getattribute function. This is used for two reasons:
However, this mechanism has several important issues, and some of them cannot be fixed without switching to a different method:
What do you think?