Closed eliax1996 closed 1 month ago
Coverage report (generated by python-coverage-comment-action). Files covered:
karapace: utils.py
karapace/protobuf: field_element.py, proto_file_element.py, proto_normalizations.py (lines missing: 211, 222), schema.py, type_tree.py (lines missing: 66, 73)
This is not a complete normalization feature but rather a fix for fully qualified paths versus simple-name references. The fix does not handle the various ways a type can be expressed with a partial path.
Long explanation below:
If we accept the specifications from buf as correct, we can look at how a type reference is defined here.
The main problem with the previous implementation is that the parser can't tell whether a fully qualified type reference in dot notation is the same as one identified by name alone (simple name notation).
The fix does not consider all the different ways users can define a relative reference. Schemas that express a relative reference in different ways will, for now, still be considered different even after normalization.
Right now, our logic removes the .package_name (plus the message scope) part before comparing field modifications (in fact, it rewrites the schema using the simple name notation). Even though the TypeTree (a trie data structure) could help resolve relative references, we don't want to add this feature to the Python implementation now because it might cause new bugs due to the non-trivial behaviour of the protoc compiler.
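The prefix removal described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function name `strip_qualification` and its parameters are hypothetical, and it assumes a fully qualified reference is written with a leading dot.

```python
def strip_qualification(type_ref: str, package_name: str, message_scope: str = "") -> str:
    """Rewrite a fully qualified reference into simple name notation.

    Hypothetical helper for illustration only. References without a
    leading dot (simple names or partial paths) are returned unchanged.
    """
    if not type_ref.startswith("."):
        # Simple name or relative (partial) path: not handled by this fix.
        return type_ref

    prefix = "." + package_name + "."
    if not type_ref.startswith(prefix):
        # Fully qualified, but pointing into a different package.
        return type_ref

    remainder = type_ref[len(prefix):]
    # Optionally drop the enclosing message scope as well.
    if message_scope and remainder.startswith(message_scope + "."):
        remainder = remainder[len(message_scope) + 1:]
    return remainder


if __name__ == "__main__":
    print(strip_qualification(".a.b.Outer.Inner", "a.b", "Outer"))
    print(strip_qualification("b.Msg", "a.b"))
```

Note that a partial path such as `b.Msg` passes through untouched, which matches the limitation stated above: schemas that differ only in how they spell a relative reference still compare as different.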
We plan to properly normalize the protobuf schemas later.
We'll use protobuf descriptors (after the compilation and linking step) to gather type references that are already resolved, and we will treat all protobuf schemas using fully qualified names only. Properly handling all path variations would mean reimplementing the protoc compiler's behavior, so we prefer to rely on the already processed proto descriptor.
So, for now, we'll only implement a normalization between fully qualified references in dot notation and references by simple name alone.
NB: This does not change the semantics of the message, since the local scope is always the one with the highest priority; if you remove the fully qualified prefix, protoc will resolve the reference to the type specified in the package scope.
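To make the scope-priority argument concrete, here is a small sketch of innermost-first lookup for a simple name. It is an illustration under assumptions, not protoc's actual implementation: `resolve_simple_name` and the set of known types are hypothetical, and dotted partial paths are deliberately out of scope.

```python
from typing import Optional, Set


def resolve_simple_name(name: str, scope: str, known_types: Set[str]) -> Optional[str]:
    """Resolve a simple type name by searching from the innermost scope outward.

    Hypothetical helper: `scope` is the dotted scope in which the reference
    appears (e.g. "pkg.Outer"), and `known_types` holds fully qualified type
    names without a leading dot.
    """
    parts = scope.split(".") if scope else []
    while True:
        candidate = ".".join(parts + [name])
        if candidate in known_types:
            return candidate
        if not parts:
            # Reached the root scope without finding the type.
            return None
        parts.pop()  # widen the search to the enclosing scope


if __name__ == "__main__":
    known = {"pkg.Msg", "pkg.Outer", "pkg.Outer.Inner"}
    print(resolve_simple_name("Msg", "pkg.Outer", known))
    print(resolve_simple_name("Inner", "pkg.Outer", known))
```

In this sketch, a reference to `Msg` from inside `pkg.Outer` falls back to the package-level `pkg.Msg` when no closer scope defines a type with that name.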
About this change - What it does
References: #xxxxx
Why this way