Aiven-Open / karapace

Karapace - Your Apache Kafka® essentials in one tool
https://karapace.io
Apache License 2.0

fix: get rid of the path for fully qualified names. #912

Closed eliax1996 closed 1 month ago

eliax1996 commented 2 months ago

This is not a complete normalization feature but rather a fix for fully qualified paths vs. simple name references. This fix does not handle the various ways a type can be expressed with a partial path.

Long explanation below:

If we accept the specifications from buf as correct, we can look at how a type reference is defined here.

The main problem with the previous implementation is that the parser can't tell whether a fully qualified type reference in dot notation is the same as one identified by name alone (simple name notation).

The fix does not consider all the different ways users can define a relative reference. Schemas that express a relative reference in different ways will, for now, still be considered different, even if they would be identical once normalized.

Right now, our logic removes the .package_name (+ the Message scope) part before comparing field modifications (in fact it re-writes the schema using the simple name notation).
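To make the rewrite above concrete, here is a minimal sketch of the idea. It is not the code from this PR: `to_simple_name` and its parameters are hypothetical names chosen for illustration, and the sketch does not check whether a nested type with the same simple name would shadow the result.

```python
from typing import Sequence


def to_simple_name(type_ref: str, package: str, message_scope: Sequence[str] = ()) -> str:
    """Rewrite a fully qualified reference (leading dot) to simple name notation.

    Hypothetical helper for illustration only. References without a leading
    dot (simple names and partial paths) are returned unchanged.
    """
    if not type_ref.startswith("."):
        return type_ref

    path = type_ref[1:].split(".")
    package_parts = package.split(".") if package else []

    # Try the longest prefix first: package + enclosing messages, then package only.
    for prefix in (package_parts + list(message_scope), package_parts):
        n = len(prefix)
        # Require at least one segment to remain after stripping the prefix.
        if n < len(path) and path[:n] == prefix:
            return ".".join(path[n:])

    # The reference points outside the current package: leave it as is.
    return type_ref
```

With package `a.b` and a field declared inside message `Outer`, both `.a.b.Outer.Inner` and `Inner` end up as `Inner`, so the two spellings compare equal, while a reference into another package such as `.x.y.Z` is left untouched.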

Even though the TypeTree (a trie data structure) could help resolve relative references, we don't want to add this feature to the Python implementation now because it might introduce new bugs, given the non-trivial behaviour of the protoc compiler.

We plan to properly normalize the protobuf schemas later.

We'll use protobuf descriptors (after the compilation and linking step) to gather type references that are already resolved, and we will treat all protobuf schemas using only fully qualified names. Properly handling all path variations would mean reimplementing the protoc compiler's behavior, so we prefer to rely on the already processed proto descriptor.

So, for now, we'll only normalize between fully qualified references in dot notation and references by simple name alone.

NB: This does not change the semantics of the message, since the local scope is always the one with the highest priority: if you get rid of the fully qualified reference, protoc will resolve the reference to the one specified in the package scope.
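The priority rule in the NB can be sketched as an innermost-scope-first lookup. This is a simplified illustration that only handles single-identifier names, not protoc's full resolution algorithm; `resolve_simple_name` and `known_types` are hypothetical names.

```python
from typing import Optional, Set


def resolve_simple_name(name: str, scope: str, known_types: Set[str]) -> Optional[str]:
    """Resolve a simple name by searching from the innermost scope outwards.

    Hypothetical helper for illustration only. `scope` is the dotted path of
    the enclosing package and messages, and `known_types` holds fully
    qualified type names without the leading dot.
    """
    parts = scope.split(".") if scope else []
    # Walk from the full scope down to the empty (root) scope.
    for depth in range(len(parts), -1, -1):
        candidate = ".".join(parts[:depth] + [name])
        if candidate in known_types:
            return "." + candidate
    return None


known = {"pkg.Outer", "pkg.Outer.Inner", "pkg.Inner"}
# Inside pkg.Outer the nested type wins over the package-level one.
inside_outer = resolve_simple_name("Inner", "pkg.Outer", known)
# At package level only pkg.Inner is visible under that name.
at_package = resolve_simple_name("Inner", "pkg", known)
```

Here `inside_outer` is `.pkg.Outer.Inner` and `at_package` is `.pkg.Inner`, which is the "local scope first" behaviour the NB relies on.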

github-actions[bot] commented 2 months ago

Coverage report

File                                           Lines missing
karapace/utils.py
karapace/protobuf/field_element.py
karapace/protobuf/proto_file_element.py
karapace/protobuf/proto_normalizations.py      211, 222
karapace/protobuf/schema.py
karapace/protobuf/type_tree.py                 66, 73

This report was generated by python-coverage-comment-action