factorhouse / kpow

Kpow for Apache Kafka
https://factorhouse.io/kpow
Other
37 stars 5 forks source link

Fields with 'false' boolean values missing from Protobuf encoded messages #29

Closed gabistoenescu closed 1 month ago

gabistoenescu commented 1 month ago

Version of Kpow factorhouse/kpow-ce:93.3

Describe the bug When consuming(topic inspection) Protobuf encoded messages with boolean fields the outputted JSON values are missing those fields when their values are false while including them when their values are true.

To Reproduce Steps to reproduce the behavior:

  1. Go to "Topics" and select a topic containing "Protobuf" encoded messages
  2. Chose "Inspect" from the corresponding context menu
  3. In the "Inspect" tab select the following options: a. Mode: Sample b. Window: Recent c. Key deserializer: None d. Value deserializer: Protobuf e. Headers deserializer: None f. Schema Registry: "the-schema-registry-name". We AWS Glue as a schema registry and the corresponding schema definitions are successfully retrieved from there.
  4. Click "Search"

Expected behavior Expected to always see boolean fields being displayed when consuming such messages, irrespective of their values, but fields with false values are omitted from the output. We know with certainty that boolean fields with false values exist on many of our messages and just not displayed by "kpow" for some reason.

Screenshots Message output displaying the occupied boolean field when its value is true

topic: our-topic
partition: 1
offset: 1561769
timestamp: 1722962144562
age: 02s
value:
{
  "id": "06158910-e8ad-4270-afed-0bc2d3d3b673",
  "timestamp": "2024-08-06T16:35:44.562269840Z",
  "sourceEvent": {
    "id": "cbc7a570-4c8f-45b2-9aaa-8c174eb68737",
    "timestamp": "2024-08-06T16:35:44.222634Z",
    "sourceId": "1774cd10-777b-4146-b65f-72e3cd2f1431",
    "updateTime": "2024-08-06T16:35:42.792Z",
    "parkingAreaId": 3,
    "parkingSpaceId": "3E321M",
    "parkingSpaceName": "E321 Space M",
    "occupied": true
  },
  "authRealm": "someRealm",
  "authClientId": "someClient"
}

Message output omitting the occupied boolean field when its value is false

topic: our-topic
partition: 1
offset: 1561770
timestamp: 1722962144562
age: 02s
value:
{
  "id": "1034ccd1-17e9-490f-bb14-bcaa11c9456a",
  "timestamp": "2024-08-06T16:35:44.562404773Z",
  "sourceEvent": {
    "id": "51839ddc-39d5-4801-9969-4c5833af1c27",
    "timestamp": "2024-08-06T16:35:44.222644Z",
    "sourceId": "3601b01e-4af8-44eb-aa30-f978e77bbb9c",
    "updateTime": "2024-08-06T16:35:42.792Z",
    "parkingAreaId": 3,
    "parkingSpaceId": "3E321F",
    "parkingSpaceName": "E321 Space F"
  },
  "authRealm": "someRealm",
  "authClientId": "someClient"
}

Protobuf schema subject retrieved from schema registry

syntax = "proto3";
option java_package = "our.package.message";
import "google/protobuf/timestamp.proto";

message Status {
  string id = 1;
  google.protobuf.Timestamp timestamp = 2;
  SourceEvent sourceEvent = 3;
  string auth_realm = 4;
  string auth_client_id = 5;
}

message SourceEvent {
  string id = 1;
  google.protobuf.Timestamp timestamp = 2;
  string source_id = 3;
  google.protobuf.Timestamp update_time = 4;
  uint32 parking_area_id = 5;
  string parking_space_id = 6;
  string parking_space_name = 7;
  bool occupied = 8;
}

Environment:

wavejumper commented 1 month ago

Hi @gabistoenescu,

Thanks for your detailed bug report. I was able to reproduce your issue locally.

It looks like when we deserialize Protobuf messages we use the standard JsonFormat printer.

As you have discovered, the default JSONFormat printer does not return the field when "occupied": false.

Using a JsonFormat printer returned by the method .includingDefaultValueFields() seems to fix this bug for me. And this is because the default value of a bool field in protobuf is false.

We will get a fix in for the next release. We will either make the JSONFormat printer configurable (and keep the current Printer as the default), or we will simply move towards using the includingDefaultValueFields formatter for Protobuf.

Thanks, Tom

gabistoenescu commented 1 month ago

@wavejumper Thank you Tom

gabistoenescu commented 1 month ago

Additional clarifications - the filed omissions actually happen for all Protobuf data types with default values. So for example, fields with empty strings, int values of zero, empty arrays, etc, will be excluded when inspecting data.

wavejumper commented 1 month ago

Correct, and switching out the Printer to one that prints all default values (for all types) is what we have done to fix this issue.

An update: the code change has been made, expect a release out next week.

wavejumper commented 1 month ago

Hi @gabistoenescu,

Just letting you know that 93.4 is out today with a fix for this bug!

Slightly related: thought it might be worth also informing you that AWS Glue Protobuf + data produce functionality is currently broken because of a transitive dependency issue between Confluent Schema + AWS Glue libraries. Unfortunately there is no resolution for this bug until the AWS Glue serdes library bumps up their dependencies to a compatible version.

If this is a feature you require, we could prioritise a AWS-glue specific build of Kpow that preferences AWS glue's dependencies (so Protobuf producing would be broken in Confluent instead basically).

If you want to carry on this conversation about AWS Glue + Protobuf please feel free to email support@factorhouse.io.

Best, Tom

d-t-w commented 1 month ago

Thanks for this excellent ticket @gabistoenescu 👍

gabistoenescu commented 1 month ago

Hi Tom,

Thank you for the prompt fix. In regards to the message producing functionality we're not impacted by the current Glue incompatibility since we're primarily using kpow for data visualization. I also agree with your decision to favor the Confluent integration, AWS Glue Protobuf is a lower quality library full of bugs.

Thanks, Gabriel


From: Thomas Crowley @.> Sent: Monday, August 19, 2024 11:16 PM To: factorhouse/kpow @.> Cc: gabistoenescu @.>; Mention @.> Subject: Re: [factorhouse/kpow] Fields with 'false' boolean values missing from Protobuf encoded messages (Issue #29)

Hi @gabistoenescuhttps://github.com/gabistoenescu,

Just letting you know that 93.4 is out today with a fix for this bug!

Slightly related: thought it might be worth also informing you that Protobuf + data produce functionality is currently broken because of a transitive dependency issues between Confluent Schema + AWS Glue libraries. Unfortunately there is no resolution for this bug until the AWS Glue serdes library bumps up their dependencies to a compatible version.

If this is a feature you require, we could prioritise a AWS-glue specific build of Kpow that preferences AWS glue's dependencies (so Protobuf producing would be broken in Confluent instead basically).

If you want to carry on this conversation about AWS Glue + Protobuf please feel free to email @.**@.>.

Best, Tom

— Reply to this email directly, view it on GitHubhttps://github.com/factorhouse/kpow/issues/29#issuecomment-2297983187, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AARPCSXV3TEBU37GBUTIHB3ZSLGJRAVCNFSM6AAAAABMCY67S2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJXHE4DGMJYG4. You are receiving this because you were mentioned.Message ID: @.***>