databrickslabs / dlt-meta

This is a metadata-driven, DLT-based framework for bronze/silver pipelines

java.lang.RuntimeException: non-nullable field authBytes was serialized as null #13

Closed by msdotnetclr 8 months ago

msdotnetclr commented 9 months ago

With an Event Hubs source configuration like the following:

  "source_format": "eventhub",
  "source_details": {
     "source_schema_path": "{dbfs_path}/integration-tests/resources/eventhub_iot_schema.ddl",
     "eventhub.accessKeyName": "{eventhub_accesskey_name}",
     "eventhub.name": "{eventhub_name}",
     "eventhub.secretsScopeName": "{eventhub_secrets_scope_name}",
     "kafka.sasl.mechanism": "PLAIN",
     "kafka.security.protocol": "SASL_SSL",
     "eventhub.namespace": "{eventhub_nmspace}",
     "eventhub.port": "{eventhub_port}"
  },

The pipeline may fail with the following error:

Connection to node -1 ({eventhub_nmspace}.servicebus.windows.net/xx.xx.xx.xx:9093) terminated during authentication. This may happen due to any of the following reasons: (1) Authentication failed due to invalid credentials with brokers older than 1.0.0, (2) Firewall blocking Kafka TLS traffic (eg it may only allow HTTPS traffic), (3) Transient network issue. ... Unexpected error from {eventhub_nmspace}.servicebus.windows.net/xx.xx.xx.xx (channelId=-1); closing connection java.lang.RuntimeException: non-nullable field authBytes was serialized as null

msdotnetclr commented 9 months ago

I believe the problem is in src/pipeline_readers.py, function PipelineReaders.get_eventhub_kafka_options().

Essentially, when eh_shared_key_value is constructed, the "eh_shared_key_name" in the f-string "SharedAccessKeyName={eh_shared_key_name};SharedAccessKey={eh_shared_key_value}" should be the name of the access key (also known as the "Shared Access Policy").

Currently, the value retrieved from "eventhub.accessKeyName" is treated as the name of the secret that holds the Shared Access Policy key. The code combines it with "eventhub.secretsScopeName" to fetch the actual access key from the secret scope, and then reuses the same value as SharedAccessKeyName in the connection string.

This only works when the name of the secret is exactly the same as the name of the Shared Access Policy. If the two names differ, one of two things happens. If "eventhub.accessKeyName" holds the secret name, authentication fails because SharedAccessKeyName in the connection string refers to the secret name instead of the Shared Access Policy name. If it holds the policy name, you get a "secret does not exist" error even before the connection attempt.
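To make the two failure modes concrete, here is a minimal sketch of the behaviour described above. It is not the actual dlt-meta source: the function name `current_connection_string`, the `fake_get_secret` stand-in for the Databricks secret lookup, the sample names and values, and the `Endpoint=sb://...` prefix are all assumptions for illustration.

```python
def current_connection_string(source_details, get_secret):
    """Sketch of the behaviour described above (not the dlt-meta source)."""
    name = source_details["eventhub.accessKeyName"]
    scope = source_details["eventhub.secretsScopeName"]
    namespace = source_details["eventhub.namespace"]
    # The same value serves as the secret name for the lookup ...
    key = get_secret(scope, name)
    # ... and as the policy name in the connection string.
    return (
        f"Endpoint=sb://{namespace}.servicebus.windows.net/;"
        f"SharedAccessKeyName={name};SharedAccessKey={key}"
    )


# Hypothetical secret store: the secret is named "eh-listen-key",
# while the Shared Access Policy is named "ListenPolicy".
SECRETS = {("my-scope", "eh-listen-key"): "abc123"}


def fake_get_secret(scope, name):
    # Stand-in for a real secret-scope lookup; raises KeyError if missing.
    return SECRETS[(scope, name)]


base = {"eventhub.secretsScopeName": "my-scope", "eventhub.namespace": "my-ns"}

# Case 1: the option holds the secret name, so the lookup succeeds but the
# connection string names a policy that does not exist.
wrong_policy = current_connection_string(
    {**base, "eventhub.accessKeyName": "eh-listen-key"}, fake_get_secret
)

# Case 2: the option holds the policy name, so the secret lookup itself fails.
try:
    current_connection_string(
        {**base, "eventhub.accessKeyName": "ListenPolicy"}, fake_get_secret
    )
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

In case 1 the broker would receive a connection string whose SharedAccessKeyName is the secret name, which matches the authentication failure reported above; in case 2 the error surfaces before any connection is attempted.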

I propose introducing a new source_details option, "eventhub.accessKeySecretName", to store the name of the secret, keeping "eventhub.accessKeyName" for the actual Shared Access Policy name, and using both to construct eh_shared_key_value correctly.
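The proposal could look roughly like the sketch below. It is an illustration under stated assumptions, not the merged fix: `build_connection_string` and `demo_get_secret` are hypothetical names, the `Endpoint=sb://...` prefix is assumed, and falling back to "eventhub.accessKeyName" when the new option is absent is an added assumption to keep existing configurations working.

```python
def build_connection_string(source_details, get_secret):
    """Sketch of the proposed lookup with separate policy and secret names."""
    # Shared Access Policy name, used verbatim in the connection string.
    policy_name = source_details["eventhub.accessKeyName"]
    # Name of the secret holding the key. The fallback to the policy name is
    # an assumption for backward compatibility, not part of the proposal.
    secret_name = source_details.get("eventhub.accessKeySecretName", policy_name)
    scope = source_details["eventhub.secretsScopeName"]
    namespace = source_details["eventhub.namespace"]
    key = get_secret(scope, secret_name)
    return (
        f"Endpoint=sb://{namespace}.servicebus.windows.net/;"
        f"SharedAccessKeyName={policy_name};SharedAccessKey={key}"
    )


# Hypothetical secret store and lookup, standing in for a real secret scope.
DEMO_SECRETS = {
    ("my-scope", "eh-listen-key"): "abc123",
    ("my-scope", "SamePolicy"): "xyz789",
}


def demo_get_secret(scope, name):
    return DEMO_SECRETS[(scope, name)]


common = {"eventhub.secretsScopeName": "my-scope", "eventhub.namespace": "my-ns"}

# Policy name and secret name differ: both options are set.
fixed = build_connection_string(
    {
        **common,
        "eventhub.accessKeyName": "ListenPolicy",
        "eventhub.accessKeySecretName": "eh-listen-key",
    },
    demo_get_secret,
)

# Names coincide: the new option can be omitted and the old behaviour holds.
legacy = build_connection_string(
    {**common, "eventhub.accessKeyName": "SamePolicy"}, demo_get_secret
)
```

In a Databricks pipeline the `get_secret` argument would presumably be the workspace secret lookup; passing it in as a parameter keeps the sketch runnable outside a cluster.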

ravi-databricks commented 8 months ago

The fix was merged in the v.0.04 release.