dapr / dapr

Dapr is a portable, event-driven runtime for building distributed applications across cloud and edge.
https://dapr.io
Apache License 2.0

Propagate arbitrary/configurable http headers from publisher to subscribers #6075

Open tilm4nn opened 3 years ago

tilm4nn commented 3 years ago

In what area(s)?

/area runtime /area docs

HTTP request header propagation

It would be great if dapr sidecars used for pub/sub could be configured to forward certain headers from the publisher request to the subscriber request.

Example Use Case: For collection of distributed traces in a microservice environment we use Datadog, and for instrumentation of our Java-based microservices the Datadog Java agent. This Java agent does not support the W3C traceparent header that is currently forwarded by dapr. Instead it propagates trace context using Datadog-specific headers (x-datadog-trace-id, x-datadog-parent-id, and x-datadog-sampling-priority).

So, to achieve correct trace context propagation we would like to configure dapr in such a way that the header values the Java agent sets on publishing requests are also present on the subscription's incoming http requests where they can be picked up by the Java agent of the receiving service.

yaron2 commented 3 years ago

Dapr supports arbitrary fields in a custom CloudEvents envelope, so theoretically the publisher can just add all those fields to the CloudEvent message and these would be visible to subscribers.

Does that meet your needs? https://docs.dapr.io/developing-applications/building-blocks/pubsub/howto-publish-subscribe/#sending-a-custom-cloudevent
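
For illustration, here is a minimal Go sketch of that suggestion, publishing a custom CloudEvent through the sidecar's HTTP publish endpoint. The pubsub component name ("pubsub"), topic ("orders"), and the extension attribute are placeholders; note that CloudEvents extension attribute names may only contain lowercase letters and digits, so a header name like x-datadog-trace-id cannot be carried verbatim.

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// A custom CloudEvent with an extra top-level extension attribute.
	// "xdatadogtraceid" stands in for a header like x-datadog-trace-id;
	// all names and values here are placeholders.
	event := []byte(`{
	  "specversion": "1.0",
	  "type": "com.example.event",
	  "source": "publisher-app",
	  "id": "unique-id-1234",
	  "datacontenttype": "application/json",
	  "xdatadogtraceid": "1234567890",
	  "data": {"orderId": 42}
	}`)

	// Publish through the local Dapr sidecar; the application/cloudevents+json
	// content type tells Dapr to forward the envelope as-is.
	resp, err := http.Post(
		"http://localhost:3500/v1.0/publish/pubsub/orders",
		"application/cloudevents+json",
		bytes.NewReader(event),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("publish status:", resp.Status)
}

The extension attributes arrive inside the JSON envelope delivered to the subscriber rather than as HTTP request headers, which is exactly the gap described in the next comment.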

tilm4nn commented 3 years ago

When I add custom fields to the envelope, do they arrive as HTTP request headers in the HTTP call from the dapr sidecar to the subscriber? I am aware that I can manually read data from the envelope in the Java code of the subscriber, but the Datadog Java agent instrumentation expects the tracing information in HTTP request headers (just as the OpenTelemetry Java agent does, only with headers other than "traceparent").

I was thinking of something like these already implemented behaviors:

  1. dapr sidecars propagate the "traceparent" HTTP request header from publisher to subscriber
  2. dapr sidecars propagate arbitrary HTTP headers from caller to service when using dapr service invocation (see the sketch below)
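
As a point of reference for behavior 2, a minimal Go sketch of service invocation through the sidecar; the app ID "subscriber-app", the method name, and the header value are placeholders:

package main

import (
	"fmt"
	"net/http"
)

func main() {
	// With service invocation, arbitrary request headers set by the caller
	// are forwarded by both sidecars and arrive on the target app's request.
	req, err := http.NewRequest("GET",
		"http://localhost:3500/v1.0/invoke/subscriber-app/method/orders", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("x-datadog-trace-id", "1234567890") // propagated end to end

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("invoke status:", resp.Status)
}
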
saber-wang commented 3 years ago

@yaron2 I need the same function

t2y commented 2 years ago

@yaron2 I also need this.

rahulpoddar-fyndna commented 2 years ago

We are also looking for this feature, where we can pass custom headers while invoking:

  1. invokeBinding when Kafka is configured (kafka.binding). This works fine when using the HTTP binding, but does not work as expected with the Kafka binding.
  2. publishEvent when using kafka.pubsub.

Both of the above using the Java SDK.

berylshow commented 2 years ago

@yaron2 Is there any plan to support the B3 (x-b3-*) tracing protocol? Thanks.

artyom-p commented 2 years ago

Any updates on that?

olitomlinson commented 2 years ago

+1 on this

We have a fundamental architectural use-case which relies on this capability becoming available soon, ideally in 1.9. (Nothing to do with distributed tracing btw)

berndverst commented 2 years ago

FYI @mukundansundar

And somewhat related: There is also the request for arbitrary fields / properties on the message data itself. This is referred to as cloud event extensions generally.

I have a draft PR for the cloud event extensions, but it seems we additionally would need a field on the cloud event envelope proto to store arbitrary headers in some sort of map.

mukundansundar commented 2 years ago

@berndverst For a CloudEvent we can have an additional field called metadata, but for a raw payload we will not be able to store any metadata with the payload itself.

The main need is to be able to set some application-defined properties along with the message.

Most of the brokers and their SDKs do support storing application-defined properties/metadata along with the message. For example:

EventHub has a Properties map[string]interface{} field for every Event.

ServiceBus message has an ApplicationProperties map[string]interface{} field to store custom metadata for a message.

GCP Pubsub has an Attributes map[string]string field to store custom key-value pairs the message should be labelled with.

Hazelcast has no support for storing metadata along with the message.

NATS JetStream has a Header map[string]string field on the Msg struct, which can be used when publishing a message to JetStream.

Kafka has a Headers []RecordHeader field to store metadata along with the message.

MQTT has no support for storing metadata along with the message.

Pulsar has a Properties map[string]string field for application-defined properties on the message.

RabbitMQ has a Headers Table field (Table is map[string]interface{}) for application/exchange specific fields.

Redis has no direct support for storing the metadata.

RocketMQ has a properties map[string]string field for storing metadata.

So except for Redis, Hazelcast, and MQTT, all brokers do seem to support storing metadata along with the event.

Should we target storing these headers as metadata in these brokers for each message? @dapr/maintainers-dapr thoughts?
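
To make the idea concrete, here is a hypothetical sketch (not Dapr's actual implementation) of what the Kafka side of such a mapping could look like, assuming a headers map were added to the envelope. It uses the sarama client, which is where the []RecordHeader type in the survey above comes from; the package and function names are illustrative.

package pubsub

import "github.com/IBM/sarama"

// toKafkaMessage shows how a generic headers map could be copied into Kafka's
// native metadata slot (ProducerMessage.Headers). Purely illustrative.
func toKafkaMessage(topic string, payload []byte, headers map[string]string) *sarama.ProducerMessage {
	msg := &sarama.ProducerMessage{
		Topic: topic,
		Value: sarama.ByteEncoder(payload),
	}
	for k, v := range headers {
		msg.Headers = append(msg.Headers, sarama.RecordHeader{
			Key:   []byte(k),
			Value: []byte(v),
		})
	}
	return msg
}

Brokers without a native metadata slot (Redis, Hazelcast, MQTT) would presumably need the headers embedded in the payload instead, which is one reason a fully generic design is hard.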

olitomlinson commented 2 years ago

In addition to @mukundansundar's list:

XavierGeerinck commented 2 years ago

I can +1 this, saying that it's preventing me from working with e.g. Azure Digital Twins, since it sends to Event Hub and uses source as the twinId, thus preventing the twinId from being accessible because the property is not being mapped correctly.

Why does this need to be handled by Dapr?

Some services implement CloudEvents in what I would call an "opinionated" way, in that they parse the event and then provide it through properties. This results in certain data not being passed correctly (e.g., subject).

To provide an example of what this looks like:

What I am seeing

{
  body: {
    patch: [Array]
  },
  properties: {
    'cloudEvents:id': '09b86e94-8cf8-aaaa-8474-780822220dc2',
    'cloudEvents:source': 'aaaaaa.api.weu.digitaltwins.azure.net',
    'cloudEvents:specversion': '1.0',
    'cloudEvents:type': 'Microsoft.DigitalTwins.Twin.Update',
    'cloudEvents:time': '2022-09-19T16:58:20.5523973Z',
    'cloudEvents:subject': 'my-subject',
    'cloudEvents:traceparent': '00-001eadf367724f4d1dfbcfe8f2119aea-aaaaaaaa-01',
    CorrelationId: 'd79d429f-60df-aaaa-aaaa-c14a3eb3831d',
    ContentType: 'application/json'
  },
}

Which normally should be:

{
  body: {
    id: '09b86e94-8cf8-aaaa-8474-780822220dc2',
    source: 'aaaaaa.api.weu.digitaltwins.azure.net',
    specversion: '1.0',
    type: 'Microsoft.DigitalTwins.Twin.Update',
    time: '2022-09-19T16:58:20.5523973Z',
    subject: 'my-subject',
    traceparent: '00-001eadf367724f4d1dfbcfe8f2119aea-aaaaaaaa-01',
    correlationId: 'd79d429f-60df-aaaa-aaaa-c14a3eb3831d',
    contenttype: 'application/json',
    data: {
      patch: [Array]
    }
  },
}

saber-wang commented 2 years ago

Any progress?

MattCosturos commented 1 year ago

Is it possible to get Application Properties (enriched messages) from an Event Hub via a Dapr binding now? I am currently using bindings.azure.eventhubs and not pub/sub; I'm not sure if I need to switch to pub/sub to get the additional properties.

berndverst commented 1 year ago

This issue here is unrelated to your question. Please find or open a related issue in the dapr/components-contrib repo @MattCosturos.

From what we know (and from looking at the code just now), nobody has implemented this in the binding. Please try using the PubSub component (you need to specify "requireAllProperties":"true" in the subscription metadata). Open an issue in the components-contrib repo to get this added to the binding. In general, use the PubSub components instead of binding components whenever possible.
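
As a sketch of that suggestion, here is a programmatic subscription in Go that sets the "requireAllProperties" metadata mentioned above (that key comes from the comment; the component name, topic, route, and port are placeholders):

package main

import (
	"encoding/json"
	"net/http"
)

func main() {
	// Dapr discovers subscriptions by calling GET /dapr/subscribe on the app.
	// The metadata map on the subscription is passed through to the pubsub
	// component; here it asks the Event Hubs component to forward all
	// message properties.
	http.HandleFunc("/dapr/subscribe", func(w http.ResponseWriter, r *http.Request) {
		subs := []map[string]interface{}{{
			"pubsubname": "eventhubs-pubsub",
			"topic":      "mytopic",
			"route":      "/events",
			"metadata":   map[string]string{"requireAllProperties": "true"},
		}}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(subs)
	})
	http.ListenAndServe(":6002", nil)
}
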

MattCosturos commented 1 year ago

Sorry, this issue was linked from the PR where I previously asked (and was told to comment on the issue).

In general use the PubSub components instead of binding components whenever possible

Is there a reason why, or is this documented anywhere? Looking at the bindings overview and the component-specific pages, there is nothing that would indicate one should use one over the other.

https://docs.dapr.io/reference/components-reference/supported-bindings/eventhubs/ https://docs.dapr.io/reference/components-reference/supported-pubsub/setup-azure-eventhubs/

berndverst commented 1 year ago

That's fundamental to Dapr. Bindings are not really a building block - they are the one-off, specific-purpose components that only exist for things which don't fit into the established building block paradigms. Bindings are not interchangeable with each other, whereas for most use cases (though certainly not the Azure IoT case) PubSub components are interchangeable.

Bindings may have features the other building blocks don't have but they won't have all the features of the building blocks.

Always start with the building block (if there is one) and use the binding only if the building block doesn't work for you, or you are certain from documentation you must use the binding.

berndverst commented 1 year ago

@MattCosturos FYI another user confirms that the PubSub component works for getting the properties from IoT Hub. Please do open an issue in dapr/components-contrib if you need this functionality specifically in the EventHub binding.

rkenshin commented 1 year ago

Was having the same issue, so I tried instead to use PubSub to subscribe to messages coming from an IoT Hub. It worked and I got all application properties; however, the system properties were missing a key header: iothub-message-source. This header tells whether the message is a Telemetry, DeviceConnectionStateEvents, or twinChangeEvents message. Any idea why this header specifically is not propagated, or am I missing something? I also noticed that this header is not listed in the table describing the IoT Hub system properties that will be part of the response: https://docs.dapr.io/reference/components-reference/supported-pubsub/setup-azure-eventhubs/#subscribing-to-azure-iot-hub-events

mecoding1 commented 1 year ago

@rkenshin

however system properties were missing a key header: iothub-message-source

iothub-message-source is not a system property for D2C messages, as mentioned in the system properties of D2C IoT Hub messages.

As mentioned in the Dapr documentation, currently it covers device-to-cloud events created by Azure IoT Hub devices:

The device-to-cloud events created by Azure IoT Hub devices will contain additional IoT Hub System Properties

iothub-message-source is a system property only when the message is emitted directly by IoT Hub in response to specific kinds of state changes associated with your devices. The difference is referenced in the iot-hub-non-telemetry-event-schema documentation.

rkenshin commented 1 year ago

@mecoding1 I see; however, D2C messages still get routed through IoT Hub and augmented with the iothub-message-source = Telemetry header before reaching either the default endpoint or a custom one. This is similar to all the other messages related to TwinChanges, DeviceLifeCycle, or DeviceConnectionState. I believe this header is essential when all these messages are routed to the same endpoint, usually the default events endpoint. Right now, we process all these messages using the same Azure function and dedicate a strategy per message type based on this header. When we tried to use a Dapr PubSub instead, we had to route messages to different endpoints to overcome this limitation. Is there a better way, other than introducing support for this header? :)

zhenlei520 commented 1 year ago

@berndverst Any news on this issue? I hope that the sidecar can pass custom headers from service A to service B; this needs to be available for both pub/sub and service invocation. I have read the documentation and learned that this information will be stored in the envelope, but it is missing in the dotnet-sdk.

hugosmitter commented 1 year ago

For Pulsar, the 'properties' map is very valuable for adding metadata outside the payload. For instance, Pulsar functions handling encrypted messages (end-to-end encryption) can rely on source/destination properties added by the producer to route messages without accessing the encrypted payload. In addition to the 'properties' map, it's crucial to also have the ability to set the 'key' at runtime (it is not declared in the dapr pubsub component). The producer application sets the 'key' on each published message when using the Pulsar key-shared subscription type. It doesn't make sense to have the 'key' hardcoded in the component. See Pulsar messaging.

berndverst commented 1 year ago

Everyone - for each PubSub component please open a separate issue in the https://github.com/dapr/components-contrib repo and outline which properties you need to propagate or have set, and please link to the relevant documentation of the technology / service so implementation is easier. Those are all distinct issues and distinct implementations.

This issue you are all commenting on is about passing through of arbitrary headers / metadata in the PubSub data payload. This is quite a bit different from many of the comments we are seeing here.

If you publish a cloud event with Dapr using content-type application/cloudevents+json any cloud event extensions in the payload (sometimes referred to as custom headers) should be passed through to the client via both HTTP and gRPC.

See: https://github.com/cloudevents/spec/blob/main/cloudevents/formats/json-format.md

Also, please note that I am personally not working on anything related to this issue, so I cannot provide any updates.
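
To illustrate the pass-through behavior described above, here is a minimal subscriber-side sketch in Go. It reads an extension attribute out of the delivered JSON envelope; the route ("/events"), port, and the attribute name "xdatadogtraceid" (matching the publish sketch earlier in this thread) are placeholders.

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Extensions published in a custom CloudEvent arrive inside the JSON
	// envelope delivered to the subscribed route, not as HTTP headers.
	http.HandleFunc("/events", func(w http.ResponseWriter, r *http.Request) {
		var envelope map[string]interface{}
		if err := json.NewDecoder(r.Body).Decode(&envelope); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		fmt.Println("trace id extension:", envelope["xdatadogtraceid"])
		w.WriteHeader(http.StatusOK)
	})
	http.ListenAndServe(":6001", nil)
}
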

KrylixZA commented 1 year ago

Hey all. Just bumping this thread as we ran into this issue as well using the .NET SDK through Kafka.

peter-alegeus commented 1 year ago

+1. We have some values to propagate via pub/sub, currently Event Hubs, but in the future it might be Confluent Kafka. It would be more flexible if we could publish these values as headers instead of placing them within the event payload, using the .NET SDK.

jcnewmanunilink commented 7 months ago

I see this is a P1 issue but it has been open since 2021; does this mean there is no likelihood that this will be fixed? This is a huge blocker on our project, as we need to consume a message from a 3rd party where there is some info in headers that we need to consume. :-)

stannynuytkens commented 7 months ago

I need this too: passing custom X- headers from publisher to subscriber with RabbitMQ as the broker.

yaron2 commented 7 months ago

I see this is a P1 issue but it has been open since 2021; does this mean there is no likelihood that this will be fixed? This is a huge blocker on our project, as we need to consume a message from a 3rd party where there is some info in headers that we need to consume. :-)

Dapr will deliver component-specific headers from pub/sub as HTTP headers or gRPC metadata, but this is done on a per-component basis. Which component do you intend to use?

601093318 commented 5 months ago

This feature is very practical; why hasn't anyone solved it yet? Can any of the members explain?

sclarke81 commented 5 months ago

I see this is a P1 issue but it has been open since 2021; does this mean there is no likelihood that this will be fixed? This is a huge blocker on our project, as we need to consume a message from a 3rd party where there is some info in headers that we need to consume. :-)

Dapr will deliver component-specific headers from pub/sub as HTTP headers or gRPC metadata, but this is done on a per-component basis. Which component do you intend to use?

@yaron2 @berndverst I think the discussion here is about wanting this as a generic feature that works for all pubsub components. The ideal would be a transparent experience like we have with service invocation. From what I can gather, a solution to this might be to update the components to include a headers collection in the envelope. Then the SDKs could be updated to automatically pack the HTTP headers/gRPC metadata into the collection in the cloudevent (and vice versa).

There may be other questions around the headers that are currently preserved, e.g. traceparent. This generic system of retaining all headers could replace the current process of retaining a few named headers, or perhaps it should work alongside it.

I feel like there are various issues open discussing generic and specific variations of this, but if a design could be agreed on it could be moved forward. As a user I do feel like retaining the headers is the expected behaviour; I was surprised it didn't work.
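
A hypothetical Go helper illustrating the packing half of that proposal (not an existing Dapr or SDK API): it copies chosen HTTP headers into CloudEvent extension attributes before publishing, and an SDK could apply the reverse mapping on delivery. Names are lowercased with dashes removed to satisfy the CloudEvents attribute naming rules, so "X-Correlation-Id" becomes "xcorrelationid".

package pubsub

import (
	"net/http"
	"strings"
)

// packHeaders copies the selected HTTP headers into the CloudEvent envelope
// as extension attributes. Purely illustrative of the proposed SDK behavior.
func packHeaders(envelope map[string]interface{}, h http.Header, keys []string) {
	for _, k := range keys {
		if v := h.Get(k); v != "" {
			attr := strings.ToLower(strings.ReplaceAll(k, "-", ""))
			envelope[attr] = v
		}
	}
}
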

artyom-p commented 4 months ago

@yaron2 we are using RabbitMQ with Dapr pub/sub, and not having headers for passing metadata restricts us a lot. Is there a ticket we can upvote, or how should we proceed to get this working in the near future?