aws / aws-xray-daemon

The AWS X-Ray daemon listens for traffic on UDP port 2000, gathers raw segment data, and relays it to the AWS X-Ray API.
Apache License 2.0

Multiple integration support #4

Closed yogiraj07 closed 2 years ago

yogiraj07 commented 6 years ago

Goal

Currently, the X-Ray daemon sends data only to the AWS X-Ray service. This issue discusses the changes to the existing design needed to support multiple backends in addition to the X-Ray service.

Current design

The X-Ray daemon receives segments on the X-Ray daemon address, and each received segment carries a daemon header. To receive UDP payloads, the current design uses a global memory pool known as the buffer pool (preallocated on initialization; by default 1% of total memory). The ring buffer (RB) is implemented as a channel, and a goroutine stores received segments in it. The RB holds 250 segments, and each segment in the RB keeps a pointer to a buffer allocated from the buffer pool. By default each buffer is 64KB, and large payloads are not split across multiple buffers.

A Processor sits on the receiving end of the RB channel and batches segments in its own goroutine. The Processor hands a batch to the Batch Processor when the batch is large enough (default: 50 segments) or when the Processor goroutine hits an idle timeout (default: 1 second). At that point the raw payload for the batch is serialized to strings and the buffers are returned to the buffer pool for reuse. The Batch Processor uses the X-Ray client to send batches to the X-Ray service through the PutTraceSegments API.
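A minimal Go sketch of the receive-and-batch path described above, assuming a channel-backed buffer pool and a channel-based ring buffer. The names bufferPool, segment and runProcessor are hypothetical and the code is not taken from the daemon; only the defaults it uses (64KB buffers, a 250-segment ring buffer, batches of 50, a 1-second idle timeout) come from this issue. The pool of 250 buffers is an arbitrary size for the example, whereas the daemon sizes its pool from total memory.

```go
package main

import (
	"fmt"
	"time"
)

// bufferPool hands out preallocated, fixed-size buffers through a channel.
// Assumption: this mirrors the idea of the daemon's buffer pool, not its code.
type bufferPool struct{ free chan []byte }

func newBufferPool(count, size int) *bufferPool {
	p := &bufferPool{free: make(chan []byte, count)}
	for i := 0; i < count; i++ {
		p.free <- make([]byte, size)
	}
	return p
}

func (p *bufferPool) Get() []byte  { return <-p.free }
func (p *bufferPool) Put(b []byte) { p.free <- b[:cap(b)] }

// segment points at the part of a pooled buffer that holds one UDP payload.
type segment struct {
	buf []byte
	n   int
}

// runProcessor drains the ring buffer and flushes a batch when it holds
// maxBatch segments, when no segment arrives for idleTimeout, or when the
// ring buffer is closed. Flushing copies each payload into a string and
// returns its buffer to the pool before handing the batch to send.
func runProcessor(ring <-chan segment, pool *bufferPool, maxBatch int,
	idleTimeout time.Duration, send func([]string)) {
	var batch []segment
	flush := func() {
		if len(batch) == 0 {
			return
		}
		out := make([]string, len(batch))
		for i, s := range batch {
			out[i] = string(s.buf[:s.n]) // copy the payload out of the buffer
			pool.Put(s.buf)              // the buffer can now be reused
		}
		batch = nil
		send(out)
	}
	timer := time.NewTimer(idleTimeout)
	defer timer.Stop()
	for {
		select {
		case s, ok := <-ring:
			if !ok {
				flush() // ring buffer closed: send the partial batch
				return
			}
			batch = append(batch, s)
			if len(batch) >= maxBatch {
				flush()
			}
			// Restart the idle timer after every received segment.
			if !timer.Stop() {
				select {
				case <-timer.C:
				default:
				}
			}
			timer.Reset(idleTimeout)
		case <-timer.C:
			flush() // idle timeout hit: send whatever has been batched
			timer.Reset(idleTimeout)
		}
	}
}

func main() {
	pool := newBufferPool(250, 64*1024) // 64KB buffers, as in the issue
	ring := make(chan segment, 250)     // ring buffer of 250 segments
	for i := 0; i < 120; i++ {
		buf := pool.Get()
		n := copy(buf, fmt.Sprintf(`{"id":"%d"}`, i)) // stands in for a UDP read
		ring <- segment{buf: buf, n: n}
	}
	close(ring)
	var sizes []int
	runProcessor(ring, pool, 50, time.Second, func(b []string) {
		sizes = append(sizes, len(b))
	})
	fmt.Println(sizes, len(pool.free)) // [50 50 20] 250
}
```

Copying each payload into a string before calling Put is what makes it safe to return the buffer early: once a buffer is back in the pool it may be overwritten by the next UDP read.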

Modularization

We intend to decouple the components of the X-Ray daemon so that the segments it batches can be routed to the desired backend service. The design changes are backward compatible, and the X-Ray service remains supported by default.

Client

We create an X-Ray client instance to call the PutTraceSegments API, which sends data to the X-Ray service. The X-Ray client implements the XRay interface, which contains the X-Ray service API methods. We will add another interface, Service (name yet to be finalized), which contains a PutSegments() method. A Client structure will implement the Service interface for the desired backend service, acting as the bridge between the X-Ray daemon and that backend.
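The proposed split between the XRay interface and a backend-agnostic Service interface could look roughly like the Go sketch below. Only the method name PutSegments comes from the issue; its signature, the xrayClient adapter, the putTraceSegments stand-in and the memoryClient backend are all hypothetical.

```go
package main

import "fmt"

// Service is the backend-agnostic interface proposed in this issue. Only the
// method name PutSegments comes from the issue; the signature is assumed.
type Service interface {
	PutSegments(segments []string) error
}

// xrayClient adapts the X-Ray client to Service. putTraceSegments is a
// hypothetical stand-in for a call to the PutTraceSegments API.
type xrayClient struct {
	putTraceSegments func(documents []string) error
}

func (c *xrayClient) PutSegments(segments []string) error {
	return c.putTraceSegments(segments)
}

// memoryClient is a hypothetical second backend that only records batches.
type memoryClient struct{ batches [][]string }

func (c *memoryClient) PutSegments(segments []string) error {
	c.batches = append(c.batches, segments)
	return nil
}

// sendBatch is daemon-side code: it depends only on Service, not on X-Ray.
func sendBatch(svc Service, batch []string) error {
	return svc.PutSegments(batch)
}

func main() {
	sent := 0
	xr := &xrayClient{putTraceSegments: func(documents []string) error {
		sent += len(documents)
		return nil
	}}
	mem := &memoryClient{}
	batch := []string{`{"id":"a"}`, `{"id":"b"}`}
	for _, svc := range []Service{xr, mem} {
		if err := sendBatch(svc, batch); err != nil {
			fmt.Println("send failed:", err)
		}
	}
	fmt.Println(sent, len(mem.batches)) // 2 1
}
```

Because sendBatch only sees Service, adding a backend means adding one more type with a PutSegments method; the batching code stays untouched.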

Registering Client

In the current design, the X-Ray client is created during initialization of the Processor instance and assigned to the Batch Processor instance. When a batch of segments is ready to be sent, the Batch Processor instance uses the X-Ray client to send the data to the X-Ray service. This part needs to be restructured: the Client (or X-Ray client) will instead be created as part of daemon initialization and passed to the Processor instance. Once the Batch Processor instance is configured with that client, the existing architecture will send batches of segments to the configured backend service.
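A sketch of the proposed registration flow, assuming the client is chosen by name during daemon initialization and injected into the Processor, which hands it to the Batch Processor. The backend names, newClient, newProcessor and both stand-in clients are hypothetical; the point is only that the Processor no longer constructs the X-Ray client itself, and that X-Ray remains the default.

```go
package main

import "fmt"

// Service mirrors the interface proposed in this issue; the signature is assumed.
type Service interface {
	PutSegments(segments []string) error
}

// xrayClient is a stand-in for the X-Ray-backed client; a real one would
// call the PutTraceSegments API instead of counting segments.
type xrayClient struct{ sent int }

func (c *xrayClient) PutSegments(segments []string) error {
	c.sent += len(segments)
	return nil
}

// stdoutClient is a hypothetical alternative backend that prints segments.
type stdoutClient struct{}

func (stdoutClient) PutSegments(segments []string) error {
	for _, s := range segments {
		fmt.Println(s)
	}
	return nil
}

// batchProcessor sends ready batches through whichever client it was given.
type batchProcessor struct{ client Service }

func (b *batchProcessor) send(batch []string) error {
	return b.client.PutSegments(batch)
}

// processor no longer creates an X-Ray client itself; it receives the client
// from daemon initialization and configures its batch processor with it.
type processor struct{ batches *batchProcessor }

func newProcessor(client Service) *processor {
	return &processor{batches: &batchProcessor{client: client}}
}

// newClient runs during daemon initialization. The backend names are
// hypothetical; an empty name keeps X-Ray as the default backend.
func newClient(backend string) (Service, error) {
	switch backend {
	case "", "xray":
		return &xrayClient{}, nil
	case "stdout":
		return stdoutClient{}, nil
	default:
		return nil, fmt.Errorf("unknown backend %q", backend)
	}
}

func main() {
	client, err := newClient("stdout") // e.g. read from daemon configuration
	if err != nil {
		fmt.Println(err)
		return
	}
	p := newProcessor(client)
	if err := p.batches.send([]string{`{"id":"a"}`}); err != nil {
		fmt.Println(err)
	}
}
```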

Note: These are initial thoughts on modularizing the X-Ray daemon. Your suggestions are welcome.

jcchavezs commented 6 years ago

One suggestion:

One question: will the agent do any sort of transformation on the received data, or will it deliver data to the Service interface in the same form it receives it? The current code does nothing with the data beyond sending it directly to the backend; I think this is worth mentioning here.

Otherwise, great initiative @yogiraj07.

yogiraj07 commented 6 years ago

Hi @jcchavezs, thank you for the feedback.

I think we should deliver data to the Service interface (name not yet finalized) in the same form we receive it, and let the implementer of the interface handle any desired transformations.

However, this decision also depends on what kind of transformations users expect and at what point in the pipeline. For example, should the transformation happen while segments are being batched, or on the batch once it is ready to be sent?

Please let us know your motivation for transforming the received data.

Best, Yogi

jcchavezs commented 6 years ago

I was asking more from the AWS side: if we deliver data the same way we receive it, it will be easier to start using the agent with different formats :+1:. Not transforming the data also opens the possibility of other encodings, such as msgpack.

The only transformation I can think of is joining JSON payloads to report a batch over HTTP. Say you receive [{"key":"value1", ...}, {"key":"value2", ...}] and later [{"key":"value3", ...}, {"key":"value4", ...}]; you will most likely join them and send them to the server as [{"key":"value1", ...}, {"key":"value2", ...}, {"key":"value3", ...}, {"key":"value4", ...}]. That sort of combination could, however, be left to the Service implementation.
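Joining batches like this can be done without decoding individual segment documents. The Go sketch below (joinBatches is a hypothetical helper) keeps each array element as a json.RawMessage and re-marshals the combined slice, so the documents pass through unchanged apart from insignificant whitespace being removed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// joinBatches merges several JSON arrays of segment documents into one array.
// Elements stay as json.RawMessage, so individual documents are not decoded.
func joinBatches(batches ...[]byte) ([]byte, error) {
	all := []json.RawMessage{} // non-nil so an empty join marshals as []
	for _, b := range batches {
		var items []json.RawMessage
		if err := json.Unmarshal(b, &items); err != nil {
			return nil, err // input was not a JSON array
		}
		all = append(all, items...)
	}
	return json.Marshal(all)
}

func main() {
	out, err := joinBatches(
		[]byte(`[{"key":"value1"},{"key":"value2"}]`),
		[]byte(`[{"key":"value3"},{"key":"value4"}]`),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// [{"key":"value1"},{"key":"value2"},{"key":"value3"},{"key":"value4"}]
}
```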

Another sort of transformation could be dropping segments or traces based on different criteria (for example, in firehose mode). Say you only want to send traces that contain an error or that last longer than a certain threshold; again, that should be done in the Service implementation.
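A filter of this kind could live entirely inside a Service implementation. The Go sketch below works per segment rather than per trace (dropping whole traces would additionally require grouping segments by trace_id). It reads the start_time, end_time, error and fault fields of the X-Ray segment document; the duration threshold, the handling of in-progress segments and the choice to forward unparseable documents are assumptions, and keepSegment and filterSegments are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// segmentSummary holds the few segment document fields this filter reads.
type segmentSummary struct {
	StartTime float64 `json:"start_time"`
	EndTime   float64 `json:"end_time"`
	Error     bool    `json:"error"`
	Fault     bool    `json:"fault"`
}

// keepSegment reports whether a segment should be forwarded: it is kept if it
// is flagged as an error or fault, or if it lasted at least minSeconds.
func keepSegment(doc string, minSeconds float64) (bool, error) {
	var s segmentSummary
	if err := json.Unmarshal([]byte(doc), &s); err != nil {
		return false, err
	}
	if s.Error || s.Fault {
		return true, nil
	}
	if s.EndTime == 0 {
		// In-progress segments have no end_time yet; treat them as too short.
		return false, nil
	}
	return s.EndTime-s.StartTime >= minSeconds, nil
}

// filterSegments drops the segments that keepSegment rejects. Documents that
// cannot be parsed are forwarded unchanged rather than silently dropped.
func filterSegments(docs []string, minSeconds float64) []string {
	var kept []string
	for _, d := range docs {
		keep, err := keepSegment(d, minSeconds)
		if err != nil || keep {
			kept = append(kept, d)
		}
	}
	return kept
}

func main() {
	docs := []string{
		`{"name":"a","start_time":100.0,"end_time":100.2}`,
		`{"name":"b","start_time":100.0,"end_time":103.5}`,
		`{"name":"c","start_time":100.0,"end_time":100.1,"error":true}`,
	}
	for _, d := range filterSegments(docs, 1.0) {
		fmt.Println(d) // prints segments b and c
	}
}
```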

yogiraj07 commented 6 years ago

Hi @jcchavezs ,

Can you please help me understand the following statements:

1) "Using other encoding formats like msgpack": do you mean a possible transformation of received segments into msgpack format by the implementer of the Service interface? For example, the following flow:

Received segment -> service interface -> transform to msgpack format -> send to desired backend (other than X-Ray service)
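That flow can be sketched as a Service implementation that decodes each received JSON segment and re-encodes it before sending. Go's standard library has no msgpack codec, so this sketch plugs in encoding/gob as a stand-in; a msgpack library would slot into the same encodeFunc. The segmentDoc fields are a small subset of the segment document (any other field is dropped during decoding), and transformingClient, encodeFunc and the send callback are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
)

// segmentDoc holds a few segment fields; fields not declared here are dropped.
type segmentDoc struct {
	Name      string  `json:"name"`
	ID        string  `json:"id"`
	TraceID   string  `json:"trace_id"`
	StartTime float64 `json:"start_time"`
	EndTime   float64 `json:"end_time"`
}

// encodeFunc turns one decoded segment into the backend's wire format.
type encodeFunc func(segmentDoc) ([]byte, error)

// gobEncode is a stdlib stand-in for a msgpack encoder.
func gobEncode(d segmentDoc) ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(d)
	return buf.Bytes(), err
}

// transformingClient implements the flow
// "received segment -> Service -> re-encode -> backend".
type transformingClient struct {
	encode encodeFunc
	send   func([]byte) error // stands in for the backend call
}

func (c *transformingClient) PutSegments(segments []string) error {
	for _, raw := range segments {
		var d segmentDoc
		if err := json.Unmarshal([]byte(raw), &d); err != nil {
			return err
		}
		out, err := c.encode(d)
		if err != nil {
			return err
		}
		if err := c.send(out); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var payloads [][]byte
	c := &transformingClient{encode: gobEncode, send: func(b []byte) error {
		payloads = append(payloads, b)
		return nil
	}}
	err := c.PutSegments([]string{`{"name":"web","id":"a1","start_time":1.0,"end_time":1.5}`})
	fmt.Println(len(payloads), err) // 1 <nil>
}
```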

2) "Using agent with different formats": is the input to the X-Ray daemon in a different format, or the output? Or is this the same concept as point 1, or did you mean something else?

Please let me know if I am missing anything.

As for the other transformations you mentioned, we can let the implementer of the Service interface decide.

Thanks, Yogi

jcchavezs commented 6 years ago

1) "Using other encoding formats like msgpack": Do you mean a possible transformation of received segments to msgpack format by the implementer of the Service interface? For example, the following flow: Received segment -> service interface -> transform to msgpack format -> send to desired backend (other than X-Ray service)

Exactly that.

2) "Using agent with different formats"

If the agent performed some sort of validation instead of sending what it receives, users would not be able to send data as msgpack or any other format. This is not so important.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in the next 7 days. Thank you for your contributions.

NathanielRN commented 2 years ago

Since the OpenTelemetry Collector is better suited to the goal of sending traces to different backends, and it already has the awsxrayreceiver and awsxrayexporter components, I'm closing this issue. The OpenTelemetry Collector is a good solution for users who want to export the data to other backends.