elastic / beats

[RFC] Standardization of azure-eventhub Input Metadata Field #40561

Open zmoog opened 3 weeks ago

zmoog commented 3 weeks ago

Abstract

The goal of this RFC is to standardize the name and the content of the azure-eventhub input's metadata field.

Introduction

Since its inception five years ago, the azure-eventhub input has stored the "event hub metadata" in the azure field, which is of type object. This metadata covers the event hub name, consumer group, offset, and, with input v2, a few more fields.

However, many integrations use the azure field as the root element for their own fields (e.g., azure.activitylogs). To avoid this clash, they usually rename the input's azure metadata field to azure-eventhub, which keeps the metadata alongside the actual data.

Here is an example:

```json
{
  "azure-eventhub": {
    "sequence_number": 21916518,
    "partition_id": "1",
    "consumer_group": "$Default",
    "offset": 9955743838336,
    "eventhub": "mbranca815",
    "enqueued_time": "2024-08-20T09:10:01.486Z"
  }
}
```

Here are a few integrations that rename the azure metadata field to azure-eventhub:

And others that do not rename the field:

The older integrations rename azure to azure-eventhub, but the more recent ones do not.

There are at least two practical problems here:

Proposal

I suggest:

  1. Adopt the current de facto standard name, azure-eventhub, as the official metadata field name.
  2. Document all the existing field content.
  3. Change the input to store the metadata in the azure-eventhub field.
  4. Change the input to make the azure-eventhub field optional so users can save storage if needed (enabled by default).
  5. Make sure all existing integrations work with the azure-eventhub field.
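Items 3 and 4 can be sketched in Python by modeling an event as a plain dict. The names `build_event` and `include_metadata` are hypothetical and chosen only for illustration. They are not the input's actual code or configuration option, and the message body is a placeholder.

```python
from typing import Any


def build_event(
    message: dict[str, Any],
    metadata: dict[str, Any],
    include_metadata: bool = True,  # hypothetical switch, enabled by default (item 4)
) -> dict[str, Any]:
    """Return a new event with the message fields and, optionally, the metadata.

    The metadata goes under the top-level "azure-eventhub" key (item 3), so it
    no longer shares the "azure" namespace that integrations use for their own fields.
    """
    event = dict(message)  # shallow copy: the caller's dict is left untouched
    if include_metadata:
        event["azure-eventhub"] = dict(metadata)
    return event


meta = {"eventhub": "mbranca815", "consumer_group": "$Default", "partition_id": "1"}
msg = {"message": "<event body placeholder>"}

print(sorted(build_event(msg, meta)))                          # ['azure-eventhub', 'message']
print(sorted(build_event(msg, meta, include_metadata=False)))  # ['message']
```

Because the switch defaults to enabled, users who never change the setting keep receiving the metadata, as item 4 intends.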

Existing field content

The metadata field contains the following information.

| Field | Description | Notes |
|---|---|---|
| `azure-eventhub.eventhub` | Event hub name | |
| `azure-eventhub.consumer_group` | Name of the consumer group | |
| `azure-eventhub.enqueued_time` | Timestamp of when the message was published to the event hub | |
| `azure-eventhub.offset` | Message offset in the event hub partition | |
| `azure-eventhub.sequence_number` | Message sequence number in the event hub partition | |
| `azure-eventhub.partition_id` | Partition ID of the message | Since v2 |
| `azure-eventhub.partition_key` | Partition key of the message | Since v2 (optional) |
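The table can also be read as a typed record for the v2 input. The sketch below is a hypothetical Python model; `EventHubMetadata` is not a name from the input's code. The value types are inferred from the JSON example in the Introduction, and `partition_key` is the only field treated as optional.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EventHubMetadata:
    """Contents of the azure-eventhub field as listed in the table (input v2)."""
    eventhub: str
    consumer_group: str
    enqueued_time: str       # ISO 8601 timestamp, e.g. "2024-08-20T09:10:01.486Z"
    offset: int
    sequence_number: int
    partition_id: str
    partition_key: Optional[str] = None  # optional: dropped from the output when unset

    def to_field(self) -> dict:
        """Return the {"azure-eventhub": {...}} fragment, omitting unset optional keys."""
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return {"azure-eventhub": body}


meta = EventHubMetadata(
    eventhub="mbranca815",
    consumer_group="$Default",
    enqueued_time="2024-08-20T09:10:01.486Z",
    offset=9955743838336,
    sequence_number=21916518,
    partition_id="1",
)
print(json.dumps(meta.to_field(), indent=2))
```

The printed fragment has the same six fields as the Introduction example, in a different key order, and no `partition_key`.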

Rationale

Name

If I could go back in time to when the input was created, with today's experience I would name this field something like azure_eventhub_metadata. However, azure-eventhub is good enough to convey its meaning.

Changing the field name now would be a breaking change that doesn't seem worth it, given the secondary role of the metadata field from the users' perspective.

Impact

Since all integrations will use the azure-eventhub field for the input metadata, we expect fewer mapping conflicts on the azure field.

Security Considerations

No security implications so far.

Backward Compatibility

We need to double-check whether the rename processors in existing integrations work correctly when the message has no azure field. This will be the case once the input stores the metadata directly in azure-eventhub.
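This check can be sketched with a hypothetical `rename_field` helper that models a rename step with an `ignore_missing`-style option. It is an illustration only, not the code of any integration's rename processor. It shows that a rename tolerating a missing source field turns both the current and the proposed input output into the same event. A strict rename, by contrast, fails on the new shape.

```python
from typing import Any


def rename_field(event: dict[str, Any], source: str, target: str,
                 ignore_missing: bool = False) -> dict[str, Any]:
    """Return a copy of the event with top-level `source` moved to `target`.

    Hypothetical model of a rename step: a missing source is an error unless
    ignore_missing is set, and an existing target is never overwritten.
    """
    if source not in event:
        if ignore_missing:
            return dict(event)  # nothing to rename: pass the event through unchanged
        raise KeyError(f"source field {source!r} is missing")
    if target in event:
        raise ValueError(f"target field {target!r} already exists")
    renamed = dict(event)
    renamed[target] = renamed.pop(source)
    return renamed


metadata = {"eventhub": "mbranca815", "partition_id": "1"}
current_output = {"azure": metadata}            # metadata location today
proposed_output = {"azure-eventhub": metadata}  # metadata location after the change

# With a tolerant rename, both shapes end up identical.
for event in (current_output, proposed_output):
    print(rename_field(event, "azure", "azure-eventhub", ignore_missing=True))

# A strict rename rejects events that have no "azure" field.
try:
    rename_field(proposed_output, "azure", "azure-eventhub")
except KeyError as err:
    print("strict rename failed:", err)
```

Both loop iterations print the same event. What remains to verify is whether each integration's rename step tolerates a missing source field.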

Implementation

### Tasks
- [ ] Update the input to store metadata in the `azure-eventhub` field
- [ ] Make the `azure-eventhub` field optional
- [ ] Add a rename processor to integrations not using `azure-eventhub` field yet
- [ ] Write a .md document (or section) that documents the existing metadata content

elasticmachine commented 3 weeks ago

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)