Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.

Tombstone events in Azure Event Hub for Log Compacted Topics #37463

Closed deichrenner closed 3 weeks ago

deichrenner commented 1 month ago

Describe the bug We want to use log compacted Event Hubs (see https://learn.microsoft.com/en-us/azure/event-hubs/use-log-compaction) with the Python SDK. To use the full potential of log compacted topics, we need to be able to send tombstone events that mark the end of life of the events identified by a given partition_key.

We naively followed the documentation and tried to send an empty event with an existing partition_key like

from azure.eventhub import EventData

tombstone_event = EventData(None)

However, the body of an event cannot be None. Therefore, the next best guess was to use EventData(b''). But this just sends an empty byte string and does not trigger the tombstone mechanism of the queue.

To Reproduce Steps to reproduce the behavior:

  1. Create a log compacted Event Hub with a retention time of 1 hour for tombstone events.
  2. Send a message with a null body (as mentioned in the documentation, linked above)

    Example code:

    import asyncio
    import os
    import uuid
    
    import dotenv
    from azure.eventhub import EventData
    from azure.eventhub.aio import EventHubProducerClient
    from azure.identity.aio import DefaultAzureCredential
    
    dotenv.load_dotenv()
    
    credential = DefaultAzureCredential()
    
    async def run():
        producer = EventHubProducerClient.from_connection_string(
            os.getenv("EVENT_HUB_CONNECTION_STRING"),
            eventhub_name=os.getenv("EVENT_HUB_NAME"),
            credential=credential,
        )
        async with producer:
            user_id = str(uuid.uuid4())
            event_data_batch = await producer.create_batch(partition_key=user_id)
            event_data_batch.add(EventData(b''))
            await producer.send_batch(event_data_batch)
    
            await credential.close()
    
    if __name__ == "__main__":
        asyncio.run(run())

Expected behavior After one hour, I would have expected the topic to be empty. However, the message with the empty body still remains in the topic.
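The expectation above depends on the difference between a null body and an empty body. The following is a minimal sketch of key-based compaction, written as a plain in-memory model and not as the Event Hubs implementation. The names `Event` and `compact` are hypothetical, and the model ignores the tombstone retention window by assuming it has already elapsed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: an event is (partition key, body).
# A body of None stands for a tombstone; b"" is an ordinary, empty payload.
Event = Tuple[str, Optional[bytes]]


def compact(log: List[Event]) -> List[Event]:
    """Keep the latest event per key and drop keys whose latest event is a tombstone."""
    latest: Dict[str, Optional[bytes]] = {}
    for key, body in log:
        # Remove and re-insert so the result is ordered by each key's latest write.
        latest.pop(key, None)
        latest[key] = body
    # Assumption: the tombstone retention period has passed, so tombstones are removed.
    return [(key, body) for key, body in latest.items() if body is not None]


# An empty body replaces the previous value but the key survives compaction.
with_empty_body = compact([("user-1", b"v1"), ("user-2", b"v1"), ("user-1", b"")])

# A null body (tombstone) removes the key entirely.
with_tombstone = compact([("user-1", b"v1"), ("user-2", b"v1"), ("user-1", None)])
```

In this model `with_empty_body` still contains an entry for `user-1`, which matches the behavior reported here for `EventData(b'')`, while `with_tombstone` does not.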

Question How to create an Event with a null body?

github-actions[bot] commented 1 month ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

kashifkhan commented 1 month ago

Thank you for the feedback @deichrenner. We will investigate and get back to you asap.

kashifkhan commented 1 month ago

@deichrenner I have reached out to the service team to see exactly what is happening and where this issue resides. Will update as soon as I have something.

deichrenner commented 1 month ago

Hi @kashifkhan, any news on this one?

kashifkhan commented 1 month ago

@deichrenner sorry about the delay on this one. I'm still waiting to hear back from the service team on what the issue is and how we go about resolving it. It's still on my radar :)

kashifkhan commented 4 weeks ago

@deichrenner I just heard back from the service team and they have confirmed that tombstone events are not supported via the AMQP endpoint (which is what the EH SDK uses). Log compaction (which is a setting) will work automatically.

The service team has taken an action to update and align the docs online to make it clear what's supported and possible today.

github-actions[bot] commented 4 weeks ago

Hi @deichrenner. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

SebastianSchroeder commented 4 weeks ago

@kashifkhan Does this mean that we cannot use the Event Hubs Python SDK to send tombstones? Do you have a suggested workaround?

deichrenner commented 3 weeks ago

/unresolve

kashifkhan commented 3 weeks ago

@SebastianSchroeder @deichrenner yes, you can't use the Python SDK to send tombstones (nor the SDK for any other language), as the service doesn't support tombstones via the AMQP endpoint. The documentation suggesting that tombstones are supported is wrong, and the service team will update it to reflect the current status. Only log compaction is supported, and there is no workaround as per the service team.

SebastianSchroeder commented 3 weeks ago

Alright, thank you for the clarification. We might try the Kafka client and endpoint for our log compacted topics. For now we can live with empty messages not being deleted from the topic.
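For anyone who wants to try the Kafka route mentioned above, here is a rough sketch, not a confirmed workaround: nothing in this thread verifies that the service honors tombstones sent this way. It assumes the third-party `confluent-kafka` package and the usual Event Hubs Kafka endpoint settings (port 9093, SASL PLAIN with the `$ConnectionString` username). The helper names `build_kafka_config` and `send_tombstone` are hypothetical.

```python
def build_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build a confluent-kafka style config for an Event Hubs namespace (assumed settings)."""
    return {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # Event Hubs expects this literal user name; the password is the connection string.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


def send_tombstone(config: dict, topic: str, key: str) -> None:
    """Send a record with a null value for the given key (a Kafka-style tombstone)."""
    # Imported lazily so the config helper can be used without the package installed.
    from confluent_kafka import Producer

    producer = Producer(config)
    # value=None produces a record with a null payload.
    producer.produce(topic, key=key.encode("utf-8"), value=None)
    producer.flush()


# Placeholder values; a real namespace and connection string would come from configuration.
example_config = build_kafka_config("my-namespace", "Endpoint=sb://...")
```

Here the topic passed to `send_tombstone` would be the Event Hub name, and the key would be the same value previously used as the partition key.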

kashifkhan commented 3 weeks ago

@SebastianSchroeder sorry about the inconvenience this caused for y'all. I'm hoping that the service team updates the docs soon so that they reflect the current state of the log compaction feature, along with its limits and what works.

SchulteMarkus commented 1 week ago

> @SebastianSchroeder @deichrenner yes, you can't use the Python SDK to send tombstones (nor the SDK for any other language), as the service doesn't support tombstones via the AMQP endpoint. The documentation suggesting that tombstones are supported is wrong, and the service team will update it to reflect the current status. Only log compaction is supported, and there is no workaround as per the service team.

Dude, seriously? Instead of documenting a feature gap, how about making it work?