apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
36.42k stars, 14.11k forks

Add ordering key in GCP PubSub operator PubSubPublishMessageOperator #39940

Closed. mehdigati closed this issue 3 months ago

mehdigati commented 3 months ago

Description

Currently, the PubSubPublishMessageOperator in the Google Cloud Platform (GCP) integration for Apache Airflow does not allow specifying an ordering key for published messages. The ordering key is a Cloud Pub/Sub feature: messages published with the same ordering key are delivered to subscribers in the order in which they were published.
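For reference, here is a minimal sketch of what the operator would need to forward to the underlying `google-cloud-pubsub` client. The helper name `publish_with_ordering_key`, the `ordering_key` entry in each message dict, and the injectable `publisher` argument are assumptions made for illustration; they are not part of the current operator or hook. The client calls used (`PublisherOptions(enable_message_ordering=True)` and the `ordering_key` argument of `publish`) come from the Pub/Sub Python client.

```python
from typing import Any, Iterable, List, Mapping, Optional


def publish_with_ordering_key(
    project_id: str,
    topic: str,
    messages: Iterable[Mapping[str, Any]],
    publisher: Optional[Any] = None,
) -> List[str]:
    """Publish messages, forwarding each message's optional ordering key.

    Each message is a mapping with ``data`` (bytes) and, optionally,
    ``attributes`` (dict of str to str) and ``ordering_key`` (str).
    The ``ordering_key`` entry is a hypothetical extension of the
    message format, used here only to illustrate the request.
    """
    if publisher is None:
        # Imported lazily so the helper can be exercised with a stand-in publisher.
        from google.cloud import pubsub_v1

        # The client only accepts ordering keys when message ordering is enabled.
        publisher = pubsub_v1.PublisherClient(
            publisher_options=pubsub_v1.types.PublisherOptions(
                enable_message_ordering=True
            )
        )

    topic_path = publisher.topic_path(project_id, topic)
    message_ids = []
    for message in messages:
        future = publisher.publish(
            topic_path,
            message["data"],
            # An empty string means "no ordering key" for this message.
            ordering_key=message.get("ordering_key", ""),
            **message.get("attributes", {}),
        )
        # Blocking on each future keeps the sketch simple.
        message_ids.append(future.result())
    return message_ids
```

Note that ordered delivery also depends on the subscription: it has to be created with message ordering enabled, otherwise subscribers may still receive messages in any order.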

Use case/motivation

There are several use cases where maintaining the order of published messages is crucial, such as:

  1. Event processing pipelines: When processing a sequence of events, it's essential to maintain the order in which they occurred to ensure proper state management and avoid data corruption.
  2. Distributed transactions: In scenarios involving distributed transactions across multiple services, preserving the order of messages can be critical for maintaining data consistency.
  3. Deduplication: When implementing deduplication mechanisms, the ordering of messages can be a key factor in determining which messages should be considered duplicates.

By adding support for ordering keys in the PubSubPublishMessageOperator, Airflow users would be able to leverage this Cloud Pub/Sub feature and ensure that their published messages are delivered in the desired order, enabling more robust and reliable data processing pipelines.

Related issues

None that I'm aware of.

Are you willing to submit a PR?

Code of Conduct

boring-cyborg[bot] commented 3 months ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.