apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

AWS GlueCatalogHook doesn't support custom CatalogId #43238

Open keeed opened 4 hours ago

keeed commented 4 hours ago

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

apache-airflow-providers-amazon

Apache Airflow version

2.10.1

Operating System

MWAA

Deployment

Amazon (AWS) MWAA

Deployment details

Vanilla Deployment

What happened

The current GlueCatalogHook does not pass the CatalogId parameter in its boto3 calls, as can be seen in the hook's source:

GlueCatalogHook

What you think should happen instead

There should be a way to pass a CatalogId, since some users need to target a catalog other than their account's default one.

How to reproduce

Target a Glue database and table in a catalog whose CatalogId is not the default (the caller's AWS account ID). All operations will fail.

Anything else

As a workaround, I copied the implementation of GlueCatalogHook and changed our sensors to use the copy, ExtendedGlueCatalogHook, which adds the CatalogId to the calls. For example:

    def get_partitions(
        self,
        catalog_id: str,
        database_name: str,
        table_name: str,
        expression: str = "",
        page_size: int | None = None,
        max_items: int | None = None,
    ) -> set[tuple]:
        ...

        response = paginator.paginate(
            CatalogId=catalog_id,  # <== This should be added as an optional parameter
            DatabaseName=database_name,
            TableName=table_name,
            Expression=expression,
            PaginationConfig=config,
        )

        partitions = set()
        for page in response:
            for partition in page["Partitions"]:
                partitions.add(tuple(partition["Values"]))

        return partitions
...
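
One possible shape for the optional parameter is sketched below. It is not the provider's code: `build_paginate_kwargs` and `collect_partitions` are hypothetical helper names used only for illustration. The sketch assumes the default catalog should be used when no `catalog_id` is given, so the `CatalogId` key is left out entirely in that case rather than passed as `None`, which boto3 parameter validation would reject.

```python
from __future__ import annotations

from typing import Any, Iterable


def build_paginate_kwargs(
    database_name: str,
    table_name: str,
    expression: str = "",
    catalog_id: str | None = None,
    page_size: int | None = None,
    max_items: int | None = None,
) -> dict[str, Any]:
    """Build keyword arguments for the Glue ``get_partitions`` paginator.

    Hypothetical helper: ``CatalogId`` is included only when the caller
    supplies one, so existing callers keep the default-catalog behaviour.
    """
    kwargs: dict[str, Any] = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "Expression": expression,
        "PaginationConfig": {"PageSize": page_size, "MaxItems": max_items},
    }
    if catalog_id is not None:
        # Only add the key when a catalog was explicitly requested.
        kwargs["CatalogId"] = catalog_id
    return kwargs


def collect_partitions(pages: Iterable[dict]) -> set[tuple]:
    """Flatten paginator pages into a set of partition-value tuples."""
    return {
        tuple(partition["Values"])
        for page in pages
        for partition in page["Partitions"]
    }
```

Inside the hook, the call would then look something like `collect_partitions(paginator.paginate(**build_paginate_kwargs(database_name, table_name, expression, catalog_id=catalog_id)))`, with `catalog_id` defaulting to `None` in the method signature.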

If anyone from the AWS team is going to work on this one, I'm also part of Amazon. You can reach me (keds@) and I can show you what we did.

Thanks!

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 4 hours ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.