There should be a way to pass the CatalogId, since some users need to target a catalog other than the default one.
This came up in my use case at work.
How to reproduce
Target a Glue database and table whose CatalogId is not the default AWS account ID. All operations will fail.
Anything else
As a workaround, I copied the implementation of the actual GlueCatalogHook into an ExtendedGlueCatalogHook that adds the CatalogId to the calls, and changed our sensors to use it. Example:
def get_partitions(
    self,
    catalog_id: str,
    database_name: str,
    table_name: str,
    expression: str = "",
    page_size: int | None = None,
    max_items: int | None = None,
) -> set[tuple]:
    ...
    response = paginator.paginate(
        CatalogId=catalog_id,  # <== this should be added as an optional parameter
        DatabaseName=database_name,
        TableName=table_name,
        Expression=expression,
        PaginationConfig=config,
    )
    # Collect each partition's values as a hashable tuple.
    partitions = set()
    for page in response:
        for partition in page["Partitions"]:
            partitions.add(tuple(partition["Values"]))
    return partitions
...
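For illustration, an optional version of the parameter could look roughly like the sketch below. It is not the provider's actual code: the standalone function, the `glue_client` argument, and the `build_paginate_kwargs` helper are hypothetical names. It assumes a boto3 Glue client, whose `get_partitions` call accepts `CatalogId` and falls back to the caller's AWS account ID when it is omitted.

```python
from typing import Any, Optional


def build_paginate_kwargs(
    database_name: str,
    table_name: str,
    expression: str = "",
    catalog_id: Optional[str] = None,
    page_size: Optional[int] = None,
    max_items: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for the Glue `get_partitions` paginator (hypothetical helper)."""
    kwargs: dict[str, Any] = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "Expression": expression,
        "PaginationConfig": {"PageSize": page_size, "MaxItems": max_items},
    }
    # Only send CatalogId when the caller supplied one, so existing callers
    # keep the default behaviour (the caller's AWS account ID).
    if catalog_id is not None:
        kwargs["CatalogId"] = catalog_id
    return kwargs


def get_partitions(
    glue_client: Any,
    database_name: str,
    table_name: str,
    expression: str = "",
    catalog_id: Optional[str] = None,
    page_size: Optional[int] = None,
    max_items: Optional[int] = None,
) -> set[tuple]:
    """Return the partition values of a table as a set of tuples."""
    paginator = glue_client.get_paginator("get_partitions")
    pages = paginator.paginate(
        **build_paginate_kwargs(
            database_name, table_name, expression, catalog_id, page_size, max_items
        )
    )
    partitions = set()
    for page in pages:
        for partition in page["Partitions"]:
            partitions.add(tuple(partition["Values"]))
    return partitions
```

Keeping `catalog_id` optional and last in the signature means existing call sites would not need to change, and only cross-account callers would pass it.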
If anyone from the AWS team is going to work on this one, I'm also part of Amazon; you can reach me (keds@) and I can show you what we did here.
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow-providers-amazon
Apache Airflow version
2.10.1
Operating System
MWAA
Deployment
Amazon (AWS) MWAA
Deployment details
Vanilla Deployment
What happened
The current GlueCatalogHook doesn't pass the CatalogId parameter in its boto3 calls, as seen here:
GlueCatalogHook
Thanks!
Are you willing to submit PR?
Code of Conduct