Closed nelyanne-v closed 1 month ago
I already have a Lambda layer dedicated to awswrangler 3.9.0.
Just to clarify, does this mean you have created your own layer or are you using the one we publish?
The only pointer that springs to mind for me is to increase the memory of the Lambda function. Newer versions require more memory due to their larger dependency size.
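For reference, the memory increase can be done with a single AWS CLI call; the function name and memory size below are placeholders to adjust for your deployment:

```shell
# Hypothetical function name; pick a --memory-size large enough for the
# awswrangler layer's footprint (the default 128 MB is often too small).
aws lambda update-function-configuration \
  --function-name my-partition-loader \
  --memory-size 1024
```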
Sorry it was not clear, I've been using the published layer.
I managed to fix the issue by doing these 2 things, maybe it will help someone:

* I create the boto3 session once, outside the handler:

```python
session = boto3.session.Session()

def handler(event, context): ...
```

* I pass it as the value for the `boto3_session` parameter:

```python
wr.catalog.add_parquet_partitions(database=event['database'], table=table['Name'], partitions_values=partitions, boto3_session=session)
```
Describe the bug
Hi all,
I have an AWS Lambda function that calls `add_parquet_partitions()` to add new partitions to tables in my Glue catalog on a daily basis. I am currently trying to upgrade the Lambda from Python 3.8 + awswrangler 1.6 to Python 3.12 + awswrangler 3.9.0. I was able to test the upgraded Lambda with a local invocation, and it worked without issues.

However, after I deployed the Lambda on AWS, it always gets stuck on adding a partition to a table. It normally took ~70 s to process all my tables, but now even 15 minutes is not enough. I'm not getting an explicit error message, and I cannot get one because of the Lambda maximum running time limit. I observed that this happens when adding a partition to the 3rd table in a loop, but it doesn't seem to be an issue with a single particular table: for example, if the first run fails on table 3 and I force the second run to start from table 3, it initially succeeds for tables 3 and 4, but then fails on table 5.
I'm calling the function with only the required arguments. I checked the input parameters and can see nothing wrong with the values.
I already have a Lambda layer dedicated to awswrangler 3.9.0, and it's used in my other upgraded Lambdas without issues. Nothing else has changed about my deployment process. The Lambda has no other dependencies (only awswrangler).
Any idea how I can investigate it further? I'd be grateful for any pointers.
How to Reproduce
`add_parquet_partitions()`
Expected behavior
No response
Your project
No response
Screenshots
No response
OS
AWS Lambda, x86_64
Python version
3.12.4
AWS SDK for pandas version
3.9.0
Additional context
No response