aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0

Command failed with exit code 10 on Glue Job #1176

Closed · kev-dfs closed 2 years ago

kev-dfs commented 2 years ago

Hi everyone,

I wrote a data-processing job in a Jupyter notebook (SageMaker) using the awswrangler library. The code works perfectly in that environment, but when I run it on Glue, the job fails with: Command failed with exit code 10. According to the Knowledge Center, this error indicates an out-of-memory condition. I ran a memory profile to check how much memory the process uses and found that it reaches 25 GB inside a pandas.merge call, because the DataFrames are very large (more than 10 GB each). I then tried converting some columns to the category dtype to reduce memory usage, but after the merge ran, those categorical dtypes were lost. How can I improve this? Would it be better to rewrite everything as a Spark job? I imagine someone else has run into this problem and solved it.

Please, I need guidance. Thank you.
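The "lost" categories after the merge are a known pandas behavior rather than an awswrangler issue: when each frame's key column is cast to category independently, the two sides end up with different category sets, and pandas coerces the merge key back to a plain dtype in the result. A minimal sketch of the problem and a workaround (toy data made up for illustration, not from the job in question), building one shared `CategoricalDtype` over the union of both key columns before merging:

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Toy stand-ins for the two large DataFrames (made-up data).
left = pd.DataFrame({"key": ["a", "b", "c"] * 4, "x": range(12)})
right = pd.DataFrame({"key": ["b", "c", "d"] * 4, "y": range(12)})

# Casting each side independently yields *different* category sets
# ({a, b, c} vs {b, c, d}), so pandas does not preserve the
# categorical dtype of the merge key in the result.
left["key"] = left["key"].astype("category")
right["key"] = right["key"].astype("category")
naive = left.merge(right, on="key")
print(naive["key"].dtype)  # not category: the dtype was lost

# Workaround: build one dtype over the union of both category sets
# and apply it to both sides *before* merging. With identical
# categorical dtypes on both keys, pandas keeps the dtype.
shared = union_categoricals([left["key"], right["key"]]).dtype
left["key"] = left["key"].astype(shared)
right["key"] = right["key"].astype(shared)
merged = left.merge(right, on="key")
print(merged["key"].dtype)  # category
```

This keeps the memory savings of categoricals through the merge itself, though with 10 GB+ inputs the join may still not fit on a single Glue Python shell worker.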

jaidisido commented 2 years ago

It does seem that you are running into memory issues. I assume you are already using 1 DPU instead of the default 0.0625 DPU for Glue Python shell jobs? If so, then you would probably need to use PySpark instead; the library is not distributed at the moment, I'm afraid.
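For reference, a sketch of what setting the 1 DPU capacity looks like when defining a Python shell job via the AWS CLI; the job name, role ARN, and script location below are placeholders, not values from this thread:

```shell
# Python shell jobs accept MaxCapacity of either 0.0625 (default) or 1.
# --role and ScriptLocation are placeholder values for illustration.
aws glue create-job \
  --name my-wrangler-job \
  --role arn:aws:iam::123456789012:role/MyGlueRole \
  --command Name=pythonshell,ScriptLocation=s3://my-bucket/job.py,PythonVersion=3.9 \
  --max-capacity 1
```

Note that 1 DPU provides 16 GB of memory, so a merge peaking at 25 GB would still not fit even at the maximum Python shell capacity, which is why the recommendation above is to move to PySpark for frames this large.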