aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0

ImportError: numpy.core.multiarray failed to import #3011

Open LBogolubov1 opened 3 weeks ago

LBogolubov1 commented 3 weeks ago

Describe the bug

Our Glue jobs suddenly started failing with ImportError: numpy.core.multiarray failed to import.

How to Reproduce

Import pandas in a Glue job that has awswrangler 3.10.0 installed.

Expected behavior

Numpy should be imported properly

Your project

No response

OS

Linux

Python version

3.10

AWS SDK for pandas version

3.10.0

Additional context

No response

jaidisido commented 3 weeks ago

Release 3.10.0 of the library enabled support for numpy 2.0. Sadly, AWS Glue does not currently support numpy 2+. Two options are available to you in your Glue job configuration:

  1. Install a lower version of awswrangler (3.9.1 or lower)
  2. Pin numpy below 2.0 (for example 1.26.1) together with awswrangler 3.10+:
    --additional-python-modules: numpy==1.26.1,awswrangler==3.10.0
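
Either option comes down to the value passed for --additional-python-modules. As a rough sketch, the helper below assembles that value from a mapping of package pins; the function name build_module_args and the dict-based layout are illustrative assumptions, not part of awswrangler or the Glue API.

```python
from typing import Dict, Optional


def build_module_args(pins: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Build a job-arguments mapping for --additional-python-modules.

    Hypothetical helper: `pins` maps a package name to an exact version,
    or to None when the package should be installed unpinned.
    """
    specs = [
        f"{name}=={version}" if version else name
        for name, version in pins.items()
    ]
    return {"--additional-python-modules": ",".join(specs)}


# Option 1: stay on an awswrangler release from before numpy 2.0 support.
option_1 = build_module_args({"awswrangler": "3.9.1"})

# Option 2: keep awswrangler 3.10.0 but pin numpy below 2.0.
option_2 = build_module_args({"numpy": "1.26.1", "awswrangler": "3.10.0"})

print(option_1)
print(option_2)
```

The resulting mapping is meant to be merged into whatever job arguments you already set, whether in the console, in infrastructure code, or through an SDK call.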

worksofindustry commented 3 weeks ago

Having the exact same issue. The fix is to set --additional-python-modules to redshift_connector,awswrangler==3.9.1

Also, downgrading to Glue 3.0 can resolve the issue, but you lose the advantages of Glue 4.0.
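
To make the failure easier to diagnose than the raw ImportError, a job script could check the numpy version up front. This is only a sketch: the function names numpy_major and check_numpy_for_glue are made up for illustration, and it assumes a plain dotted version string such as 1.26.1.

```python
def numpy_major(version: str) -> int:
    """Return the major version from a dotted string such as '1.26.1'."""
    return int(version.split(".")[0])


def check_numpy_for_glue(version: str) -> None:
    """Raise a descriptive error if the numpy version is 2.0 or newer.

    Hypothetical guard: it encodes the constraint from this thread that
    Glue does not currently support numpy 2+.
    """
    if numpy_major(version) >= 2:
        raise RuntimeError(
            f"numpy {version} detected, but this Glue job expects numpy<2. "
            "Pin numpy or awswrangler via --additional-python-modules."
        )


# A 1.x version passes silently.
check_numpy_for_glue("1.26.1")

# In a real job, the installed version could be read with
# importlib.metadata.version("numpy") before pandas is imported.
```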