aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0

Deprecation Warning for `parallelism` Argument in `read_parquet` with `ray_args` #2864

Closed DimitarSirakov closed 3 days ago

DimitarSirakov commented 1 week ago

Ray 2.10 emits a deprecation warning for the `parallelism` argument and suggests using `override_num_blocks` instead. The warning is raised from `read_api.py` at line 3087.

Steps to Reproduce:

  1. Set up an environment with Ray 2.10.
  2. Call `awswrangler.s3.read_parquet` with a `ray_args` dictionary that includes the `parallelism` key.
  3. Observe the warning message.

Code Example:

import awswrangler as wr

# Example usage of read_parquet with ray_args including parallelism
df = wr.s3.read_parquet(
    path="s3://bucket/path/",
    ray_args={"parallelism": 10}
)

Expected Behavior: `read_parquet` should accept a parameter for controlling read parallelism without triggering a deprecation warning.

Suggested Fix: Update `read_parquet` to pass `override_num_blocks` instead of `parallelism` to Ray when forwarding `ray_args`.
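
A minimal sketch of what such a translation could look like, assuming the library forwards `ray_args` to Ray as plain keyword arguments. The helper name `normalize_ray_args` is hypothetical and is not part of awswrangler or Ray; it only illustrates renaming the deprecated key before the arguments reach Ray.

```python
from typing import Any, Dict, Optional


def normalize_ray_args(ray_args: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of ray_args with `parallelism` renamed to `override_num_blocks`.

    Hypothetical helper for illustration only; not an awswrangler or Ray API.
    """
    # Copy so the caller's dictionary is never mutated.
    normalized = dict(ray_args or {})

    if "parallelism" in normalized:
        legacy_value = normalized.pop("parallelism")
        # If the caller already supplied the new key, keep their explicit value.
        normalized.setdefault("override_num_blocks", legacy_value)

    return normalized


# The deprecated key from the example above is translated to the new one.
print(normalize_ray_args({"parallelism": 10}))
```

With a helper along these lines, existing callers that still pass `parallelism` would keep working while only `override_num_blocks` is handed to Ray. Whether to also warn those callers is a separate design choice not covered here.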