apache / datafusion-python

Apache DataFusion Python Bindings
https://datafusion.apache.org/python
Apache License 2.0

Unable to Read Parquet Files from S3 Bucket #638

Open dhruvils414 opened 5 months ago

dhruvils414 commented 5 months ago

Description: I'm attempting to read Parquet files from an S3 bucket using DataFusion in Python. Below is the code snippet I'm using:

    import datafusion
    from datafusion import SessionContext

    s3 = object_store.object_store.AmazonS3("s3://test/", "us-east-2")
    ctx = SessionContext()
    ctx.register_object_store("s3", s3)
    df = ctx.read_parquet("s3://test/00001.parquet")

Error Encountered: I'm encountering the following error:

    dataFusion error: Internal("No suitable object store found for s3://test/00001.parquet")

Issue Investigation: I've tried to find relevant documentation or support resources but haven't been successful in locating any.

Resources Reviewed: While researching, I came across the following Rust documentation which appears relevant but unfortunately doesn't have a corresponding Python counterpart:

DataFusion Rust Documentation
DataFusion ObjectStore S3 Rust Documentation

Request for Assistance: Could someone please guide me on how to resolve this issue in Python? Any assistance or pointers to relevant documentation would be greatly appreciated. Thank you!
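
A possible direction, offered as a sketch rather than a confirmed fix: the error suggests the store was registered under a URL that does not match `s3://test/...`. The snippet above passes a full URL (`"s3://test/"`) where `AmazonS3` appears to expect a bare bucket name, and registers the scheme as `"s3"` rather than `"s3://"`. The version below follows the pattern used in the project's S3 example as far as I know it. The helper names `split_s3_url` and `read_parquet_from_s3` are hypothetical, reading credentials from `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` is an assumption, and the `register_object_store` signature may differ between releases, so check it against the installed version.

```python
import os
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple[str, str]:
    """Split an s3:// URL into (bucket_name, object_key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def read_parquet_from_s3(url: str, region: str):
    """Register an S3 store for the URL's bucket, then read the Parquet file.

    Hypothetical helper. datafusion is imported lazily so the URL helper
    above can be used and tested without the package installed.
    """
    from datafusion import SessionContext
    from datafusion.object_store import AmazonS3

    bucket_name, _key = split_s3_url(url)

    # Pass the bare bucket name ("test"), not a URL such as "s3://test/".
    s3 = AmazonS3(
        bucket_name=bucket_name,
        region=region,
        # Assumption: credentials are supplied through these environment variables.
        access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
        secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    )

    ctx = SessionContext()
    # Assumption: the scheme is given as "s3://" and, with host=None,
    # the bucket name is taken from the store itself.
    ctx.register_object_store("s3://", s3, None)
    return ctx.read_parquet(url)
```

With this in place, `read_parquet_from_s3("s3://test/00001.parquet", "us-east-2")` would be the equivalent of the original snippet.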