dacort / athena-federation-python-sdk

Unofficial Python SDK for Athena Federation
Apache License 2.0
16 stars 12 forks

Fix deprecated read_schema in new pyarrow versions #13

Open barhot opened 1 year ago

dacort commented 1 year ago

Hi @barhot, thanks for the contribution!

Does this change relate to a specific pyarrow version? As of now, the project is pinned to the (ridiculously old) 0.16.0 version.

barhot commented 1 year ago

According to the documentation, the serialization functionality has been deprecated since pyarrow 2.0. The changes have been tested with pyarrow==10.0.1, which is currently considered a stable version.

dacort commented 1 year ago

Ah ok cool. Can you bump the version in the setup.cfg file as well? How did you test the changes?

barhot commented 1 year ago

Here is how the changes were tested:

  1. Create an AWS Lambda function with the lambda_handler from example/handler.py.
  2. Create an AWS Athena custom data source based on the Lambda function.
  3. Show the databases from the data source in the Athena console: just one database, sampledb, is expected.
  4. Show the tables of sampledb in the Athena console: just one table, demo, is expected.
  5. Show the columns of the demo table in the Athena console: id varchar and name varchar are expected.
  6. Select 10 rows from the table by running the following SQL in Athena:
    select * from "sampledb"."demo" limit 10;

    Here is the expected result:

    [image: screenshot of the query result]
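
The checks in steps 3 to 6 could also be written down as SQL statements, roughly as sketched below. Only the final select comes from this thread. The SHOW statements are an assumption about what the console browsing corresponds to, and the helper name verification_queries is hypothetical. Whether each SHOW statement behaves the same against a federated data source has not been verified here.

```python
from typing import List


def verification_queries(database: str, table: str, limit: int = 10) -> List[str]:
    """Build the statements for steps 3-6, in order (hypothetical helper)."""
    return [
        # Step 3: list databases (assumed equivalent of browsing the console).
        "SHOW DATABASES",
        # Step 4: list tables in the database.
        f"SHOW TABLES IN {database}",
        # Step 5: list the columns of the table.
        f"SHOW COLUMNS IN {database}.{table}",
        # Step 6: the query quoted in the thread.
        f'select * from "{database}"."{table}" limit {limit};',
    ]


# Expected outcomes as described in the steps above.
EXPECTED = {
    "databases": ["sampledb"],
    "tables": ["demo"],
    "columns": [("id", "varchar"), ("name", "varchar")],
}

for statement in verification_queries("sampledb", "demo"):
    print(statement)
```

Running the statements with the custom data source selected and comparing the output against EXPECTED would reproduce the manual check.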

dacort commented 1 year ago

Awesome, thank you so much @barhot! I'll try to take a look at this in the next few days and do a quick validation pass on my end as well. :)