Open TheKnightCoder opened 1 month ago
Delta-rs version: 0.16.2 - 0.17.4
Binding: Python
Environment:
What happened: The code works in version 0.15.3 but fails in 0.16.2 - 0.17.4. Code snippet:
    import pandas as pd
    from deltalake import write_deltalake

    table_path = f's3://{BUCKET_NAME}/bronze/test'

    def insert():
        print('writing')
        data = {"id": [1, 2], "b": [2, 2]}
        df = pd.DataFrame(data)
        print(df)
        write_deltalake(table_path, df, mode='overwrite', overwrite_schema=True)
        print('written')

    insert()
What you expected to happen: The write should succeed, as it does in 0.15.3. Instead, write_deltalake or any other operation throws this error:
    [ERROR] OSError: Generic S3 error: Error after 10 retries in 2.612435942s, max_retries:10, retry_timeout:180s, source:error sending request for url (https://s3.us-east-1.amazonaws.com/bucketname-bmdcgdma/bronze/test/_delta_log/_last_checkpoint): error trying to connect: invalid peer certificate: BadSignature
    Traceback (most recent call last):
      File "/var/task/events/s3-update-nfts.py", line 127, in handler
        insert()
      File "/var/task/events/s3-update-nfts.py", line 35, in insert
        write_deltalake(table_path, df, mode='overwrite', overwrite_schema=True, storage_options=storage_options)
      File "/opt/python/deltalake/writer.py", line 265, in write_deltalake
        table, table_uri = try_get_table_and_table_uri(table_or_uri, storage_options)
      File "/opt/python/deltalake/writer.py", line 688, in try_get_table_and_table_uri
        table = try_get_deltatable(table_or_uri, storage_options)
      File "/opt/python/deltalake/writer.py", line 701, in try_get_deltatable
        return DeltaTable(table_uri, storage_options=storage_options)
      File "/opt/python/deltalake/table.py", line 405, in __init__
        self._table = RawDeltaTable(
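Since the failure is a certificate error during the TLS handshake, one thing that might help narrow it down is checking whether the same endpoint validates from inside the same Lambda using Python's own `ssl` module. The sketch below is only a diagnostic aid: `probe_tls` is a hypothetical helper, not part of deltalake, and it uses Python's OpenSSL trust store, which may differ from the TLS stack used by deltalake's Rust backend. A success here would suggest the problem is specific to the library build rather than to the network path; a failure would point toward the environment.

```python
import socket
import ssl


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Attempt a verified TLS handshake and report the outcome without raising."""
    # The default context verifies both the certificate chain and the hostname.
    context = ssl.create_default_context()
    result = {"ok": False, "tls_version": None, "issuer": None, "error": None}
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                result["ok"] = True
                result["tls_version"] = tls.version()
                result["issuer"] = cert.get("issuer")
    except ssl.SSLCertVerificationError as exc:
        # The chain or hostname did not validate against Python's trust store.
        result["error"] = f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # Covers DNS failures, refused connections, timeouts, other TLS errors.
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result


# Example call from inside the Lambda handler (host taken from the error above):
# print(probe_tls("s3.us-east-1.amazonaws.com"))
```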
How to reproduce it:
Use the AWS SDK for pandas managed layer for dependencies (pyarrow, numpy, etc.): https://aws-sdk-pandas.readthedocs.io/en/stable/layers.html
Create a custom deltalake lambda layer:
requirements.txt
    deltalake==0.17.4
    pyarrow_hotfix==0.6
build.sh
    mkdir -p ./dist/python
    pip install -r requirements.txt -t ./dist/python --no-deps --platform manylinux2014_aarch64
This follows the article https://delta.io/blog/2023-04-06-deltalake-aws-lambda-wrangler-pandas/, with pyarrow_hotfix added because it is a required dependency that is not available in the AWS SDK for pandas layer.
Add all S3 permissions to the AWS Lambda execution role.
Set the env var AWS_S3_ALLOW_UNSAFE_RENAME: 'true'.
Use the layer to write a Delta table in AWS Lambda; the write throws the error above.
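The traceback passes a `storage_options` argument whose contents are not shown in the report. As a rough sketch of what such a dictionary could look like for the steps above, the following assumes the unsafe-rename flag and the region are the only settings needed; `build_storage_options` and the `insert` signature are hypothetical names chosen for illustration, not the reporter's actual code.

```python
import os
from typing import Mapping, Optional


def build_storage_options(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect S3 settings to pass to write_deltalake as storage_options."""
    env = os.environ if env is None else env
    options = {
        # Same flag as the env var in the steps above, passed explicitly.
        "AWS_S3_ALLOW_UNSAFE_RENAME": env.get("AWS_S3_ALLOW_UNSAFE_RENAME", "true"),
    }
    region = env.get("AWS_REGION")  # set automatically by the Lambda runtime
    if region:
        options["AWS_REGION"] = region
    return options


def insert(table_path: str) -> None:
    # Imported here so build_storage_options can be used without the layer.
    import pandas as pd
    from deltalake import write_deltalake

    df = pd.DataFrame({"id": [1, 2], "b": [2, 2]})
    write_deltalake(
        table_path,
        df,
        mode="overwrite",
        overwrite_schema=True,
        storage_options=build_storage_options(),
    )
```

Passing the flag through `storage_options` as well as the environment makes the configuration visible at the call site, which can make a report like this easier to reproduce.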
More details: The last working version was 0.15.3; versions 0.16.2 - 0.17.4 do not work.
Edit: 0.16.1 works too, so the bug was introduced in 0.16.2.
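When comparing versions like this across rebuilt layers, it can help to log what the function actually loaded, since a stale layer or a mismatched architecture would skew the result. The helper below is a minimal sketch using only the standard library; `installed_versions` is a hypothetical name, and the package list is an assumption based on the requirements shown earlier. With the `manylinux2014_aarch64` build above, the reported machine would be expected to be an ARM one.

```python
import platform
from importlib import metadata


def installed_versions(
    packages=("deltalake", "pyarrow", "pyarrow_hotfix", "numpy", "pandas"),
) -> dict:
    """Report the CPU architecture, Python version, and package versions in use."""
    report = {
        "machine": platform.machine(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment or layer.
            report[name] = None
    return report


# Example call at the top of the Lambda handler:
# print(installed_versions())
```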
@TheKnightCoder can you check against 0.18.1 please?