delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

DynamoDB fails committing transaction log #2651

Open marcosmartinezfco opened 4 days ago

marcosmartinezfco commented 4 days ago

Environment

Delta-rs version: 0.18.2

Binding: Python

Environment:


Bug

What happened:

Encountered errors when writing to a Delta Lake table stored in S3, with DynamoDB used for the transaction log. The errors indicate failures to write transaction entries to DynamoDB.

Error log:

[2024-07-05T13:42:07Z ERROR deltalake_aws::logstore] retry #0 on log entry CommitEntry { version: 1, temp_path: Path { raw: "_delta_log/_commit_9da23947-e5ef-4ed2-a95a-fe3c14f4e66c.json.tmp" }, complete: false, expire_time: None } failed to update lock db: 'Transaction failed: unable to complete entry for '1': failure to write to DynamoDb'
[2024-07-05T13:42:07Z ERROR deltalake_aws::logstore] retry #1 on log entry CommitEntry { version: 1, temp_path: Path { raw: "_delta_log/_commit_9da23947-e5ef-4ed2-a95a-fe3c14f4e66c.json.tmp" }, complete: false, expire_time: None } failed to update lock db: 'Transaction failed: unable to complete entry for '1': failure to write to DynamoDb'
[2024-07-05T13:42:07Z ERROR deltalake_aws::logstore] retry #2 on log entry CommitEntry { version: 1, temp_path: Path { raw: "_delta_log/_commit_9da23947-e5ef-4ed2-a95a-fe3c14f4e66c.json.tmp" }, complete: false, expire_time: None } failed to update lock db: 'Transaction failed: unable to complete entry for '1': failure to write to DynamoDb'
Traceback (most recent call last):
  File "/Users/marcosmartinez/coding/market-updates-module/local.py", line 22, in <module>
    write_deltalake(
  File "/Users/marcosmartinez/coding/market-updates-module/venv/lib/python3.12/site-packages/deltalake/writer.py", line 556, in write_deltalake
    table._table.create_write_transaction(
_internal.CommitFailedError: Transaction failed: dynamodb client failed to delete log entry

What you expected to happen:

Expected the Delta Lake write operation to complete successfully, with transaction entries properly written to DynamoDB.

How to reproduce it:

  1. Ensure AWS credentials and region are configured.
  2. Create a script that writes to a Delta Lake table stored in S3, using DynamoDB for transaction logs (see the configuration sketch after this list).
  3. Run the script locally.
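
Steps 1 and 2 assume the DynamoDB locking provider has been enabled before the write; otherwise delta-rs refuses plain S3 writes unless AWS_S3_ALLOW_UNSAFE_RENAME is set. A minimal sketch of the environment-variable form of that configuration (the table name matches the Terraform shared further down):

import os

# Enable the S3 + DynamoDB log store before calling write_deltalake.
# The same keys can instead be passed per-call via storage_options.
os.environ["AWS_S3_LOCKING_PROVIDER"] = "dynamodb"
os.environ["DELTA_DYNAMO_TABLE_NAME"] = "delta_log"  # our lock table name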

Example script to reproduce the issue:

import daft
from deltalake import write_deltalake
import datetime

if __name__ == "__main__":
    data = {
        "Symbol": ["AAA/EUR"],
        "MDEntryType": ["0"],
        "MDEntryPx": [128.67],
        "MDEntrySize": [5],
        "Timestamp": [datetime.datetime(2024, 7, 3, 18, 45, 36, 123000)],
        "Year": [2024],
        "Month": [7],
        "Day": [3],
        "Hour": [18],
        "Minute": [45],
    }

    dummy_data_df = daft.from_pydict(data)

    write_deltalake(
        "s3://market-updates-snapshots-sandbox/incremental-updates",
        dummy_data_df.to_pandas(),
        mode="append",
        partition_by=["Year", "Month", "Day", "Hour", "Minute", "Symbol"],
    )

    # Read the Delta Lake table
    df = daft.read_deltalake("s3://market-updates-snapshots-sandbox/incremental-updates")

    print(df.show())
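
For reference, the same write with the locking provider passed explicitly through storage_options rather than environment variables. This is a sketch, not our exact code, and the region value is a placeholder:

write_deltalake(
    "s3://market-updates-snapshots-sandbox/incremental-updates",
    dummy_data_df.to_pandas(),
    mode="append",
    partition_by=["Year", "Month", "Day", "Hour", "Minute", "Symbol"],
    storage_options={
        "AWS_S3_LOCKING_PROVIDER": "dynamodb",   # use the DynamoDB log store
        "DELTA_DYNAMO_TABLE_NAME": "delta_log",  # lock table name
        "AWS_REGION": "eu-west-1",               # placeholder region
    },
)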

More details:

I have confirmed that I have admin permissions on my AWS account, so permissions should not be an issue. The problem persists even with the correct configuration in place. Funnily enough, the operation does append the item to the table, so the error apparently does not affect the write itself.

rtyler commented 4 days ago

Do you have the script or Terraform used to create the DynamoDB table handy? I've not seen this error in use of this code, so I'm curious whether it's possible to configure DynamoDB in a way that triggers it.

marcosmartinezfco commented 3 days ago

@rtyler It's a simple module that we use to create DynamoDB tables.

module "market_updates_lock_table" {
  source     = "../../utils/dynamodb"
  table_name = "delta_log"

  hash_key = "tablePath"
  attributes = [
    {
      name = "tablePath"
      type = "S"
    }
  ]
}

---
# "../../utils/dynamodb"
resource "aws_dynamodb_table" "this" {
  name      = var.table_name
  hash_key  = var.hash_key
  range_key = var.range_key

  dynamic "attribute" {
    for_each = var.attributes
    content {
      name = attribute.value.name
      type = attribute.value.type
    }
  }

  billing_mode   = var.billing_mode
  read_capacity  = 1
  write_capacity = 1

  dynamic "global_secondary_index" {
    for_each = var.secondary_index != null && var.secondary_index_hash_key != null ? [1] : []
    content {
      name               = var.secondary_index
      hash_key           = var.secondary_index_hash_key
      non_key_attributes = var.secondary_index_non_key_attributes
      projection_type    = "INCLUDE"
      write_capacity     = 1
      read_capacity      = 1
    }
  }
}
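
One thing worth comparing with the delta-rs S3 docs: the documented lock table uses a composite key, tablePath as the hash key plus fileName as the range key, while the module above only defines the hash key. A boto3 sketch of the documented schema for comparison (billing mode is a placeholder; provisioned capacity works too):

import boto3

dynamodb = boto3.client("dynamodb")

# Lock table schema from the delta-rs S3 docs: composite key of
# tablePath (HASH) + fileName (RANGE). The Terraform above omits fileName.
dynamodb.create_table(
    TableName="delta_log",
    AttributeDefinitions=[
        {"AttributeName": "tablePath", "AttributeType": "S"},
        {"AttributeName": "fileName", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "tablePath", "KeyType": "HASH"},
        {"AttributeName": "fileName", "KeyType": "RANGE"},
    ],
    BillingMode="PAY_PER_REQUEST",  # placeholder billing mode
)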

And this is the item in the DynamoDB table:

(Screenshot 2024-07-06 at 12:15:57: DynamoDB console view of the lock table item)