delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Impossible to append to a DeltaTable with float data type on RHEL #2520

Closed LoicRaillon closed 1 month ago

LoicRaillon commented 1 month ago

Environment

Delta-rs version: 0.17.4

Binding: Python 3.11.5

Environment: OS: RHEL 9.3 or WSL2 (Ubuntu 22.04)


Bug

What happened: I created a DeltaTable in which most of the columns have a float type. When I tried to append a polars DataFrame to the DeltaTable, I got the following error (captured with RUST_BACKTRACE=full):

thread '<unnamed>' panicked at python/src/lib.rs:1513:48:
called `Result::unwrap()` on an `Err` value: CDataInterface("The datatype \"Float32\" expects 2 buffers, but requested 2. Please verify that the C data interface is correctly implemented.")
stack backtrace:
   0:     0x7f000d385ba2 - <unknown>
   1:     0x7f000d3b696c - <unknown>
   2:     0x7f000d3824df - <unknown>
   3:     0x7f000d385974 - <unknown>
   4:     0x7f000d38712b - <unknown>
   5:     0x7f000d386e83 - <unknown>
   6:     0x7f000d3875cd - <unknown>
   7:     0x7f000d3874a2 - <unknown>
   8:     0x7f000d386076 - <unknown>
   9:     0x7f000d3871d4 - <unknown>
  10:     0x7f0009fd27d5 - <unknown>
  11:     0x7f0009fd2cd3 - <unknown>
  12:     0x7f000a01d860 - <unknown>
  13:     0x7f000a0c0c02 - <unknown>
  14:     0x7f000a1bc5d6 - <unknown>
  15:     0x7f000a1bd5f3 - <unknown>
  16:     0x7f000a18fbfa - <unknown>
  17:     0x7f000a1bc781 - <unknown>
  18:     0x7f00151d44a2 - <unknown>
  19:     0x7f00151b50c6 - _PyObject_MakeTpCall
  20:     0x7f00151bdd26 - _PyEval_EvalFrameDefault
  21:     0x7f00151b9c52 - <unknown>
  22:     0x7f00151e5c37 - <unknown>
  23:     0x7f00151c2236 - _PyEval_EvalFrameDefault
  24:     0x7f00151b9c52 - <unknown>
  25:     0x7f0015245ff6 - PyEval_EvalCode
  26:     0x7f0015264994 - <unknown>
  27:     0x7f0015260e16 - <unknown>
  28:     0x7f001519e172 - <unknown>
  29:     0x7f001519e31b - _PyRun_InteractiveLoopObject
  30:     0x7f00151282cd - <unknown>
  31:     0x7f001519e4a4 - PyRun_AnyFileExFlags
  32:     0x7f0015123e4a - <unknown>
  33:     0x7f0015233bad - Py_BytesMain
  34:     0x7f0014c29590 - __libc_start_call_main
  35:     0x7f0014c29640 - __libc_start_main_alias_1
  36:     0x563746731095 - _start
  37:                0x0 - <unknown>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/airflow/.cache/pypoetry/virtualenvs/dataflow-tK0vRoRm-py3.11/lib/python3.11/site-packages/datadex/table.py", line 145, in append
    data.write_delta(
  File "/home/airflow/.cache/pypoetry/virtualenvs/dataflow-tK0vRoRm-py3.11/lib64/python3.11/site-packages/polars/dataframe/frame.py", line 3589, in write_delta
    write_deltalake(
  File "/home/airflow/.cache/pypoetry/virtualenvs/dataflow-tK0vRoRm-py3.11/lib64/python3.11/site-packages/deltalake/writer.py", line 319, in write_deltalake
    write_deltalake_rust(
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: CDataInterface("The datatype \"Float32\" expects 2 buffers, but requested 2. Please verify that the C data interface is correctly implemented.")

What you expected to happen: I should be able to append to a previously created DeltaTable.

rtyler commented 1 month ago

@LoicRaillon can you please share the command used to create the table, at least enough of the schema to include the float definition?

LoicRaillon commented 1 month ago

I found the cause of the error. When a Delta table is first created with DeltaTable.create, any data appended afterwards must follow the column order of the schema the table was created with.
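For anyone hitting the same panic, here is a minimal sketch of one way to reorder the DataFrame columns to match the table schema before appending. It assumes column order is the only mismatch, so the names and types already agree. The helper names `align_columns` and `append_aligned` are hypothetical and not part of deltalake or polars. I have not verified this against every deltalake release.

```python
from typing import Sequence


def align_columns(frame_columns: Sequence[str], table_columns: Sequence[str]) -> list[str]:
    """Return the frame's column names in the table's column order.

    Raises ValueError when the two sets of names differ, because
    reordering alone cannot fix a missing or extra column.
    """
    missing = [c for c in table_columns if c not in frame_columns]
    extra = [c for c in frame_columns if c not in table_columns]
    if missing or extra:
        raise ValueError(f"column mismatch: missing={missing}, extra={extra}")
    return list(table_columns)


def append_aligned(df, table_uri: str) -> None:
    """Hypothetical helper: append a polars DataFrame using the table's column order."""
    # Imported lazily so align_columns stays usable without deltalake installed.
    from deltalake import DeltaTable

    # Column names in the order recorded in the existing table's schema.
    table_columns = [field.name for field in DeltaTable(table_uri).schema().fields]
    order = align_columns(df.columns, table_columns)

    # Reorder the columns first, then append.
    df.select(order).write_delta(table_uri, mode="append")
```

Calling `append_aligned(data, "path/to/table")` in place of a direct `data.write_delta(...)` should reorder the columns first (the path here is a placeholder). If the names themselves differ, the helper raises a ValueError instead of reaching the Rust writer.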