delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Exception: Json error: whilst decoding field 'minValues' [Polars Python Delta-RustEngine] #2800

Open starzar opened 3 weeks ago

starzar commented 3 weeks ago

Environment

Windows 10 (local machine), Python 3.12.3, Polars 1.3.0

Delta-rs version: deltalake 0.18.2

Binding: Python

Bug

CSV - Fii_OiMultiIndex.csv

The thousand separators in values such as "1,303,517" have already been removed from the CSV, and the data works fine with Polars operations. But when writing it to a Delta Lake table with the Rust engine, the write fails with Json error: whilst decoding field 'minValues'.

Exception: Json error: whilst decoding field 'minValues': whilst decoding field 'OpenInterest(contracts)': failed to parse "1,303,517" as Int64
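Because the message still quotes "1,303,517", a quick way to rule out the CSV itself is to scan the column for values that still carry thousand separators. The following is a minimal standard-library sketch, not part of Polars or deltalake; `find_separated_values` is a hypothetical helper, and it assumes a comma-delimited file in which separated numbers are quoted (as they must be in a valid CSV).

```python
import csv
import io
import re

# Matches integers written with thousand separators, e.g. "1,303,517" or "-12,000".
THOUSANDS_RE = re.compile(r"^-?\d{1,3}(,\d{3})+$")


def find_separated_values(csv_text: str, column: str) -> list[tuple[int, str]]:
    """Return (data_row_number, raw_value) pairs where `column` still contains
    thousand separators. Hypothetical helper for illustration only."""
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row_number, row in enumerate(reader, start=1):
        # DictReader yields None for missing fields, so fall back to "".
        value = (row.get(column) or "").strip()
        if THOUSANDS_RE.match(value):
            hits.append((row_number, value))
    return hits
```

Passing the text of Fii_OiMultiIndex.csv and the column name "OpenInterest(contracts)" should return an empty list if the separators really are gone; in that case the string in the error must be coming from somewhere other than the CSV.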

What happened:

Traceback (most recent call last):
  File "C:\Users\User_0\Documents\Code\Python\NseScraping\secondTerminal.py", line 250, in <module>
    oifiiindices_df_cur.write_delta(target=oifiiindices_dbpath, mode="overwrite", delta_write_options=delta_write_options)
  File "C:\Users\User_0\AppData\Local\Programs\Python\Python312\Lib\site-packages\polars\dataframe\frame.py", line 4077, in write_delta
    write_deltalake(
  File "C:\Users\User_0\AppData\Local\Programs\Python\Python312\Lib\site-packages\deltalake\writer.py", line 258, in write_deltalake
    table, table_uri = try_get_table_and_table_uri(table_or_uri, storage_options)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User_0\AppData\Local\Programs\Python\Python312\Lib\site-packages\deltalake\writer.py", line 673, in try_get_table_and_table_uri
    table = try_get_deltatable(table_or_uri, storage_options)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User_0\AppData\Local\Programs\Python\Python312\Lib\site-packages\deltalake\writer.py", line 686, in try_get_deltatable
    return DeltaTable(table_uri, storage_options=storage_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User_0\AppData\Local\Programs\Python\Python312\Lib\site-packages\deltalake\table.py", line 297, in __init__
    self._table = RawDeltaTable(
                  ^^^^^^^^^^^^^^
Exception: Json error: whilst decoding field 'minValues': whilst decoding field 'OpenInterest(contracts)': failed to parse "1,303,517" as Int64
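The traceback shows the exception is raised inside `DeltaTable(...)`, reached via `try_get_deltatable`, that is, while `write_deltalake` opens the table that already exists at the target path, before any new data is written. That suggests the offending string sits in file statistics already recorded in the table's `_delta_log`. In the Delta transaction log, each JSON commit file holds one action per line, and an `add` action's `stats` field is itself a JSON string containing `minValues` and `maxValues`. A minimal diagnostic sketch, assuming a local table path; `find_string_stats` is a hypothetical helper, and it reads only the JSON commit files, not Parquet checkpoints.

```python
import json
from pathlib import Path


def find_string_stats(table_path: str, column: str) -> list[tuple[str, str, str]]:
    """Scan <table_path>/_delta_log/*.json and report (commit_file, stat_key, value)
    for add actions whose minValues/maxValues for `column` are strings.
    Hypothetical helper: JSON commits only, checkpoints are not read."""
    hits = []
    log_dir = Path(table_path) / "_delta_log"
    # Commit files use zero-padded version numbers, so sorting gives version order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            add = json.loads(line).get("add")
            if not add or not add.get("stats"):
                continue
            # The stats field is a JSON document encoded as a string.
            stats = json.loads(add["stats"])
            for key in ("minValues", "maxValues"):
                value = (stats.get(key) or {}).get(column)
                if isinstance(value, str):
                    hits.append((commit.name, key, value))
    return hits
```

Calling it with "oiFii_MultiIndex.delta" and "OpenInterest(contracts)" would list any commit whose statistics record that column as a string.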

What you expected to happen: the DataFrame is written to the Delta table without an error.

How to reproduce it:

import polars as pl

oifiiindices_appendpath_original = "Fii_OiMultiIndex.csv"

oifiiindices_dbpath = "oiFii_MultiIndex.delta"

oifiiindices_schema = {
    "Date": pl.Date, "Day": pl.UInt8, "Index": pl.String, "BUY(contracts)": pl.Int32, "BUY(Crores)": pl.Int32,
    "SELL(contracts)": pl.Int32,
    "SELL(Crores)": pl.Int32, "OpenInterest(contracts)": pl.Int32,
    "Amt in Crores": pl.Int32, "Net Contracts Position": pl.Int32, "type": pl.String
}

oifiiindices_df_cur = pl.read_csv(source=oifiiindices_appendpath_original, schema=oifiiindices_schema)

# ComputeError: cannot compare string with numeric type (i32)
# row = oifiiindices_df_cur.filter(pl.col("OpenInterest(contracts)") == "1,303,517")

# Comparing against an integer literal works (the column is parsed as Int32)
row = oifiiindices_df_cur.filter(pl.col("OpenInterest(contracts)") == 1303517)
print('oifiiindices_df_cur')
print(row)

# Exception: Json error: whilst decoding field 'minValues': whilst decoding field 'OpenInterest(contracts)': failed to parse "1,303,517" as Int64
delta_write_options = {
    "engine": "rust",
    "schema_mode": "overwrite",
}

oifiiindices_df_cur.write_delta(target=oifiiindices_dbpath, mode="overwrite", delta_write_options=delta_write_options)
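The repro declares `OpenInterest(contracts)` as `pl.Int32`, while the error mentions `Int64`. That hints that the type used to decode the stats comes from the existing table rather than from this DataFrame. One way to check is to read the schema recorded in the most recent `metaData` action (its `schemaString` field) in the log; in Delta's type names, `integer` is a 32-bit and `long` a 64-bit integer. A minimal sketch, assuming a local table path and reading only JSON commit files (a schema stored solely in a Parquet checkpoint would be missed); `latest_table_schema` is a hypothetical helper.

```python
import json
from pathlib import Path


def latest_table_schema(table_path: str) -> dict[str, object]:
    """Return {column_name: delta_type} from the last metaData action found in
    <table_path>/_delta_log/*.json. Primitive types come back as strings such
    as "integer" or "long"; nested types come back as dicts."""
    schema: dict[str, object] = {}
    log_dir = Path(table_path) / "_delta_log"
    # Zero-padded commit names sort in version order, so the last metaData wins.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            meta = json.loads(line).get("metaData")
            if meta:
                fields = json.loads(meta["schemaString"])["fields"]
                schema = {field["name"]: field["type"] for field in fields}
    return schema
```

If the returned type for "OpenInterest(contracts)" is "long" while the DataFrame holds Int32, the table on disk was written with a different schema than the one in the repro.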

ion-elgreco commented 3 weeks ago

We are already at 0.19.1; please try that version, since this may already have been fixed in 0.19.0. Important: recreate the table.
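For a local table, recreating it can be as simple as deleting the table directory (the Parquet files plus `_delta_log`) and writing again, so that no old log entries are replayed. A minimal sketch, assuming a local filesystem path; `reset_table_dir` is a hypothetical helper, and the deletion removes the table and its history irreversibly.

```python
import shutil
from pathlib import Path


def reset_table_dir(table_path: str) -> bool:
    """Delete a local Delta table directory (data files and _delta_log).

    Returns True if something was deleted, False if the path did not exist.
    Hypothetical helper: local paths only, and the deletion cannot be undone.
    """
    path = Path(table_path)
    if not path.exists():
        return False
    if not (path / "_delta_log").is_dir():
        # Refuse to delete directories that do not look like Delta tables.
        raise ValueError(f"{path} has no _delta_log; refusing to delete it")
    shutil.rmtree(path)
    return True
```

After calling `reset_table_dir("oiFii_MultiIndex.delta")`, rerunning the `write_delta(...)` call from the repro creates a fresh table with a single commit.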

starzar commented 3 weeks ago

0.18.2 is the latest version that pip installs. How else can I upgrade?

PS C:\Users\User_0\Documents\Code\Python\NseScraping> pip install deltalake
Requirement already satisfied: deltalake in c:\users\user_0\appdata\local\programs\python\python312\lib\site-packages (0.18.2)
Requirement already satisfied: pyarrow>=8 in c:\users\user_0\appdata\local\programs\python\python312\lib\site-packages (from deltalake) (16.0.0)
Requirement already satisfied: pyarrow-hotfix in c:\users\user_0\appdata\local\programs\python\python312\lib\site-packages (from deltalake) (0.6)
Requirement already satisfied: numpy>=1.16.6 in c:\users\user_0\appdata\local\programs\python\python312\lib\site-packages (from pyarrow>=8->deltalake) (1.26.4)

[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip
PS C:\Users\User_0\Documents\Code\Python\NseScraping> pip install polars
Requirement already satisfied: polars in c:\users\user_0\appdata\local\programs\python\python312\lib\site-packages (1.3.0)

sherlockbeard commented 2 weeks ago

You can uninstall deltalake, clear the pip cache, and install it again.

Edit: I tried the code with the latest version and it's working fine.
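Note that `pip install deltalake` on its own never upgrades a package that is already installed; it stops at "Requirement already satisfied", which is exactly what the pip output in this thread shows. `pip install --upgrade deltalake`, or an explicit pin such as `deltalake==0.19.1`, is needed. A small sketch that checks the installed version against 0.19.0, the release named in this thread as possibly containing the fix; the helper names are hypothetical and the version parsing is deliberately simplistic (only the first three numeric components are compared).

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v: str) -> tuple[int, ...]:
    """Turn '0.18.2' into (0, 18, 2); keeps only the leading digits of each
    part, so '0.19.0rc1' becomes (0, 19, 0)."""
    parts = []
    for part in v.split(".")[:3]:
        match = re.match(r"\d+", part)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def needs_upgrade(installed: str, minimum: str = "0.19.0") -> bool:
    """True if `installed` is older than `minimum`."""
    return version_tuple(installed) < version_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = version("deltalake")
    except PackageNotFoundError:
        installed = None
    if installed is None or needs_upgrade(installed):
        # Plain "pip install deltalake" would just report the requirement as satisfied.
        print(f"{sys.executable} -m pip install --upgrade deltalake")
    else:
        print(f"deltalake {installed} is at least 0.19.0")
```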