Delta-rs version:
0.21.0
0.20.0
0.19.0
I can't test with 0.18.0.
With 0.17.0 it works fine.
Binding:
Python
Environment:
Cloud provider:
Local and S3
OS:
macOS and Amazon Linux
Other:
Bug
What happened:
When overwriting a table, the whole schema gets rewritten (already reported here: https://github.com/delta-io/delta-rs/pull/2923). In addition, I think because of how the JSON metadata is encoded/decoded, all \ characters get escaped again (these characters come from Spark comments/metadata, for example, or from my own comments).
The JSON files of one of my "development" tables grew to 350 MB, and now delta can't scan them anymore (thrift buffer size limits :) )
What you expected to happen:
When rewriting metadata, no extra escape characters should be added again
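To illustrate the growth I suspect, here is a minimal sketch using only Python's `json` module, not delta-rs itself. The `comment` value and the `reencode` helper are made up for the illustration, and I have not confirmed that this is the actual code path in delta-rs. It only shows that serializing an already-serialized string again, without decoding it first, multiplies the backslashes on every round.

```python
import json

# Hypothetical field comment containing a single backslash,
# similar to what Spark comments/metadata may contain.
comment = r"path C:\data"


def reencode(value: str, rounds: int) -> str:
    """Serialize the string `rounds` times without ever decoding it.

    This simulates the suspected bug: each overwrite treats the
    already-escaped metadata as raw text and escapes it again.
    """
    for _ in range(rounds):
        value = json.dumps(value)
    return value


# Number of backslash characters after 0, 1, 2, 3 rounds of re-encoding.
counts = [reencode(comment, n).count("\\") for n in range(4)]
print(counts)

# A correct round trip decodes before encoding again, so nothing grows.
stable = json.dumps(json.loads(json.dumps(comment)))
print(stable == json.dumps(comment))
```

With a correct decode/encode round trip the serialized form stays the same size, which is the behavior I would expect on overwrite.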
How to reproduce it:
I'm sorry but I can only test with polars :(
https://docs.pola.rs/api/python/stable/reference/api/polars.DataFrame.write_delta.html
More details: test_table.zip contains the empty delta table with the columns active and id. test_table_broken.zip contains the table with many \\\
Image: output of cat 00008.json and 0000.json; see how the number of \\ grew.
test_table_broken.zip test_table.zip
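
For anyone who wants to measure the growth without opening the files by hand, here is a small stdlib-only sketch that counts the backslash characters in each commit file of a table's `_delta_log` directory. The function name `backslashes_per_commit` is made up for this example, and it assumes the zip has been extracted to a local folder.

```python
from pathlib import Path


def backslashes_per_commit(table_path: str) -> dict[str, int]:
    """Map each JSON commit file in <table_path>/_delta_log to the
    number of backslash characters it contains, in commit order."""
    log_dir = Path(table_path) / "_delta_log"
    counts: dict[str, int] = {}
    # Commit files are zero-padded, so sorting by name gives commit order.
    for commit in sorted(log_dir.glob("*.json")):
        counts[commit.name] = commit.read_text(encoding="utf-8").count("\\")
    return counts
```

Calling `backslashes_per_commit("test_table_broken")` on the extracted archive (the folder name is an assumption) should show the count rising from one commit to the next if the metadata is being re-escaped on every overwrite.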