Closed: Zan-L closed this issue 1 week ago
Can you give a reproducible example, or any details about the size of the table and the amount of writes? A log report would help too. I wrote the modified code for 0.18.1 and it seemed to fix other people's problems; I'll take a look at it :)
Hi,
Thank you for the prompt response. Unfortunately, that happened in our enterprise dev environment, so I can't provide the proprietary data. I can provide two more observations though:
Understandable if you can't share enterprise data, but even a censored error code/log would be great!
But yeah, the behavior changed in 0.18.1 so that writes go into an in-memory buffer, which is flushed once it exceeds a configured threshold. The default per the config is `const DEFAULT_MAX_BUFFER_SIZE: usize = 4 * 1024 * 1024`, i.e. ~4 MiB. Can you try raising it by setting the `max_buffer_size` key in the `storage_options` dict when loading a table? The `object_store` documentation says parts should be at least 5 MiB, so this default should probably be tuned on our side anyway.
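For reference, a minimal sketch of that workaround from Python. The bucket URI, table contents, and the chosen 10 MiB value are placeholders; the `max_buffer_size` key comes from the discussion above, and `storage_options` values are passed as strings:

```python
# Sketch of the suggested workaround: raise the write buffer above
# S3's 5 MiB multipart minimum via storage_options. The URI and data
# below are hypothetical placeholders.
import pandas as pd
from deltalake import write_deltalake

df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

write_deltalake(
    "s3://my-bucket/my-table",
    df,
    mode="append",
    storage_options={
        # 10 MiB: comfortably above the 5 MiB minimum part size
        "max_buffer_size": str(10 * 1024 * 1024),
    },
)
```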
That is the root cause. I did a test write with max_buffer_size set to 5*2**20 and it worked this time. Can you push another release with this fix?
Btw, the error message from before:
```
OSError: Generic S3 error: Error performing complete multipart request: Client error with status 400 Bad Request: <Error><Code>EntityTooSmall</Code><Message>Your proposed upload is smaller than the minimum allowed size</Message><ProposedSize>4328982</ProposedSize><MinSizeAllowed>5242880</MinSizeAllowed><PartNumber>1</PartNumber>
```
I don't control the release cycles, but you can compile from source with that fix! The current version will also work if you set that option.
Environment
Delta-rs version: 0.18.1
Binding: Python
Environment:
Bug
What happened: Same error as https://github.com/delta-io/delta-rs/issues/890, but against S3 itself rather than a non-S3 store
What you expected to happen:
How to reproduce it: Write a regular-sized dataset to S3 with write_deltalake() (see the sketch below)
More details: 0.18.0 works fine
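For completeness, a hedged repro sketch based on the description above. The bucket URI and payload are illustrative; the only requirement is that the write is large enough for the ~4 MiB buffer to flush a multipart part that falls under S3's 5 MiB minimum:

```python
# Repro sketch for the 0.18.1 regression: the default 4 MiB buffer
# flushes an undersized multipart part, so completing the upload
# fails with EntityTooSmall. URI and payload are hypothetical.
import pandas as pd
from deltalake import write_deltalake

# ~6 MiB of data: the buffer flushes a ~4 MiB part (below the
# 5 MiB minimum), then writes the remainder as the final part
df = pd.DataFrame({"payload": ["x" * 1024] * (6 * 1024)})

write_deltalake("s3://my-bucket/my-table", df, mode="append")
# Fails on 0.18.1 with:
# OSError: Generic S3 error: ... <Code>EntityTooSmall</Code> ...
```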