Closed lkarthee closed 4 months ago
There is a good news and a bad news. The good news is that we have only one failing test! 🎉 🥳 But the bad news is that we may need to submit changes to Polars in order to fix the failing test.
In a nutshell, the problem is on the LazyFrame#sink_parquet_cloud
function from Polars that is failing probably because of the reasons I wrote in this issue: https://github.com/pola-rs/polars/issues/17172#issuecomment-2243328412
We can either wait for the fix to land on Polars, or deactivate the feature until there. This is going to affect people using the :streaming
option of to_parquet/2
(cloud version). WDYT @josevalim @lkarthee @billylanchantin?
EDIT: I'm going to play with an idea to use the sink_parquet
and another process to read the file, that can be a fifo/named pipe, and write to the cloud storage. I will let you know by the end of the day.
What do we for when streaming is not supported? We raise or we just do the best we can? I am worried Polars is not ultimately interested in maintaining these features. We have another related PR that is open for quite some time.
What do we for when streaming is not supported? We raise or we just do the best we can?
I think we can raise if it's not possible to support a given feature, but we can try to implement "hacks" when possible, until we need to remove the feature all together. Unfortunately we can't have our patches (via monkey patches) to the Polars codebase, and I don't think it's a good idea to maintain a fork to solve our issues. WDYT?
I am worried Polars is not ultimately interested in maintaining these features. We have another related PR that is open for quite some time.
Yeah, I'm worried too. Maybe it's not their focus to maintain this cloud integration in their Rust codebase.
Yeah, let's raise for now and add a note to the CHANGELOG. Meanwhile, if you could send a PR to them, it would be great!
@josevalim OK, I will add a note to the change log later. My PoC didn't work out, but I was wondering if we could still "enable streaming", but first stream to a temporary file, and them read from that file and send to the cloud. This could keep the convenience of saving to S3 right from Explorer, with the downside of requiring some space in disk, and probably extra time. WDYT?
Meanwhile, if you could send a PR to them, it would be great!
Yeah, I will try that! Something similar to our CloudWriter
may work.
Let’s just disable for now. People can stay in the old version until Polars is updated with a fix.
+1 for disabling and we should document it in the changelog.
We are almost there! But a problem regarding the cloud integration is still blocking this. I'm trying to fix it by updating
object_store
and our customCloudWriter
(Rust side), but I suspect that the problem is more complex due to some Polars' changes. I will keep trying :)