Open dadepo opened 1 year ago
@dadepo are you trying to show the result? reg to the stacktrace it failed on console output , not on the computation itself
14: std::io::stdio::_print
at /rustc/c18a5e8a5b1afb0d7a582fe9ebad4c1996c90da3/library/std/src/io/stdio.rs:1074:5
15: arrow_cast::pretty::print_batches
at /Users/dadepo/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-cast-38.0.0/src/pretty.rs:62:5
16: datafusion::dataframe::DataFrame::show::{{closure}}
at /Users/dadepo/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-24.0.0/src/dataframe.rs:682:12
This may be related to the fact that datafusion-cli is trying to write the entire result to an inmemory buffer 🤔
This may be related to the fact that datafusion-cli is trying to write the entire result to an inmemory buffer 🤔
That gives the idea to show only first N rows and make a param on that.
@dadepo are you trying to show the result? reg to the stacktrace it failed on console output , not on the computation itself
@comphead I just checked again, and yeah it blows up with the df.show().await?
call.
df.collect().await?
on the other hand, does not give the error
Isn't it because comfy-table uses u16
?
Your error says,
thread 'main' panicked at 'attempt to add with overflow', /Users/dadepo/.cargo/registry/src/github.com-1ecc6299db9ec823/comfy-table-6.1.4/src/utils/mod.rs:44:9
stack backtrace:
0: rust_begin_unwind
at /rustc/c18a5e8a5b1afb0d7a582fe9ebad4c1996c90da3/library/std/src/panicking.rs:575:5
and on the comfy-table, you can find https://github.com/Nukesor/comfy-table/blob/main/src/utils/mod.rs
pub fn width(&self) -> u16 {
self.content_width + self.padding.0 + self.padding.1
}
I explored a way to fix this from arrow-datafusion
. But relevant data is not exposed to the outside of the crate.
Is having arrow-datafusion
truncate the width, to a max value, that does not overflow an acceptable fix? Is that easily doable? If so, I'll say that is better than blowing up as it currently does.
I think the best way to do is to handle it on comfy-table
's side.
They are widely adopted and I'm pretty sure someone would end up with the same issue.
If we are going to do it on datafusion's side, few things concerns me.
I'm not one of the maintainer so I appreciate advice.
I think this is likely something that needs to be fixed in arrow-rs or comfy-table.
I agree with the tradeoffs identified by @thomas-k-cameron but don't know enough about comfy-table and its maintenance status to know how easy it is to fix or how likely they would be to accept such a change.
Thus, my suggestion is:
Depending on the response you get from those two crates may help decide which approach to take.
That gives the idea to show only first N rows and make a param on that.
This is a neat idea @comphead -- what I recall psql
doing is buffering the first 1000 rows or something and using that sample to determine column widths for the rest of the output, which is streamed.
I have always wanted to do something similar for datafusion-cli (and IOx) but I haven't gotten around to doing so
I created a ticket on comfy-table. They are pretty active so I should be able to hear from them.
This may be related to the fact that datafusion-cli is trying to write the entire result to an inmemory buffer 🤔
That gives the idea to show only first N rows and make a param on that.
This doesn't work because the overflow happens on calculating width
. I think you can restrict the number letters you pass to each row though.
Hey :)
In comfy-table, I went with the approach to choose a u16
for table widths, since this covered all use-cases I could come up with. This allows the user to build tables that're 65535
characters wide.
If there's some content that's longer than 65535
chars, the table width is internally fixed to u16::MAX
, but the formatting can look a bit wonky, since the actual content can be even longer than 65535
chars.
However, I consider this a non-issue, since no user will ever have a screen wide enough to display that many chars and the output on your terminal will be botched anyway.
In case a user wants proper formatting, there's the Dynamic
content adjustment mode, which introduces line-breaks to fit the content to the current/specified terminal width.
I already tried to use the appropriate saturating_[sub|add]
functions everywhere, but it seems I overlooked a few places. A PR https://github.com/Nukesor/comfy-table/pull/114 has been opened, but it would be awesome if somebdoy could test this or even write a regression test case to trigger the panic on the current code-base.
New release with a bunch of fixes has been published :)
https://github.com/Nukesor/comfy-table/releases/tag/v7.0.1 https://crates.io/crates/comfy-table/7.0.1
Describe the bug
Then
After running for about 2 mins, panics
To Reproduce
Create a table with a large number of rows and use array_agg on one of its columns
Expected behavior
array_agg should not panic on large rows.
Additional context