Closed dlovell closed 6 months ago
The underlying issue is probably https://github.com/apache/arrow-datafusion/issues/8118
I was having a similar issue with version 31.0.0. Small example snip:
import datafusion as dfu
ctx = dfu.SessionContext()
ctx.register_csv('delta', 'test.csv')
result = ctx.sql('SELECT col1, COUNT(DISTINCT col2) FROM delta GROUP BY col1')
This assigns result to a datafusion.DataFrame object, as expected. I can see it has the correct values by printing it in a terminal or Jupyter. However, if I call result.to_polars() (or any other to_* method), I get the same error as the original post:
ArrowInvalid: Schema at index 0 was different:
col1: int64
COUNT(DISTINCT delta.col2): int64
vs
delta.col1: int64
COUNT(DISTINCT delta.col2): int64
However, it works in version 33.0.0 (I think that's the current version), so I assume there was a fix.
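For anyone reading the error message in this comment: the two schemas have identical types and differ only in how the first field is named (col1 vs delta.col1). The snippet below is a plain-Python sketch of that kind of comparison, not DataFusion's or Arrow's actual code. The helper first_schema_mismatch and the (name, type) tuple representation are hypothetical, chosen only to show why a qualified column name is enough to trigger a mismatch at index 0.

```python
# Illustrative only: a plain-Python stand-in for an Arrow-style schema check.
# Schemas are modelled as lists of (field_name, type_name) pairs; this is an
# assumption made for the sketch, not how Arrow represents schemas.

def first_schema_mismatch(expected, batch_schemas):
    """Return the index of the first schema that differs from `expected`, or None."""
    for index, schema in enumerate(batch_schemas):
        if schema != expected:
            return index
    return None

# Field names and types copied from the error message in this thread.
schema_a = [("col1", "int64"), ("COUNT(DISTINCT delta.col2)", "int64")]
schema_b = [("delta.col1", "int64"), ("COUNT(DISTINCT delta.col2)", "int64")]

mismatch_index = first_schema_mismatch(schema_a, [schema_b])

# Collect the field names that differ position by position.
differing_fields = [(a[0], b[0]) for a, b in zip(schema_a, schema_b) if a != b]

print(mismatch_index)
print(differing_fields)
```

The types match everywhere, so the only difference is the table qualifier on the first field name, which is consistent with the problem going away once the naming was made consistent in a later release.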
This is fixed in 34.0.0
Describe the bug
Calling to_* on a dataframe with a struct column fails unless all the struct fields are of type string.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I would expect no failure to occur, as is the case if you first cast all the data to type str.