I'm doing some initial tests on streaming Parquet files from S3, but something goes wrong at the step where Polars tries to cast the Parquet columns' dtypes.
** (MatchError) no match of right hand side value: {:error, "Generic Error: cannot cast to dtype: i32"}
(explorer 0.7.2) lib/explorer/polars_backend/shared.ex:107: Explorer.PolarsBackend.Shared.df_dtypes/1
(explorer 0.7.2) lib/explorer/polars_backend/shared.ex:87: Explorer.PolarsBackend.Shared.create_dataframe/1
(explorer 0.7.2) lib/explorer/polars_backend/lazy_frame.ex:163: Explorer.PolarsBackend.LazyFrame.from_parquet/3
iex:6: (file)
The column in question holds a simple int32 timestamp (I've double-checked, and there are no artifacts, strings, empty values, or other bad data):
I confirmed that after downloading this same Parquet file and loading it directly from the local file system, it worked flawlessly, so it ought to be something specific to lazy frames:
Explorer.DataFrame.from_parquet("./id-part-0.parquet")
{:ok, #Explorer.DataFrame< Polars[37131 x 16] ... >}
1.1 Indeed, if I just add the lazy: true option, it gives me the error again:
Explorer.DataFrame.from_parquet("./id-part-0.parquet", lazy: true)
** (MatchError) no match of right hand side value: {:error, "Generic Error: cannot cast to dtype: i32"}
(explorer 0.7.2) lib/explorer/polars_backend/shared.ex:107: Explorer.PolarsBackend.Shared.df_dtypes/1
(explorer 0.7.2) lib/explorer/polars_backend/shared.ex:87: Explorer.PolarsBackend.Shared.create_dataframe/1
(explorer 0.7.2) lib/explorer/polars_backend/lazy_frame.ex:171: Explorer.PolarsBackend.LazyFrame.from_parquet/3
iex:1: (file)
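For anyone who wants to reproduce both cases side by side, here is a minimal sketch. It assumes Explorer 0.7.2 (the version in the stack trace) and the same local file; the `DF` alias and the script layout are only for illustration, not something taken from the original session.

```elixir
# Assumption: pin the Explorer version shown in the stack trace.
Mix.install([{:explorer, "0.7.2"}])

alias Explorer.DataFrame, as: DF

# File name taken from the report above.
path = "./id-part-0.parquet"

# Eager read: reported to succeed.
{:ok, df} = DF.from_parquet(path)
IO.inspect(DF.dtypes(df), label: "eager dtypes")

# Lazy read: reported to raise a MatchError rather than return {:error, _},
# so the call is wrapped to print the unmatched term.
try do
  path
  |> DF.from_parquet(lazy: true)
  |> IO.inspect(label: "lazy result")
rescue
  e in MatchError -> IO.inspect(e.term, label: "lazy read failed with")
end
```

Printing the eager dtypes first should make it easier to see which column ends up with the dtype the lazy path cannot handle.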
Following the stack trace and calling dbg inside Explorer.PolarsBackend.Shared.df_dtypes/1, we can see that the LazyFrame was already available:
So the following step is the root of all evil... 😢
I've even tried looking at its Rust source but, skill issues aside (🤣), I can't identify anything obvious that could be failing:
Anything we could do about it? Thanks!