Describe the bug
Extracting a struct field inside a udf fails during planning unless all of the struct's fields are of type string.
To Reproduce
import pandas as pd
import pyarrow.compute as pc
import toolz
from datafusion import (
    SessionContext,
    column,
    functions as f,
    udf,
)
def make_df(n=30):
    return pd.DataFrame(
        {
            "a": pd.date_range(start="2020-01-01", freq="M", periods=n),
            "b": range(n),
            "c": pd.Series(range(n)).add(0.1),
            "d": pd.Series(range(n)).astype(str),
        }
    )
    # ).astype(str)
    # if all struct fields are str type, the failure does not occur
field_name = "c0"
col_name = "bcd"
ctx = SessionContext()
t = ctx.from_pandas(make_df(), "t").select(
    column("a"),
    f.functions.struct(*(column(c) for c in col_name)).alias(col_name),
)
my_udf = udf(
    toolz.curry(pc.struct_field, indices=field_name),
    input_types=[t.schema().field(col_name).type],
    return_type=t.schema().field(col_name).type.field(field_name).type,
    volatility="volatile",
    name="extract_field",
)
ctx.register_udf(my_udf)
t.select(my_udf(column(col_name)))
"""
Exception: type_coercion
caused by
Error during planning: Coercion from [Struct([Field { name: "c0", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "c1", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "c2", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }])] to the signature Exact([Struct([Field { name: "c0", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "c1", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "c2", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }])]) failed.
"""
Expected behavior
I would expect no failure to occur, as is the case if you first cast all the data to type str.
Additional context
Maybe related to #541
The reason I'm trying to pack multiple columns into a single struct column is so that I can simulate running a udaf that accepts multiple columns, which does not currently seem possible.