Open pietrodantuono opened 11 months ago
I have the same issue.
Maybe to add to this: when using the @udf decorator, or wrapping the str_hex_to_numeric function, it works for me!
@udf
def str_hex_to_numeric(hex_value: str, data_type_name: str) -> float:
    ...
or
def udf_wrapper():
    def str_hex_to_numeric(hex_value: str, data_type_name: str) -> float:
        ...
    return udf(str_hex_to_numeric, FloatType())
What also doesn't work is referencing anything from outside the function's scope, such as constants.
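For illustration, here is a minimal sketch of the wrapper pattern with the constant moved inside the wrapper, so that nothing is looked up outside the function's scope. The names make_str_hex_to_numeric and struct_formats are hypothetical, and the sketch assumes the failure comes from the worker being unable to resolve names defined in the helper module. It has not been verified against databricks-connect.

```python
def make_str_hex_to_numeric():
    """Build the conversion function with its constants defined locally."""
    # Hypothetical lookup table. It lives inside the factory, so the inner
    # function captures it in its closure instead of reading a module global.
    struct_formats = {"Float": "!f"}

    def str_hex_to_numeric(hex_value: str, data_type_name: str) -> float:
        # Imported here so the function body does not rely on module-level names.
        import struct

        fmt = struct_formats.get(data_type_name)
        if fmt is None:
            raise ValueError(f"Unknown data type: {data_type_name}")
        return struct.unpack(fmt, bytes.fromhex(hex_value))[0]

    return str_hex_to_numeric


def udf_wrapper():
    """Wrap the locally defined function as a Spark UDF."""
    # PySpark is imported lazily so the plain function can be tested without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import FloatType

    return udf(make_str_hex_to_numeric(), FloatType())
```

Calling make_str_hex_to_numeric() on its own returns the plain Python function, which makes it possible to check the conversion locally before involving the cluster.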
I have the same issue. I found it with this use case:
df = df.withColumn('result', my_udf(col('some_data')))
where my_udf is in a helper module.
The only solution I've found so far is to package the helper as a wheel, install the wheel on the cluster, and then run my notebook from the Databricks workspace rather than from VS Code.
Had the same issue with "databricks-connect==15.3.0". With the UDF decorator it indeed works! Thanks!
Full resolution code sample with the decorator:
# helper_module.py

# From the Python Standard Library
import struct

# From PySpark
import pyspark.sql.functions as F
import pyspark.sql.types as T
from pyspark.sql import DataFrame
from pyspark.sql.functions import udf


@udf(T.FloatType())
def str_hex_to_numeric(
    hex_value: str,
    data_type_name: str
) -> float:
    """Convert a hex string to a numeric value."""
    if data_type_name == "Float":
        # '!f' reads the 4 bytes as a big-endian IEEE 754 float.
        return struct.unpack('!f', bytes.fromhex(hex_value))[0]
    raise ValueError(f"Unknown data type: {data_type_name}")


def value_col_hex_to_numeric(
    df: DataFrame,
    value_col: str = "VALUE",
    data_type_name_col: str = "DATA_TYPE_NAME"
) -> DataFrame:
    """Replace the hex strings in value_col with their numeric values."""
    return df.withColumn(
        value_col,
        str_hex_to_numeric(F.col(value_col), F.col(data_type_name_col))
    )
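To sanity-check the conversion without a cluster, the "Float" branch can be copied into a plain function and run locally. This is only a sketch: hex_to_float and demo are hypothetical names, and demo assumes that helper_module.py from the sample above is importable and that a working Spark session is passed in.

```python
import struct


def hex_to_float(hex_value: str) -> float:
    """Plain-Python copy of the "Float" branch of the UDF, for local checks."""
    # '!f' = network (big-endian) byte order, 4-byte IEEE 754 float.
    return struct.unpack("!f", bytes.fromhex(hex_value))[0]


def demo(spark):
    """Apply the helper to a tiny DataFrame; needs a live Spark session."""
    # Assumption: helper_module.py from the sample above is on the import path.
    from helper_module import value_col_hex_to_numeric

    df = spark.createDataFrame(
        [("3f800000", "Float"), ("c0000000", "Float")],
        ["VALUE", "DATA_TYPE_NAME"],
    )
    return value_col_hex_to_numeric(df)


print(hex_to_float("3f800000"))  # 1.0
print(hex_to_float("c0000000"))  # -2.0
```

demo is defined but not called here, so the local check runs without PySpark installed.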