feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0

OnDemandFeatureView.feature_transformation.infer_features does not pass UDF outputs to python_type_to_feast_value_type #4308

Closed alexmirrington closed 2 weeks ago

alexmirrington commented 1 month ago

Expected Behavior

OnDemandFeatureView.feature_transformation.infer_features should be able to infer features from primitive Python types for all supported Feast data types, for all transformation backends.

Current Behavior

All on demand feature views are currently broken for list types, as there is no way to bypass schema inference.

Details

OnDemandFeatureView.feature_transformation.infer_features can only infer features whose dtype name appears in the type map inside python_type_to_feast_value_type, i.e.

type_map = {
    "int": ValueType.INT64,
    "str": ValueType.STRING,
    "string": ValueType.STRING,  # pandas.StringDtype
    "float": ValueType.DOUBLE,
    "bytes": ValueType.BYTES,
    "float64": ValueType.DOUBLE,
    "float32": ValueType.FLOAT,
    "int64": ValueType.INT64,
    "uint64": ValueType.INT64,
    "int32": ValueType.INT32,
    "uint32": ValueType.INT32,
    "int16": ValueType.INT32,
    "uint16": ValueType.INT32,
    "uint8": ValueType.INT32,
    "int8": ValueType.INT32,
    "bool": ValueType.BOOL,
    "boolean": ValueType.BOOL,
    "timedelta": ValueType.UNIX_TIMESTAMP,
    "timestamp": ValueType.UNIX_TIMESTAMP,
    "datetime": ValueType.UNIX_TIMESTAMP,
    "datetime64[ns]": ValueType.UNIX_TIMESTAMP,
    "datetime64[ns, tz]": ValueType.UNIX_TIMESTAMP,
    "category": ValueType.STRING,
}

This is because list types such as ValueType.FLOAT_LIST have no entry in the dictionary above. When the type name is not in the map and value is None, every isinstance(value, dtype) check fails, and python_type_to_feast_value_type falls through to its ValueError.
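
To make the fall-through concrete, here is a minimal sketch of the lookup-then-inspect flow described above. It is a simplified model, not Feast's actual code: the ValueType enum, TYPE_MAP, and infer_value_type are hypothetical stand-ins that cover only a small subset of types.

```python
from enum import Enum
from typing import Any, Optional


class ValueType(Enum):
    """Hypothetical stand-in for Feast's ValueType (small subset only)."""

    INT64 = 1
    DOUBLE = 2
    STRING = 3
    DOUBLE_LIST = 4


# Subset of the dtype-name map quoted above; list types have no entry here.
TYPE_MAP = {
    "int64": ValueType.INT64,
    "float64": ValueType.DOUBLE,
    "str": ValueType.STRING,
}


def infer_value_type(value: Optional[Any], type_name: str) -> ValueType:
    """Simplified model of the inference flow, not the real Feast function."""
    # Step 1: look up the dtype name. This works for scalar columns.
    if type_name in TYPE_MAP:
        return TYPE_MAP[type_name]
    # Step 2: list columns report dtype "object", so only the value itself
    # can reveal the element type. With value=None this check always fails.
    if isinstance(value, list) and value and all(isinstance(v, float) for v in value):
        return ValueType.DOUBLE_LIST
    # Step 3: nothing matched, so inference gives up.
    raise ValueError(
        f"Value with native type {type_name} cannot be converted into Feast value type"
    )
```

In this model, infer_value_type(None, "float64") succeeds through the map, infer_value_type([0.1, 0.2], "object") succeeds through the value check, and infer_value_type(None, "object") raises the same kind of ValueError as in the traceback below.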

Steps to reproduce

Initialize a new repository:

feast init

Modify the sample on_demand_feature_view to return an array of floats instead of just floats, e.g.

diff --git a/true_garfish/feature_repo/example_repo.py b/true_garfish/feature_repo/example_repo.py
index 1f5b946..59d4501 100644
--- a/true_garfish/feature_repo/example_repo.py
+++ b/true_garfish/feature_repo/example_repo.py
@@ -16,7 +16,7 @@ from feast import (
 from feast.feature_logging import LoggingConfig
 from feast.infra.offline_stores.file_source import FileLoggingDestination
 from feast.on_demand_feature_view import on_demand_feature_view
-from feast.types import Float32, Float64, Int64
+from feast.types import Float32, Float64, Int64, Array

 # Define an entity for the driver. You can think of an entity as a primary key used to
 # fetch features.
@@ -72,15 +72,16 @@ input_request = RequestSource(
 @on_demand_feature_view(
     sources=[driver_stats_fv, input_request],
     schema=[
-        Field(name="conv_rate_plus_val1", dtype=Float64),
-        Field(name="conv_rate_plus_val2", dtype=Float64),
+        Field(name="conv_rate_plus_vals", dtype=Array(Float64)),
     ],
 )
 def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
-    df = pd.DataFrame()
-    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
-    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
-    return df
+    result = {"conv_rate_plus_vals": []}
+    for _, row in inputs.iterrows():
+        result["conv_rate_plus_vals"].append(
+            [row["conv_rate"] + row["val_to_add"], row["conv_rate"] + row["val_to_add_2"]]
+        )
+    return pd.DataFrame(data=result)

Run feast apply, and you should get the following error:

Traceback (most recent call last):
  File "~/.../.venv/bin/feast", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/feast/cli.py", line 506, in apply_total_command
    apply_total(repo_config, repo, skip_source_validation)
  File "~/.../.venv/lib/python3.12/site-packages/feast/repo_operations.py", line 347, in apply_total
    apply_total_with_repo_instance(
  File "~/.../.venv/lib/python3.12/site-packages/feast/repo_operations.py", line 299, in apply_total_with_repo_instance
    registry_diff, infra_diff, new_infra = store.plan(repo)
                                           ^^^^^^^^^^^^^^^^
  File "~/.../.venv/lib/python3.12/site-packages/feast/feature_store.py", line 745, in plan
    self._make_inferences(
  File "~/.../.venv/lib/python3.12/site-packages/feast/feature_store.py", line 640, in _make_inferences
    odfv.infer_features()
  File "~/.../.venv/lib/python3.12/site-packages/feast/on_demand_feature_view.py", line 521, in infer_features
    inferred_features = self.feature_transformation.infer_features(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/....venv/lib/python3.12/site-packages/feast/transformation/pandas_transformation.py", line 47, in infer_features
    python_type_to_feast_value_type(f, type_name=str(dt))
  File "~/.../.venv/lib/python3.12/site-packages/feast/type_map.py", line 215, in python_type_to_feast_value_type
    raise ValueError(
ValueError: Value with native type object cannot be converted into Feast value type

Adding some debug statements inside python_type_to_feast_value_type shows the following locals just before the error is raised:

name='conv_rate_plus_vals'
value=None
recurse=True
type_name='object'
type(value)=<class 'NoneType'>

As mentioned above, this happens because none of the transformation backends pass values to the type mapper. The pandas backend in this case passes only the column name and the dtype string, so value is always None.
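
One possible direction, sketched here under assumptions and not necessarily what the linked PR does, is to run the UDF on a small sample input and hand a representative output value to the type mapper alongside the dtype name. The sketch below uses plain dicts of lists instead of a DataFrame so it stays self-contained; first_non_null and describe_output are hypothetical helpers, and the returned strings are illustrative labels, not Feast types.

```python
from typing import Any, Callable, Dict, List, Optional

Columns = Dict[str, List[Any]]


def first_non_null(column: List[Any]) -> Optional[Any]:
    """Return the first entry that is not None, or None if there is none."""
    return next((v for v in column if v is not None), None)


def describe_output(udf: Callable[[Columns], Columns], sample_input: Columns) -> Dict[str, str]:
    """Run the UDF on sample data and describe each output column by value."""
    output = udf(sample_input)
    described = {}
    for name, column in output.items():
        sample = first_non_null(column)
        if isinstance(sample, list):
            # A real value exposes the element type that the dtype name hides.
            inner = type(sample[0]).__name__ if sample else "unknown"
            described[name] = f"list[{inner}]"
        else:
            described[name] = type(sample).__name__
    return described


def transformed_conv_rate(inputs: Columns) -> Columns:
    """Dict-based analogue of the modified UDF from the reproduction steps."""
    rows = zip(inputs["conv_rate"], inputs["val_to_add"], inputs["val_to_add_2"])
    return {"conv_rate_plus_vals": [[c + a, c + b] for c, a, b in rows]}


sample = {"conv_rate": [0.5], "val_to_add": [1.0], "val_to_add_2": [2.0]}
print(describe_output(transformed_conv_rate, sample))
# prints {'conv_rate_plus_vals': 'list[float]'}
```

The point of the sketch is only that a sample value carries enough information to distinguish a list of floats from an opaque object column, which the dtype string alone cannot do.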

Specifications

Possible Solution

alexmirrington commented 1 month ago

PR here for those following along: https://github.com/feast-dev/feast/pull/4310