feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0
5.62k stars 1k forks source link

feat: Adding support for native Python transformations on a single dictionary #4724

Closed franciscojavierarceo closed 2 weeks ago

franciscojavierarceo commented 3 weeks ago

What this PR does / why we need it:

This PR adds support for native Python transformations on a dictionary of a singleton (i.e., what most Data Scientists would call a single row).

As an example, Data Scientists can now transform features using an On Demand Transform like so:

@on_demand_feature_view(
    sources=[driver_stats_fv[["conv_rate", "acc_rate"]]],
    schema=[
        Field(name="conv_rate_plus_acc_python_singleton", dtype=Float64)
    ],
    mode="python",
    singleton=True,
)
def python_singleton_view(inputs: dict[str, Any]) -> dict[str, Any]:
    output: dict[str, Any] = dict(conv_rate_plus_acc_python=float("-inf"))
    output["conv_rate_plus_acc_python_singleton"] = (
        inputs["conv_rate"] + inputs["acc_rate"]
    )
    return output

Compare this to the previous logic to do the transformation which looks like:

@on_demand_feature_view(
    sources=[driver_stats_fv[["conv_rate", "acc_rate"]]],
    schema=[
        Field(name="conv_rate_plus_val", dtype=Float64),
    ],
    mode="python",
)
def python_demo_view(inputs: dict[str, Any]) -> dict[str, Any]:
    output: dict[str, Any] = {
        "conv_rate_plus_val": [
            conv_rate + acc_rate
            for conv_rate, acc_rate in zip(
                inputs["conv_rate"], inputs["acc_rate"]
            )
        ],
    }
    return output

Which issue(s) this PR fixes:

https://github.com/feast-dev/feast/issues/4075

Misc

Added documentation in this PR: https://github.com/feast-dev/feast/pull/4741

franciscojavierarceo commented 2 weeks ago

FYI @robhowley

robhowley commented 2 weeks ago

is there no way to do any kind of type inference in the codepath to determine it's a singleton or leave it up to the person implementing the transformation to know what data they're expecting and raise errors accordingly? just kinda lean more into duck typing. strikes me as a little unusual to have to pass a bool in to specify what the type of my data is. im assuming there's context on this im missing about how the subsequent transformations will interact w proto stuff

franciscojavierarceo commented 2 weeks ago

That's the way I originally started do it, the problem is that requires explicit type annotations in the ODFV definition and I think that's too strict of a requirement.

franciscojavierarceo commented 2 weeks ago

I definitely agree this Boolean is not the preferred option but it was the what made the most sense to me given that other sorts of type inference start to become brittle once you allow for different input types.

robhowley commented 2 weeks ago

i was thinking more like relax the type annotation be dict[str, Any] instead of dict[str, List[Any]]. it's the transformation functions problem to handle it.

franciscojavierarceo commented 2 weeks ago

Yeah that could be done today because type annotations can still be ignored by users but I don't think that's a good enough experience for Data Scientists. I think this makes the transformation really easy and it has rational defaults.

robhowley commented 2 weeks ago

and defining it as a Union of the two types wasn't sufficient?

franciscojavierarceo commented 2 weeks ago

@robhowley I tagged you in some places that outline some of why I ended up doing it this way.

HaoXuAI commented 2 weeks ago

I'm also not following why it can't be just a type check. Adding this singleton feels complex the logic. also since it's dict[str, Any], if you enforce the value to be List, it could be list of list of list..., so you can't get the logic abstracted.

franciscojavierarceo commented 2 weeks ago

Depends on the type check. You may have input features that are mixed that you want to create a feature out of.

For example, if I did a type check to ensure all inputs were not lists, they will not accommodate the case I just mentioned.

franciscojavierarceo commented 2 weeks ago

Do you have suggestion? I had dawned discussion with @robhowley indicating that this was not my preferred solution but simply the one I ended up having to settle on due to the tradeoffs.

HaoXuAI commented 2 weeks ago

Do you have suggestion? I had dawned discussion with @robhowley indicating that this was not my preferred solution but simply the one I ended up having to settle on due to the tradeoffs.

I don't have any direct suggestion on this yet. But I believe after refactoring rebuild the Transformation entirely it should work in some structured way :)