feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0
5.59k stars 998 forks

Return different matrix types for online serving #4714

Open franciscojavierarceo opened 1 week ago

franciscojavierarceo commented 1 week ago

Is your feature request related to a problem? Please describe.

We should allow Feature Views to return matrices/tensors natively, for example torch.Tensor.

At the moment, for some features we require the client to serialize the output into a matrix before running inference. Feast should support executing these transformations and serializing the data into matrices for both online and offline retrieval.

Describe the solution you'd like

features: torch.Tensor = store.get_online_features()

Describe alternatives you've considered

The alternative is not supporting this, which is the current state and leaves users to write their own brittle logic to handle various complexities.

Additional context

@HaoXuAI @tokoko I know we discussed sklearn pipelines in the past and I thought I'd share my thoughts.

HaoXuAI commented 1 week ago

The torch feature is nice. I guess we need to relax the "timestamp" constraints in our APIs, since it probably doesn't make much sense to attach a timestamp to an embedding feature?

breno-costa commented 1 week ago

The method store.get_online_features(...) returns an OnlineResponse object that has some conversion methods like to_dict() and to_df(). Should this suggestion be implemented as another conversion method, like to_torch() or something similar?
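
To illustrate what such a conversion would have to do, here is a minimal sketch. It assumes only that a to_dict()-style response maps each feature name to a list of values, one per entity; the helper name `to_matrix` and the sample feature names are hypothetical and not part of Feast. It produces a plain row-major list of lists, which `torch.tensor(...)` or `numpy.asarray(...)` would accept as input.

```python
from typing import Dict, List, Sequence


def to_matrix(feature_dict: Dict[str, List[float]],
              feature_order: Sequence[str]) -> List[List[float]]:
    """Hypothetical helper: pivot a column-oriented feature dict into rows.

    Assumes `feature_dict` maps feature name -> list of values, one per
    entity, which is the shape a to_dict()-style response is assumed to have.
    """
    # Select only the requested features, in a fixed column order.
    columns = [feature_dict[name] for name in feature_order]
    n_rows = len(columns[0]) if columns else 0
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all feature columns must have the same length")
    # zip(*columns) transposes the columns into one row per entity.
    return [[float(v) for v in row] for row in zip(*columns)]


# Made-up sample data standing in for an online response.
sample = {
    "driver_id": [1001, 1002],
    "conv_rate": [0.25, 0.5],
    "acc_rate": [0.75, 1.0],
}
matrix = to_matrix(sample, feature_order=["conv_rate", "acc_rate"])
```

The explicit `feature_order` argument is the part clients tend to get wrong today: the column order has to match what the model was trained on, and a dict alone does not guarantee that.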

franciscojavierarceo commented 1 week ago

The torch feature is nice. I guess we need to relax the "timestamp" constraints in our APIs, since it probably doesn't make much sense to attach a timestamp to an embedding feature?

Agreed.

franciscojavierarceo commented 1 week ago

@breno-costa that code is a serialization step, though. We would want to treat Torch tensors (or xgb.DMatrix) as a first-class data type.

The concrete examples I'm thinking of are one hot encoding or impact encoding. It'd be useful for us to handle this for MLEs natively, especially when handling unseen categories.
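
As a rough sketch of the one hot encoding case, the snippet below reserves a dedicated slot for categories that were not seen when the encoder was built. The class name `UnseenAwareOneHot` and the sample categories are hypothetical; this is one common way to handle unseen values, not something Feast provides today.

```python
from typing import List, Sequence


class UnseenAwareOneHot:
    """Minimal one-hot encoder with a reserved slot for unseen categories."""

    def __init__(self, categories: Sequence[str]) -> None:
        # Drop duplicates while keeping first-seen order.
        unique = list(dict.fromkeys(categories))
        # Index 0 is reserved for any category not known at build time.
        self._index = {cat: i + 1 for i, cat in enumerate(unique)}
        self.width = len(unique) + 1

    def encode(self, value: str) -> List[int]:
        row = [0] * self.width
        # Unknown values fall back to the reserved index 0.
        row[self._index.get(value, 0)] = 1
        return row


encoder = UnseenAwareOneHot(["sedan", "suv", "truck"])
known = encoder.encode("suv")          # category known to the encoder
unseen = encoder.encode("hovercraft")  # category never seen before
```

Every encoded row has the same width regardless of input, so the model always receives a fixed-shape vector even when a new category shows up at serving time.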

dandawg commented 1 week ago

This plus sparse tensors/sparse matrices could be a really cool optimization -- less data, faster io, more powerful API.

franciscojavierarceo commented 1 week ago

This plus sparse tensors/sparse matrices could be a really cool optimization -- less data, faster io, more powerful API.

Exactly.
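
To make the size argument concrete, here is a small sketch that converts a dense one-hot matrix into coordinate (COO) form, storing only the nonzero entries. The function name `dense_to_coo` and the sample data are hypothetical; the triplet layout is the same general idea used by `scipy.sparse.coo_matrix` and `torch.sparse_coo_tensor`.

```python
from typing import List, Tuple

Coo = Tuple[List[int], List[int], List[float], Tuple[int, int]]


def dense_to_coo(rows: List[List[float]]) -> Coo:
    """Convert a dense row-major matrix to COO form, keeping only nonzeros."""
    row_idx: List[int] = []
    col_idx: List[int] = []
    values: List[float] = []
    n_cols = len(rows[0]) if rows else 0
    for r, row in enumerate(rows):
        for c, v in enumerate(row):
            if v != 0:
                row_idx.append(r)
                col_idx.append(c)
                values.append(v)
    return row_idx, col_idx, values, (len(rows), n_cols)


# Three one-hot rows over four categories (made-up data).
dense = [
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
row_idx, col_idx, values, shape = dense_to_coo(dense)
```

For one-hot rows, the sparse form stores one entry per row no matter how many categories exist, so the savings grow with the width of the encoding.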

HaoXuAI commented 1 week ago

If we can leverage Arrow as our primary format, then it can be converted directly to pandas/torch with the Arrow APIs, I believe.

franciscojavierarceo commented 1 week ago

Cool, I'll check that out. This is basically the next step after vector support toward making NLP a first-class citizen.