feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0
5.51k stars 982 forks

Support retrieve_online_documents with embedding function #4454

Open HaoXuAI opened 1 month ago

HaoXuAI commented 1 month ago

We can pass an optional embedding function to retrieve_online_documents so that a text query is embedded first and the top-k documents are then retrieved.

E.g.,

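    from typing import Callable, List, Optional, Union

    def retrieve_online_documents(
        self,
        feature: str,
        embedding: Optional[Callable[[str], List[float]]],
        query: Union[str, List[float]],
        top_k: int,
        distance_metric: Optional[str] = None,
    ) -> OnlineResponse:
        # Embed raw text first; a query that is already a vector passes through.
        embedded_query = embedding(query) if isinstance(query, str) and embedding else query
        # "..." stands for the remaining arguments, left elided as in the proposal.
        return retrieve_online_documents(..., embedded_query)
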
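
A minimal, runnable sketch of the embed-then-retrieve flow proposed above. `ToyDocumentStore`, `toy_embedding`, and the cosine ranking are all hypothetical stand-ins for illustration, not Feast's API, and the signature is simplified (no `feature` or `distance_metric`).

```python
import math
from typing import Callable, Dict, List, Optional, Tuple, Union

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyDocumentStore:
    """Hypothetical in-memory stand-in for an online store; not Feast's API."""

    def __init__(self, documents: Dict[str, Vector]) -> None:
        self.documents = documents

    def retrieve_online_documents(
        self,
        query: Union[str, Vector],
        top_k: int,
        embedding: Optional[Callable[[str], Vector]] = None,
    ) -> List[Tuple[str, float]]:
        # Embed only when the caller passed raw text.
        if isinstance(query, str):
            if embedding is None:
                raise ValueError("a string query needs an embedding function")
            query = embedding(query)
        # Score every stored vector against the query and keep the best top_k.
        scored = [
            (doc_id, cosine_similarity(query, vec))
            for doc_id, vec in self.documents.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def toy_embedding(text: str) -> Vector:
    """Toy embedding: counts of the letters a, b, c. A real model would go here."""
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "abc"]


store = ToyDocumentStore({
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
})
# A text query is embedded first; a vector query is used as-is.
text_hits = store.retrieve_online_documents("aaa", top_k=1, embedding=toy_embedding)
vector_hits = store.retrieve_online_documents([0.0, 1.0, 0.0], top_k=1)
```

Here the text query "aaa" ranks `doc_a` first and the raw vector ranks `doc_b` first, so both input types go through the same retrieval path.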
HaoXuAI commented 1 month ago

@franciscojavierarceo @tokoko let me know what you think

franciscojavierarceo commented 1 month ago

Yeah, I like this idea! We could just come up with an opinionated way to do this with HuggingFace as an extra? We could call it feast[genai] or something that reduces friction for getting set up even further?

tokoko commented 1 month ago

One concern from me is... in a client-server execution (using feature server), would this function be applied on the client-side or the server-side? If it's server-side, passing an arbitrary function to this method is a problem. We usually store these kinds of user-provided functions in the registry, don't we? This seems like a bit of a deviation in that sense, or maybe I don't understand what the function does exactly.

franciscojavierarceo commented 1 month ago

I imagine it would look like the demo and it would happen server-side. Maybe instead of an arbitrary function we could just make this configurable and have it correspond to a transformer or an OpenAI call.
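
One way the "configurable instead of arbitrary function" idea could look, as a rough sketch only: the server resolves a small config against a fixed allow-list, so no user-supplied callable ever crosses the wire. `EmbeddingConfig`, its field names, and the two stub factories are hypothetical and not part of Feast; the stubs make no real model or API calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


@dataclass(frozen=True)
class EmbeddingConfig:
    """Hypothetical config; these field names are not part of Feast."""
    provider: str
    model: str


def _stub_transformer(model: str) -> Callable[[str], Vector]:
    # Placeholder: a real version would load a local transformer model here.
    return lambda text: [float(len(text)), 0.0]


def _stub_openai(model: str) -> Callable[[str], Vector]:
    # Placeholder: a real version would call a hosted embeddings API here.
    return lambda text: [0.0, float(len(text))]


# Server-side allow-list: only these providers can ever run.
EMBEDDER_FACTORIES: Dict[str, Callable[[str], Callable[[str], Vector]]] = {
    "transformer": _stub_transformer,
    "openai": _stub_openai,
}


def build_embedder(config: EmbeddingConfig) -> Callable[[str], Vector]:
    """Resolve a config to an embedding function, rejecting unknown providers."""
    try:
        factory = EMBEDDER_FACTORIES[config.provider]
    except KeyError:
        raise ValueError(
            f"unsupported embedding provider: {config.provider!r}"
        ) from None
    return factory(config.model)


embed = build_embedder(EmbeddingConfig(provider="transformer", model="example-model"))
query_vector = embed("hello")
```

Because the config is plain data, it could be stored in the registry like other definitions, which speaks to the concern about passing arbitrary functions to the server.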

We can discuss today at the community call!

HaoXuAI commented 1 month ago

> One concern from me is... in a client-server execution (using feature server), would this function be applied on the client-side or the server-side? If it's server-side, passing an arbitrary function to this method is a problem. We usually store these kinds of user-provided functions in the registry, don't we? This seems like a bit of a deviation in that sense, or maybe I don't understand what the function does exactly.

Maybe we can do this on the client side only. I'm not sure how the server works in Feast, but my use case is just to apply the embedding before the vector is used.
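
A small sketch of the client-side-only variant, assuming the client embeds the text and sends only a plain vector. The payload shape, `build_retrieval_request`, `char_code_embedding`, and the feature name are hypothetical illustrations, not the feature server's actual request format.

```python
import json
from typing import Callable, Dict, List, Union

Vector = List[float]


def build_retrieval_request(
    feature: str,
    query: Union[str, Vector],
    top_k: int,
    embedding: Callable[[str], Vector],
) -> str:
    """Embed on the client, then serialize a request that carries only plain data."""
    vector = embedding(query) if isinstance(query, str) else list(query)
    payload: Dict[str, object] = {"feature": feature, "query": vector, "top_k": top_k}
    return json.dumps(payload)


def char_code_embedding(text: str) -> Vector:
    """Toy embedding: the code points of the first three characters."""
    return [float(ord(ch)) for ch in text[:3]]


# The feature name below is made up for the example.
request_body = build_retrieval_request(
    feature="example_view:embedding",
    query="abc",
    top_k=3,
    embedding=char_code_embedding,
)
decoded = json.loads(request_body)
```

Since the function runs before serialization, the server only ever sees a list of floats, so nothing about its retrieval path has to change.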