feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0
5.59k stars 998 forks

Remote materialization #4526

Open tokoko opened 1 month ago

tokoko commented 1 month ago

Is your feature request related to a problem? Please describe.

We already have an option to run online and offline store queries remotely through the feature server and the offline server, respectively, so that RBAC rules are applied to those operations. The one piece that's missing is materialization. There are several ways to do this:

I'd probably go with option 3 as a starting point.

dmartinol commented 1 month ago

The same as above but instead of creating a new component, we can reuse OfflineServer to do the request handling. This is slightly awkward from a naming perspective, but probably makes the most sense in terms of usage/maintenance.

At the moment, the OfflineServer is designed to implement only the OfflineStore interface. Adding a method that writes to the online stores would introduce an unplanned dependency and raise some concerns.

Why aren't we using the /materialize-incremental endpoint on the FeatureServer instead (and adding a new endpoint for non-incremental jobs)? This would avoid having to "transport batches and batches of potentially huge datasets", since the job would run on the server itself and would use the remote offline_store to call pull_latest_from_table_or_query over the Flight protocol.
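
To make the suggestion concrete, here is a rough sketch of what a client-side call to that endpoint could look like. It only builds the HTTP request and does not send it. The base URL, the `end_ts` and `feature_views` payload fields, and the `driver_hourly_stats` view name are assumptions for illustration and should be checked against the feature server version in use.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional
from urllib import request


def build_materialize_incremental_request(
    base_url: str,
    end_ts: datetime,
    feature_views: Optional[List[str]] = None,
) -> request.Request:
    """Build (but do not send) a POST to /materialize-incremental.

    The payload field names are assumed, not taken from this thread.
    """
    payload = {"end_ts": end_ts.isoformat()}
    if feature_views is not None:
        payload["feature_views"] = feature_views
    return request.Request(
        url=f"{base_url.rstrip('/')}/materialize-incremental",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address and feature view name.
req = build_materialize_incremental_request(
    "http://localhost:6566",
    datetime(2024, 9, 1, tzinfo=timezone.utc),
    ["driver_hourly_stats"],
)
# request.urlopen(req) would trigger the job on the server; only the
# small JSON payload crosses the wire, not the feature data itself.
```

The point of this design is that the client sends a short control message while the data movement from offline to online store stays on the server side.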

Otherwise, I'd be in favour of a dedicated MaterializationServer (with a remote offline_store and a provided online_store), which can still be designed as a "lightweight fastapi server", if I understood the materialization flow correctly.
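
As a rough illustration of the flow such a server would wrap, the sketch below pulls the latest row per entity from an offline source and writes it to an online sink. Everything here is hypothetical: `OfflineSource`, `OnlineSink`, the in-memory stand-ins, and `materialize` are made-up names, not Feast's API, and the HTTP layer is omitted entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Protocol, Tuple

# (entity_key, event_timestamp, feature values); a simplified row shape.
Row = Tuple[str, datetime, Dict[str, float]]


class OfflineSource(Protocol):  # hypothetical stand-in for a remote offline store
    def pull_latest(self, feature_view: str, start: datetime, end: datetime) -> List[Row]: ...


class OnlineSink(Protocol):  # hypothetical stand-in for the provided online store
    def write(self, feature_view: str, rows: List[Row]) -> None: ...


@dataclass
class InMemoryOffline:
    rows: Dict[str, List[Row]]

    def pull_latest(self, feature_view: str, start: datetime, end: datetime) -> List[Row]:
        # Keep only the newest row per entity key inside [start, end].
        latest: Dict[str, Row] = {}
        for row in self.rows.get(feature_view, []):
            key, ts, _ = row
            if start <= ts <= end and (key not in latest or ts > latest[key][1]):
                latest[key] = row
        return list(latest.values())


@dataclass
class InMemoryOnline:
    table: Dict[Tuple[str, str], Dict[str, float]] = field(default_factory=dict)

    def write(self, feature_view: str, rows: List[Row]) -> None:
        for key, _, values in rows:
            self.table[(feature_view, key)] = values


def materialize(offline: OfflineSource, online: OnlineSink,
                feature_view: str, start: datetime, end: datetime) -> int:
    """Copy the latest offline rows into the online store; return the row count."""
    rows = offline.pull_latest(feature_view, start, end)
    online.write(feature_view, rows)
    return len(rows)


t1 = datetime(2024, 9, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 9, 2, tzinfo=timezone.utc)
offline = InMemoryOffline({"driver_stats": [
    ("driver_1", t1, {"trips": 3.0}),
    ("driver_1", t2, {"trips": 5.0}),
    ("driver_2", t1, {"trips": 1.0}),
]})
online = InMemoryOnline()
written = materialize(offline, online, "driver_stats", t1, t2)
```

A server built this way would hold no data of its own: it would only need read access to the offline store and write access to the online store, which is why it could stay lightweight.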

dmartinol commented 1 month ago

@tokoko do we want to evaluate this one? Any further comments on what solution to apply?