Closed franciscojavierarceo closed 1 month ago
There's an argument to be made that we could skip the write_to_online_store in the FeatureView declaration but this metadata would be useful to have in the registry for users.
I was actually thinking the other way around. If we manage to figure out how to use precomputed values only for identical inputs (plus I'm assuming that transformation function itself is deterministic), do we need a force_recompute field? If a user is completely sure that recompute would yield the same result, what's the point of not using the precomputed value? force_recompute in this example is also a bit awkward because it's configured for the whole request which might contain more than one odfvs, not per an individual odfv.
A couple of other points:
We probably need some sort of a ttl for storage. The hypothetical feast cleanup command could be the one that will clean up stale values afterwards.
Another thing that came to my mind is how we would like to handle scenarios when not all features from a particular odfv are requested. I guess this is less of a problem for pandas and python transforms as they have to recompute the whole thing even if not all features are requested, but is a harder problem for substrait.
I was actually thinking the other way around. If we manage to figure out how to use precomputed values only for identical inputs (plus I'm assuming that transformation function itself is deterministic), do we need a force_recompute field? If a user is completely sure that recompute would yield the same result, what's the point of not using the precomputed value? force_recompute in this example is also a bit awkward because it's configured for the whole request which might contain more than one odfvs, not per an individual odfv.
We could use an idempotency key for this; it was something we had planned to add at my last job for this exact use case.
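To make the idempotency-key idea concrete, here is a minimal sketch using only the Python standard library. It assumes the key is a hash of the ODFV name plus its input values and that the transformation is deterministic; `idempotency_key`, `PrecomputedCache`, and `total_with_tax` are hypothetical names for illustration and are not part of Feast.

```python
import hashlib
import json


def idempotency_key(odfv_name: str, inputs: dict) -> str:
    """Derive a stable key from the ODFV name and its input values."""
    # sort_keys makes the key independent of dict insertion order.
    payload = json.dumps(
        {"odfv": odfv_name, "inputs": inputs}, sort_keys=True, default=str
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class PrecomputedCache:
    """Toy stand-in for an online store keyed by idempotency key."""

    def __init__(self):
        self._values = {}
        self.computations = 0  # how many times a transform actually ran

    def get_or_compute(self, odfv_name, inputs, transform):
        key = idempotency_key(odfv_name, inputs)
        if key not in self._values:
            # Only compute when these exact inputs have not been seen before.
            self._values[key] = transform(inputs)
            self.computations += 1
        return self._values[key]


def total_with_tax(inputs):
    # Hypothetical deterministic transformation.
    return {"total_with_tax": round(inputs["amount"] * 1.2, 2)}


cache = PrecomputedCache()
first = cache.get_or_compute("total_with_tax", {"amount": 10.0, "driver_id": 1}, total_with_tax)
# Same inputs in a different order reuse the precomputed value.
second = cache.get_or_compute("total_with_tax", {"driver_id": 1, "amount": 10.0}, total_with_tax)
# Different inputs trigger a new computation.
third = cache.get_or_compute("total_with_tax", {"amount": 11.0, "driver_id": 1}, total_with_tax)
```

With a key like this, identical inputs never trigger a second computation, which is the case where a separate force_recompute field would add nothing.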
We probably need some sort of a ttl for storage. The hypothetical feast cleanup command could be the one that will clean up stale values afterwards.
Yeah, implementing feast cleanup handles it in general.
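As a rough sketch of what TTL-based cleanup could look like, assuming each stored value carries its write timestamp: `cleanup_stale` and the dict-based store below are hypothetical and only illustrate the idea, since the feast cleanup command discussed here does not exist yet.

```python
from datetime import datetime, timedelta, timezone


def cleanup_stale(store: dict, ttl: timedelta, now: datetime) -> int:
    """Delete entries written more than `ttl` ago; return how many were removed."""
    stale_keys = [
        key for key, (_, written_at) in store.items() if now - written_at > ttl
    ]
    for key in stale_keys:
        del store[key]
    return len(stale_keys)


# A fixed "now" keeps the example deterministic.
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
store = {
    "fresh": ({"value": 1}, now - timedelta(hours=1)),
    "stale": ({"value": 2}, now - timedelta(days=2)),
}
removed = cleanup_stale(store, ttl=timedelta(days=1), now=now)
```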
Another thing that came to my mind is how we would like to handle scenarios when not all features from a particular odfv are requested. I guess this is less of a problem for pandas and python transforms as they have to recompute the whole thing even if not all features are requested, but is a harder problem for substrait.
Agreed.
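For the python and pandas case, the partial-request behavior can be sketched as "compute everything, then project". The names below (`transform`, `get_requested_features`) are hypothetical and this is not how Feast implements it; it only shows why a partial request is cheap to support when the whole transformation runs anyway.

```python
def transform(inputs: dict) -> dict:
    # Hypothetical ODFV that always produces every output feature.
    return {
        "sum": inputs["a"] + inputs["b"],
        "product": inputs["a"] * inputs["b"],
    }


def get_requested_features(inputs: dict, requested: list) -> dict:
    """Run the full transformation, then keep only the requested features."""
    full = transform(inputs)
    missing = set(requested) - set(full)
    if missing:
        raise KeyError(f"unknown features requested: {sorted(missing)}")
    return {name: full[name] for name in requested}


partial = get_requested_features({"a": 2, "b": 3}, ["product"])
```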
FYI @tokoko I had created a ticket for this before actually here: https://github.com/feast-dev/feast/issues/4077
@tokoko @HaoXuAI @shuchu once this is done I'll consider ODFVs to be out of beta and a complete feature.
I think interface itself is pretty much stable at this point, so in that sense I agree. Still, from the overall ux perspective, there are still known limitations depending on the use case:
That's a good point.
I think adding Python offline is the only other necessary item. Substrait/ibis is broader in scope and not as widely used by the community yet.
The problem
As discussed in https://github.com/feast-dev/feast/issues/4365, we should add the ability to write an On Demand Feature View (ODFV) to store the output of the calculation.
The solution
The ideal solution would add a boolean to the ODFV decorator as metadata to control the write behavior and another boolean in the get_online_features method to allow users to force features to be recomputed.
The writes would be done by calling push() or write_to_online_store() with the underlying raw data (inputs) and storing the transformed feature values (outputs) into the online store. The ODFV would be called before executing the writes.
The change for the ODFV definition would be:
And the change for the get_online_features call would be:
Again, the write_to_online_store: bool parameter would dictate whether this ODFV would write to the online store and the force_compute: bool would dictate whether the ODFV would always recalculate the features. There's an argument to be made that we could skip the write_to_online_store in the FeatureView declaration but this metadata would be useful to have in the registry for users.
The write call would be the standard:
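The overall flow described in this section can be sketched with a toy in-memory store. This is an illustration under stated assumptions, not Feast's API: `ToyODFV` and `ToyStore` are hypothetical, and only the flag names write_to_online_store and force_compute are taken from the proposal.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyODFV:
    """Hypothetical stand-in for an ODFV definition with the proposed flag."""
    name: str
    transform: Callable[[dict], dict]
    write_to_online_store: bool = False


class ToyStore:
    """Toy in-memory online store; not Feast's FeatureStore."""

    def __init__(self):
        self._online = {}  # (odfv name, entity key) -> transformed outputs
        self.transform_calls = 0

    def _run(self, odfv: ToyODFV, row: dict) -> dict:
        self.transform_calls += 1
        return odfv.transform(row)

    def write_to_online_store(self, odfv: ToyODFV, entity_key, row: dict) -> None:
        # The ODFV is called before the write; only outputs are stored.
        if odfv.write_to_online_store:
            self._online[(odfv.name, entity_key)] = self._run(odfv, row)

    def get_online_features(self, odfv, entity_key, row, force_compute=False):
        stored = self._online.get((odfv.name, entity_key))
        if stored is not None and not force_compute:
            return stored  # serve the precomputed outputs
        return self._run(odfv, row)  # compute on demand


odfv = ToyODFV(
    name="conv_rate_plus_one",
    transform=lambda row: {"conv_rate_plus_one": row["conv_rate"] + 1},
    write_to_online_store=True,
)
store = ToyStore()
store.write_to_online_store(odfv, 1001, {"conv_rate": 0.5})
served = store.get_online_features(odfv, 1001, {"conv_rate": 0.5})
calls_after_read = store.transform_calls  # the read did not re-run the transform
recomputed = store.get_online_features(
    odfv, 1001, {"conv_rate": 0.75}, force_compute=True
)
```

In this sketch a read only re-runs the transformation when nothing was stored or when force_compute is set, which matches the intent of the two booleans described above.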
Alternatives
We discussed creating a different feature view for this behavior altogether but using the existing ODFV benefits from reusing a lot of existing code and documentation. Moreover, the industry has adopted this language so adding on top of the language feels more natural than adding entirely new language.
Additional context
After the implementation it would be ideal to add this as an example in the local Credit Scoring tutorial.