Open EXPEbdodla opened 3 weeks ago
@franciscojavierarceo FYI
This is a good idea. @tokoko
Introduce a materialization state field to Feature View to know the current status of Feature View
What do you consider state to be here? Is it always the last upper bound of the materialization window, or maybe some arbitrary user-defined data?
Log table in SQL Registry to understand the materialization events: materialization job ID and its associated project and feature view, start time, end time, and number of records written to the online store during the interval
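To make the proposed log table concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `materialization_log`, the column names, and the helper functions are all hypothetical and only mirror the fields listed above; this is not Feast's actual SQL registry schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS materialization_log (
    job_id          TEXT PRIMARY KEY,
    project         TEXT NOT NULL,
    feature_view    TEXT NOT NULL,
    start_time      TEXT NOT NULL,  -- ISO-8601, UTC
    end_time        TEXT NOT NULL,  -- ISO-8601, UTC
    records_written INTEGER NOT NULL
)
"""


def create_log_table(conn):
    conn.execute(SCHEMA)


def log_materialization(conn, job_id, project, feature_view,
                        start_time, end_time, records_written):
    # Timestamps are stored as fixed-width ISO strings so that
    # lexicographic order matches chronological order.
    conn.execute(
        "INSERT INTO materialization_log VALUES (?, ?, ?, ?, ?, ?)",
        (job_id, project, feature_view,
         start_time.isoformat(timespec="seconds"),
         end_time.isoformat(timespec="seconds"),
         records_written),
    )
    conn.commit()


def latest_event(conn, project, feature_view):
    # Returns the most recent row for a feature view, or None if there is none.
    return conn.execute(
        "SELECT job_id, start_time, end_time, records_written "
        "FROM materialization_log "
        "WHERE project = ? AND feature_view = ? "
        "ORDER BY end_time DESC LIMIT 1",
        (project, feature_view),
    ).fetchone()
```

Because each event is a separate row, the history can grow without making any single feature view object larger.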
I feel this is the most problematic one here because of file-based registries, where it will be considerably harder to accomplish the same.
Another alternative I've mentioned before is to support this sort of materialization log, but move the APIs for it to the online store instead of the registry. The major benefit of online store-managed materializations is that supporting multiple online stores at once (a much-requested feature) will become a lot easier, plus we won't run the risk of accidentally bloating the registry. wdyt?
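As a rough sketch of what an online store-managed log could look like: every name below (`MaterializationEvent`, `MaterializationLog`, `InMemoryStoreLog`, `record_for_store`) is hypothetical and not an existing Feast interface. The point is only that each online store keeps its own log, so materializing into several stores needs no shared registry state.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class MaterializationEvent:
    feature_view: str
    start: datetime
    end: datetime
    records_written: int


class MaterializationLog(Protocol):
    """Hypothetical interface an online store could implement."""

    def append(self, event: MaterializationEvent) -> None: ...

    def events(self, feature_view: str) -> List[MaterializationEvent]: ...


class InMemoryStoreLog:
    """Stand-in for a log kept inside one online store."""

    def __init__(self) -> None:
        self._events: List[MaterializationEvent] = []

    def append(self, event: MaterializationEvent) -> None:
        self._events.append(event)

    def events(self, feature_view: str) -> List[MaterializationEvent]:
        return [e for e in self._events if e.feature_view == feature_view]


def record_for_store(logs: Dict[str, MaterializationLog],
                     store_name: str,
                     event: MaterializationEvent) -> None:
    # Each store tracks only the materializations written to it.
    logs[store_name].append(event)
```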
This can get complicated either way we do it.
From a first principles perspective, this is metadata and metadata belongs in the registry because that's what's intuitive to users...but that can overload the registry and result in OOMs when caching the registry.
We could support both and make it configurable by the user. We could also store only the most recent materialization metadata for each feature view by default, and warn about memory issues if someone configures a file-based registry together with full materialization history.
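A minimal sketch of that default, assuming a hypothetical `MaterializationHistory` helper with a `max_entries` setting and a `registry_type` string (neither exists in Feast today): only the latest record is kept unless the user opts into more, and unbounded history on a file-based registry emits a warning.

```python
import warnings
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class MaterializationRecord:
    start: datetime
    end: datetime


class MaterializationHistory:
    """Per-feature-view history with a configurable retention limit."""

    def __init__(self, max_entries: Optional[int] = 1,
                 registry_type: str = "sql") -> None:
        # max_entries=1 keeps only the most recent record; None keeps all.
        if max_entries is None and registry_type == "file":
            warnings.warn(
                "Full materialization history with a file-based registry "
                "may grow without bound and increase memory usage."
            )
        # deque(maxlen=n) silently drops the oldest entry once full.
        self._records = defaultdict(lambda: deque(maxlen=max_entries))

    def record(self, feature_view: str, rec: MaterializationRecord) -> None:
        self._records[feature_view].append(rec)

    def history(self, feature_view: str) -> List[MaterializationRecord]:
        # .get avoids creating an empty entry for unknown feature views.
        return list(self._records.get(feature_view, ()))
```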
For me, state is mainly whether a materialization is currently in progress or not. This can help avoid running materializations in parallel while one is already active. The log table will hold the additional details of each materialization.
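Roughly, the state check could work like the sketch below. The `MaterializationState` enum and `MaterializationStateTracker` class are hypothetical names, and the in-process lock only stands in for whatever atomic update the registry backend would provide (for example a database transaction).

```python
import threading
from enum import Enum
from typing import Dict, Tuple


class MaterializationState(Enum):
    IDLE = "idle"
    RUNNING = "running"


class MaterializationStateTracker:
    """Tracks whether a materialization is active per (project, feature view)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._states: Dict[Tuple[str, str], MaterializationState] = {}

    def try_start(self, project: str, feature_view: str) -> bool:
        # Check-and-set under one lock so two callers cannot both start.
        key = (project, feature_view)
        with self._lock:
            if self._states.get(key) is MaterializationState.RUNNING:
                return False
            self._states[key] = MaterializationState.RUNNING
            return True

    def finish(self, project: str, feature_view: str) -> None:
        with self._lock:
            self._states[(project, feature_view)] = MaterializationState.IDLE

    def state(self, project: str, feature_view: str) -> MaterializationState:
        with self._lock:
            return self._states.get((project, feature_view),
                                    MaterializationState.IDLE)
```

A job would call `try_start` before materializing, skip the run if it returns False, and call `finish` in a `finally` block so a failed run does not leave the feature view stuck in RUNNING.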
Agree with what @franciscojavierarceo mentioned: this is primarily metadata, which is suitable to store in the registry only, keeping just the latest materialization information in materialization_interval rather than all interval information.
It can be offered as an optional feature for some of the registries.
Is your feature request related to a problem? Please describe. We are using Feast at a large scale. Multiple users share a single registry containing multiple projects, and each project is associated with a team. As the registry scales up, we would like to understand how users are using the feature store and what state the feature views are in. In particular, we would like to know whether any materializations are currently active, what materialization window each one is using, and how many records it is trying to upsert to the online stores. I see the materialization_interval field on the feature view, but as we continue running materialization on a daily basis, that field will soon become a bottleneck during serialization and deserialization. We need a proper way to know the status and to log the materialization history.
This feature may be needed for other types of feature views like Stream Feature View.
Describe the solution you'd like Solutions:
Describe alternatives you've considered No alternative at this point. Currently we reach out to users to find out whether any materialization jobs are running. As the user base grows, it gets harder to get hold of each one to understand what's happening.
Additional context NA