Closed woop closed 2 years ago
Thanks @woop - great writeup. Main point I had was all the data movement with transformation and engineering can be done using the concept of Feature Views. So e.g.
So Features Definitions consisting of Feature Views becomes the DSL/Metadata contract to define
So FeatureSets still make sense, though I think we can live with just one of the two, View or Table
Next steps here are to create a proposal for both approaches
FeatureServices/Sets are out of scope.
Hey @woop,
Great info. Thanks for the detailed explanation of concepts in different versions. I have recently migrated from "feast 0.9.3" to "feast 0.10.8" and have a few questions after using FeatureViews:
1) All I see is that FeatureView revolves around the FeatureRepo directory. With the introduction of FeatureViews, are we planning to remove the dependency on Feast Core, Serving and Feast Spark?
2) Also, I don't see start_stream_to_online_ingestion in FeatureView, which was available in Client. So should I still depend on FeatureTable for online ingestion?
3) With Redis coming into the picture for an online store, do we have plans to provide Redis as an option in the feature_store.yaml file?
After migrating to feast 0.10.8, these were some of the questions I couldn't find answers to. Let me know your thoughts @woop
- All I see is that FeatureView revolves around the FeatureRepo directory hence with the introduction of FeatureViews, are we planning to remove dependency on Feast Core, Serving and Feast Spark?
- Also I don't see start_stream_to_online_ingestion in FeatureView, which was available in Client. So should I still depend on FeatureTable for online ingestion?
@rakshithvsk these are all very valid questions, but they are off topic for this specific thread. Can you please move the discussion to #1527?
@woop Question: will the FeatureTable be deprecated in the future?
Thanks for the reply @woop. I've moved the discussion over here: https://github.com/feast-dev/feast/issues/1527#issuecomment-874488877
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I think this is still worth talking about: should FeatureTables be removed from the codebase? They're currently not tested anywhere, so keeping them around seems to be asking for trouble. It also makes it harder to rework the data types, since both FeatureTables and FeatureViews need to be updated. What do you think @woop?
They can always be added back in a well-considered way later for ingestion if required.
Creating this issue to discuss some concepts in Feast: Feature Sets, Feature Tables, and Feature Views.
Feature Sets
Prior to Feast 0.8, Feast had a concept of Feature Sets (not to be confused with the new Feature Set RFC). Feature sets were logical groups of features that occurred together. These groups of features share an entity (which can be composite), and in the offline case they also share timestamps. For example, a feature set could be used to store a log of events, or it could be used to store the results of an aggregation. The idea is that different processes (stream or batch ETLs) would output data into their own tables, and Feast would join these different tables during retrieval. Feature sets therefore avoid a sparse table problem.

Importantly, Feature Sets did NOT have a source. Users were always asked to push data to the feature store. For batch ingestion, the users did the following
For stream ingestion, teams would push to a specific topic for a feature set.
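To make the push model concrete, here is a minimal sketch in plain Python. It is not Feast code: ToyFeatureStore, its ingest and get_latest methods, and the driver_trips / driver_ratings feature set names are all hypothetical, chosen only to illustrate separate processes pushing into their own tables and the store joining them on retrieval.

```python
from collections import defaultdict
from datetime import datetime


class ToyFeatureStore:
    """Hypothetical in-memory stand-in for a push-based feature store."""

    def __init__(self):
        # One list of rows per feature set, so each pipeline writes to
        # its own dense table instead of one wide sparse table.
        self._tables = defaultdict(list)

    def ingest(self, feature_set, rows):
        """Push rows into the store; no external source is involved."""
        self._tables[feature_set].extend(rows)

    def get_latest(self, entity_key, entity_id, feature_sets):
        """Join the newest row of each feature set for one entity."""
        joined = {entity_key: entity_id}
        for name in feature_sets:
            matches = [r for r in self._tables[name] if r[entity_key] == entity_id]
            if not matches:
                continue
            latest = max(matches, key=lambda r: r["event_timestamp"])
            for column, value in latest.items():
                if column not in (entity_key, "event_timestamp"):
                    joined[column] = value
        return joined


store = ToyFeatureStore()
# Two independent pipelines push into two different feature sets.
store.ingest("driver_trips", [
    {"driver_id": 1, "event_timestamp": datetime(2021, 1, 1), "trips_today": 5},
    {"driver_id": 1, "event_timestamp": datetime(2021, 1, 2), "trips_today": 7},
])
store.ingest("driver_ratings", [
    {"driver_id": 1, "event_timestamp": datetime(2021, 1, 1), "avg_rating": 4.8},
])
row = store.get_latest("driver_id", 1, ["driver_trips", "driver_ratings"])
```

In this toy the store itself holds the only copy of the data, which is the "source of truth" property discussed in this issue.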
The feature store would provide both offline and online storage of user data, and allowed users to imperatively load features into the feature store. Feature sets made the feature store into the source of truth for feature data. Users would ingest from both their notebooks as well as their batch or streaming ETL pipelines.

Feature Tables
In Feast 0.8, we replaced Feature Set with Feature Table. The main reason was scoping. Many teams already have data being stored in specific locations like data warehouses and lakes. Pointing a feature table at such a source allows Feast to materialize (load) data from outside the feature store into the feature store for storage and serving, and means that Feast doesn't have to become the source of truth for feature data (it lives externally). Feast would not create or manage the offline store in this case, unlike in Feast 0.7 and before.

The idea was not that Feast would never provide an offline store. The primary reason we did not start with managing an offline store was that the source-centric approach scoped down the project and allowed us to address most use cases.
Because we had ingest() for feature sets in Feast 0.7, we had to provide backward compatibility for teams that wanted to ingest data from ETL pipelines. In order to do so, we still provided the ingest() functionality. However, this pushed directly to the source location, not into the offline store. The point of this ingest() was only to provide a migration path to the new Feast (0.8, 0.9), not to be a long-term API to exist alongside sources. In fact, pushing directly to a source is an anti-pattern since it's often the case that teams do not have write access to sources.

Feature Views
Feature Views were introduced in Feast 0.10. Feature views can be thought of as feature tables without the ingest() functionality.

Feature views in Feast 0.10 function the same as feature tables in 0.9, but we do not allow direct ingestion to a feature view's source. The feature view can be "materialized", which pulls from the source and loads the data into the feature store. Right now we only materialize into the online store since we are able to query the batch source directly in order to build training datasets.
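As a rough illustration of what "materializing" means here, the sketch below pulls rows from an external batch source and keeps only the newest row per entity in an online store. It is a toy in plain Python, not Feast's implementation: the materialize function, the dict-based online store, the half-open time window, and the driver_id / trips_today columns are all assumptions made for the example.

```python
from datetime import datetime


def materialize(source_rows, online_store, entity_key, start, end):
    """Load the newest row per entity in [start, end) into online_store."""
    # Sorting ascending by timestamp means later rows overwrite earlier ones.
    for row in sorted(source_rows, key=lambda r: r["event_timestamp"]):
        if not (start <= row["event_timestamp"] < end):
            continue  # outside the requested window, leave it in the source
        online_store[row[entity_key]] = {
            column: value for column, value in row.items() if column != entity_key
        }
    return online_store


# The batch source lives outside the feature store (e.g. a warehouse table).
batch_source = [
    {"driver_id": 1, "event_timestamp": datetime(2021, 1, 1), "trips_today": 5},
    {"driver_id": 1, "event_timestamp": datetime(2021, 1, 2), "trips_today": 7},
    {"driver_id": 2, "event_timestamp": datetime(2021, 1, 5), "trips_today": 3},
]
online = materialize(
    batch_source, {}, "driver_id", datetime(2021, 1, 1), datetime(2021, 1, 3)
)
```

Note that nothing is ever written back to batch_source: data flows one way, from the source into the store, which is the difference from the push-style ingest() described earlier.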
Note: Feast is not only concerned with loading data into an online store. Feature views only dictate that the source of data lives externally to the feature store, but there is a case for materialization into both an online and offline store in theory. The use case for the offline store is
Feature Tables (potential reintroduction)
Now that feature views have a clear purpose, we are considering introducing feature tables to address the previously removed ingestion functionality. The use case is the same as feature sets in Feast 0.7.
Users have data in their ETL pipelines or Jupyter notebooks, and they need a structured location to store that data for consumption in models. Feature tables would allow them to load and store their data in the feature store, thereby becoming the source of truth for this feature data. This solves the following problems.
Pseudo code
Alternatives
An alternative proposed by @animeshsingh is to only use feature views and to ask users to always bring their own sources. Users would be responsible for uploading their data to a source location. The benefit of this approach is that we introduce fewer concepts to Feast and keep our APIs simpler.