Data point for Grubhub recsys: a common periodic offline job creates features for both training and serving, which eliminates training/serving skew. Features are snapshotted in Hive for access (sharing) and published to Cassandra for serving, with heavy run-time feature caching at serving time. Integrity is maintained via ad-hoc monitoring/alerting with Datadog and lineage tracking via ml-metadata.
Grubhub's recsys use case is a little simpler than some, as we currently don't support real-time features computed between offline job runs (which is typically where you see Flink et al. applied). Other groups, like logistics, might use online feature generation.
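The offline pattern described in the comment above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Grubhub's implementation. One shared feature function feeds both the training snapshot and the serving store, so the two paths cannot drift apart. Hive and Cassandra are stood in for by in-memory dicts. The "run-time feature caching" is assumed to be a read-through TTL cache. All names (`compute_features`, `run_offline_job`, `CachedFeatureReader`, the feature names and snapshot id) are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical raw events: (user_id, order_total)
RawEvents = List[Tuple[str, float]]
Features = Dict[str, Dict[str, float]]


def compute_features(events: RawEvents) -> Features:
    """Single feature definition shared by training and serving (hypothetical features)."""
    totals: Dict[str, List[float]] = {}
    for user_id, amount in events:
        totals.setdefault(user_id, []).append(amount)
    return {
        user: {"order_count": float(len(v)), "avg_order_total": sum(v) / len(v)}
        for user, v in totals.items()
    }


def run_offline_job(events: RawEvents, snapshot_store: Dict[str, Features],
                    serving_store: Features, snapshot_id: str) -> Features:
    """Periodic batch job: compute features once, then write them to both stores."""
    features = compute_features(events)
    snapshot_store[snapshot_id] = features  # stand-in for a Hive snapshot (training/sharing)
    serving_store.update(features)          # stand-in for publishing to Cassandra (serving)
    return features


class CachedFeatureReader:
    """Serving-side read-through cache; entries expire after ttl_s seconds."""

    def __init__(self, backend: Features, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.backend = backend
        self.ttl_s = ttl_s
        self.clock = clock
        self._cache: Dict[str, Tuple[float, Dict[str, float]]] = {}
        self.backend_reads = 0  # counts how often we fall through to the store

    def get(self, user_id: str) -> Dict[str, float]:
        now = self.clock()
        hit = self._cache.get(user_id)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # fresh cache entry: skip the backend read
        self.backend_reads += 1
        value = self.backend.get(user_id, {})
        self._cache[user_id] = (now, value)
        return value


if __name__ == "__main__":
    hive, cassandra = {}, {}
    events = [("u1", 20.0), ("u1", 30.0), ("u2", 12.5)]
    run_offline_job(events, hive, cassandra, "snapshot_001")
    reader = CachedFeatureReader(cassandra, ttl_s=60.0)
    print(reader.get("u1"))  # first read hits the backend
    print(reader.get("u1"))  # second read is served from the cache
```

Because training reads the snapshot and serving reads the published copy of the *same* computed values, any skew would have to come from staleness between job runs. That limitation matches the comment's note that real-time features computed between runs are not supported.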
Great framework for thinking about feature store implementation.
Thank you for the great post, Eugene. I have a few follow-up questions:
Thank you :)
Hi Eugene, would it be possible to add a publication date at the top of your articles? Statements such as "Last month, Splice Machine, a big data platform, launched its own feature store too." only make sense with a date. Thanks!
Feature Stores - A Hierarchy of Needs
Access, serving, integrity, convenience, autopilot; use what you need.
https://eugeneyan.com/writing/feature-stores/