feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0

Redis Online Store adapter is not production-ready #3596

Open roelschr opened 1 year ago

roelschr commented 1 year ago

Purpose of this issue

I'm opening this issue to bring to light some concerns and potential areas for improvement in the Redis Online Store adapter, for the benefit of the whole community. I understand that the Feast team has other priorities and that the community is also encouraged to contribute. My main goal is to make this information clearer for existing and future users, so they can make informed decisions based on the documentation and avoid pitfalls that my team was unable to avoid before starting to use Feast with Redis.

Expected Behavior

Redis is positioned as a well-supported Online Store within Feast: it has a dedicated blog post, it is used in performance assessments, and it boasts the most complete set of functionality in the functionality matrix.

Current Behavior

However, the current implementation of the Redis Online Store has some issues and lacks certain basic functionality, including some listed in the functionality matrix.

Here is a list of concerns:

  1. While, in theory, feature data is "collocated by entity key", in practice it works as if it were "collocated by feature view": get_online_features() performs one read_from_online_store call per feature view in your request. This means that the more feature views you request, the longer get_online_features() takes, regardless of key collocation. Additionally, all database reads are synchronous (see the sketch after this list).

  2. Although, in theory, it "supports TTL (time to live) at retrieval", the Feature View TTL doesn't seem to matter in practice. Instead, you have to set a store-level TTL that is only used to expire whole entities. In a complex production feature store, with several feature views for the same entity and different TTL requirements, this is never going to be useful. As a result, online feature retrieval with Redis can return features older than their TTL. Here is a "closed" and marked-as-"done" issue mentioning this.

  3. In theory, the online store should "generate a plan and update infrastructure (e.g. tables)". However, in practice, renaming or deleting features and feature views will leave them stored in Redis, causing it to grow indefinitely. Feast will only delete data from Redis when the last features/feature view of a specific entity is not in use anymore. Here is a "closed" issue mentioning this. This is not an issue when using key_ttl_seconds, but in complex production-level feature stores, this may not be a viable solution.
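To make concern 1 concrete, here is a minimal sketch of the two access patterns. This is not Feast's actual code; the hash layout and all names are hypothetical:

```python
# Illustrative sketch only: not Feast's actual code. Assumes a hypothetical
# hash-per-entity layout where fields are named "<feature_view>:<feature_name>".
import redis

r = redis.Redis()

entity_key = "driver_id:1001"  # hypothetical serialized entity key
feature_views = {
    "driver_stats": ["conv_rate", "acc_rate"],
    "driver_profile": ["tenure_days"],
}

# The pattern described above: one synchronous HMGET per feature view,
# i.e. N sequential round trips for N feature views.
for fv, features in feature_views.items():
    fields = [f"{fv}:{name}" for name in features]
    values = r.hmget(entity_key, fields)

# What true entity-key collocation could allow: one HMGET covering every
# requested feature view, i.e. a single round trip.
all_fields = [f"{fv}:{f}" for fv, feats in feature_views.items() for f in feats]
all_values = r.hmget(entity_key, all_fields)
```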

Possible Solution

Switching to another online store (like DynamoDB or Cassandra) can solve some of these issues, but not all. The performance of Redis itself is not in question, but the Feast implementation may need improvements.

Some possible fixes include:

  1. I see no reason for all online stores not to be "collocated by entity key". I really like this about Feast's Redis data model. In offline stores, feature views have conflicting timestamps and different historical sources; but in an online store you only want to keep the latest values (like a common dimension table in a warehouse), so all feature views that share the same entities can be stored in one table/hash/whatever. This would speed up online feature retrieval across all online stores. Cassandra, for example, could very much implement the same data model with column families. Yet Feast itself doesn't make use of the better data model in Redis.

  2. Addressing the conflict between Redis' inability to EXPIRE (or set a TTL on) individual hash fields and the "collocated by entity key" design. One basic principle in Feast is that if you request a feature from the online store and its value is older than its TTL, you should get a null. Workarounds may be needed to make Redis a more suitable data store for Online Stores in this regard; one possible client-side workaround is sketched after this list. Cassandra, for example, has TTL for both tables and individual columns.

  3. Implementing the use of HDEL for infrastructure updates when feature views are renamed/deleted. For individual features, the change might be more substantial, as update_infra() currently only has access to feature views and entities to create/delete. Further exploration of this issue is necessary.
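As a sketch of the client-side workaround mentioned in fix 2, assuming the store keeps an event timestamp per feature view (the "_ts:<feature_view>" field naming here is hypothetical):

```python
# A minimal sketch of client-side TTL enforcement at retrieval time, assuming
# the store keeps an event timestamp per feature view (the "_ts:<feature_view>"
# field name is hypothetical). Stale values come back as None.
import time
import redis

r = redis.Redis()

def read_with_ttl(entity_key, feature_view, features, ttl_seconds):
    ts_raw = r.hget(entity_key, f"_ts:{feature_view}")
    if ts_raw is None or time.time() - float(ts_raw) > ttl_seconds:
        # Missing or older than the feature view's TTL: treat as expired.
        return {name: None for name in features}
    fields = [f"{feature_view}:{name}" for name in features]
    return dict(zip(features, r.hmget(entity_key, fields)))
```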

I hope that addressing these concerns can lead to a more robust and reliable implementation of the Redis Online Store adapter in Feast, or even improve Feast as a whole. As a heavy user, I would prioritize these changes over other integrations, and I believe I'm likely not the only one. Strengthening the Redis integration and enhancing Feast's performance will undoubtedly benefit the wider community.

adchia commented 1 year ago

Thanks so much for the detailed response. cc @DvirDukhan

DvirDukhan commented 1 year ago

@adchia @roelschr

Regarding the current data model:

  1. We can address the "collocated by entity key" issue by allowing online_read to get multiple feature view names and their respective features. By doing so, we can send a single hmget command for each entity and the required projection.
  2. FeatureView TTL - the only solution I can think of is delegating this responsibility to the Feast side, meaning that the application can/should know when the feature view expiration event happened and trigger a clean-up. IMO, this is a suboptimal solution with respect to performance and usage complexity (checking for expiration upon each request, clean-up responsibility spread across multiple Feast application instances, etc.).

Suggested new data model - first proposed at https://github.com/feast-dev/feast/issues/2254#issue-1117083681. Basically, it namespaces the entities by feature view. The hash key name is ProjectName:FeatureViewName:EntityId, and the hash field names are the feature names:

                              ------> FeatureName_1:FeatureValue_1
                             /
Project:FeatureView:EntityId--------> FeatureName_2:FeatureValue_2
                             \
                              ------> FeatureName_3:FeatureValue_3

I believe this can solve most of the issues above.
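A sketch of writes under this proposed model (all names are hypothetical). Because each feature view owns its own hash, EXPIRE can enforce the view's TTL directly:

```python
# A sketch of writes under the proposed model (names hypothetical). Each
# feature view owns its own hash, so EXPIRE can enforce the view's TTL.
import redis

r = redis.Redis()

def write_feature_view(project, fv, entity_id, values, ttl_seconds=None):
    key = f"{project}:{fv}:{entity_id}"
    r.hset(key, mapping=values)
    if ttl_seconds is not None:
        # Per-feature-view TTL: impossible with one shared hash per entity,
        # because Redis cannot EXPIRE individual hash fields.
        r.expire(key, ttl_seconds)

write_feature_view("myproj", "driver_stats", "1001",
                   {"conv_rate": 0.42, "acc_rate": 0.93}, ttl_seconds=3600)
```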

judahrand commented 1 year ago

Possibly a massive aside here but it would also be fantastic if the Redis connector used Redis' Server Side Assisted Client Side Caching. This could help reduce serving latencies caused by deserialisation + network latency for commonly served features. This is common in recs systems. Features for the things being recommended to multiple users are not served randomly, whereas user features are served in a much more stochastic manner.
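For reference, recent redis-py releases expose this feature. A minimal sketch, assuming redis-py >= 5.1 (which ships redis.cache.CacheConfig) and a Redis server recent enough to support client tracking over RESP3:

```python
# A minimal sketch of server-assisted client-side caching with redis-py.
# Assumes redis-py >= 5.1 and a server supporting client tracking over RESP3.
import redis
from redis.cache import CacheConfig

r = redis.Redis(protocol=3, cache_config=CacheConfig())

# Hot keys (e.g. features of items recommended to many users) can be served
# from the client-local cache after the first read; Redis invalidates the
# cached entry automatically when the underlying key changes.
r.hget("myproj:item_stats:42", "popularity")  # network round trip
r.hget("myproj:item_stats:42", "popularity")  # likely served locally
```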

rodrigo-christiansen-wonolo commented 1 year ago

We deployed a Redis Online Store, and the lack of collocation by entity key is generating massive delays. We have a Feature Service that uses 11 different Feature Views and this makes our retrieval times abysmal.

judahrand commented 1 year ago

> Suggested new data model - first proposed at #2254 (comment). Basically, it namespaces the entities by feature view. The hash key name is ProjectName:FeatureViewName:EntityId, and the hash field names are the feature names.
>
> I believe this can solve most of the issues above.

Can I suggest that in addition to these changes to the data model that we use Redis Hash Tags to colocate all the feature views of a single entity? So the Hash key name would be ProjectName:FeatureViewName:{EntityId} (note the curly braces). This would optimize for the Redis Cluster case too.

https://redis.com/blog/redis-clustering-best-practices-with-keys/
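A sketch of what reading hash-tagged keys on a cluster could look like. The "{entity_id}" hash tag pins every feature view's key for one entity to the same slot, so all reads can be pipelined to a single node in one round trip (key names follow the proposal above; everything else is hypothetical):

```python
# Sketch: hash-tagged keys keep one entity's feature views in one cluster
# slot, so a pipeline of HMGETs hits a single node in one round trip.
from redis.cluster import RedisCluster

rc = RedisCluster(host="localhost", port=7000)

project, entity_id = "myproj", "1001"
feature_views = {"driver_stats": ["conv_rate"], "driver_profile": ["tenure_days"]}

pipe = rc.pipeline()
for fv, features in feature_views.items():
    pipe.hmget(f"{project}:{fv}:{{{entity_id}}}", features)  # note the {...} tag
results = pipe.execute()
```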

roelschr commented 1 year ago

Thank you very much @DvirDukhan (and others).

From what I understand, changing the data model to ProjectName:FeatureViewName:EntityId (with or without Redis hash tags) means making it collocated by feature view. It would allow implementing TTL through EXPIRE at the feature view level 🎉

But it comes with drawbacks:

  1. bigger memory footprint: 1 hash key with N fields << N hashes with 1 field. Not sure about hashtags' impact here tho.
  2. prevents us from implementing the "single hmget command for each entity". Unless hashtags would allow something like:

keyslot = CLUSTER KEYSLOT {EntityId}
n = CLUSTER COUNTKEYSINSLOT keyslot
keys = CLUSTER GETKEYSINSLOT keyslot n
MGET keys
--- then filter non-requested keys on Feast

Not sure, I'm not a specialist in Redis 👀 @judahrand

Finally, is it possible to have HDEL during feast apply for features and feature views that were renamed/removed?
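For reference, a minimal sketch of what such an apply-time cleanup could look like under the current hash-per-entity layout. The field naming and helper here are hypothetical, not Feast's actual internals:

```python
# A minimal sketch of an apply-time cleanup under the current layout, where
# one hash per entity holds "<feature_view>:<feature_name>" fields. All names
# here are hypothetical.
import redis

r = redis.Redis()

def delete_removed_features(entity_keys, feature_view, removed_features):
    fields = [f"{feature_view}:{name}" for name in removed_features]
    pipe = r.pipeline()
    for entity_key in entity_keys:
        pipe.hdel(entity_key, *fields)  # drop only the stale fields
    pipe.execute()
```

The harder part is probably enumerating the affected entity keys, which would need a SCAN pass or an index maintained by Feast.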

judahrand commented 1 year ago

> bigger memory footprint: 1 hash key with N fields << N hashes with 1 field. Not sure about hashtags' impact here tho.

This is true; however, I don't think there is realistically any other way to treat different Feature Views differently (re TTL, deletion, etc.). I don't think Hash Tags affect this, but I could be wrong.

> prevents us from implementing the "single hmget command for each entity".

Also true, but I think there was some investigation at some point into whether it actually always makes sense to use HMGET, or whether HGETALL plus filtering manually is faster in some cases. Can't remember.
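If anyone wants to revisit that investigation, a rough micro-benchmark along these lines would settle it for a given hash shape (names and sizes are arbitrary):

```python
# A rough micro-benchmark of HMGET-with-projection vs HGETALL-plus-filtering.
# Which wins depends on field count and value sizes, so measure, don't assume.
import time
import redis

r = redis.Redis()
key = "bench:entity"
r.hset(key, mapping={f"fv:{i}": "x" * 64 for i in range(200)})
wanted = [f"fv:{i}" for i in range(0, 200, 10)]

t0 = time.perf_counter()
for _ in range(1000):
    r.hmget(key, wanted)
t1 = time.perf_counter()
for _ in range(1000):
    full = r.hgetall(key)
    _ = [full[w.encode()] for w in wanted]  # hgetall returns bytes keys
t2 = time.perf_counter()
print(f"HMGET: {t1 - t0:.3f}s, HGETALL+filter: {t2 - t1:.3f}s")
```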

stale[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

sudohainguyen commented 11 months ago

Let's not let this issue go stale. I'm running into Redis online store performance problems as well, and I believe we can do better with the data model.

sudohainguyen commented 11 months ago

@roelschr would you mind taking the lead and opening a PR for this?

rodrigo-christiansen-wonolo commented 11 months ago

Is there another Online Store implementation that would be better suited to Feature Services composed of many Feature Views? According to the docs, only Redis has entity key collocation, although that isn't really the case.

roelschr commented 11 months ago

@sudohainguyen Unfortunately, I have too much on my plate right now. We will possibly migrate to either DynamoDB or Cassandra (AWS Keyspaces) in the next few weeks. The tradeoff between feature retrieval latency, cost, and TTL support is what is driving this decision. We'll have to make changes to the implementation of one of the connectors to enable better usage of TTL, tho.

> Is there another Online Store implementation that would be better suited to Feature Services composed of many Feature Views? According to the docs, only Redis has entity key collocation, although that isn't really the case.

Unfortunately, colocation at the db level doesn't mean much while Feast itself keeps retrieving features through a loop over all feature views in the request.

sudohainguyen commented 11 months ago

> Unfortunately, colocation at the db level doesn't mean much while Feast itself keeps retrieving features through a loop over all feature views in the request.

True. There's a workaround: pre-gather all of the features into a single feature table (a sketch follows below).
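For illustration, the workaround amounts to defining one wide feature view holding everything a feature service serves together, fed by a source that is already joined upstream. All names, paths, and types here are hypothetical:

```python
# A sketch of the workaround: one wide feature view instead of many, so online
# retrieval touches a single view. Names, paths, and types are hypothetical.
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

driver = Entity(name="driver", join_keys=["driver_id"])

serving_source = FileSource(
    path="data/driver_serving.parquet",  # pre-joined from the original views
    timestamp_field="event_timestamp",
)

driver_serving = FeatureView(
    name="driver_serving",  # one view, one online read per entity
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="tenure_days", dtype=Int64),
    ],
    source=serving_source,
)
```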

stale[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.