Performance bottleneck in get_online_features due to repeated metadata resolution from registry

breno-costa commented 3 weeks ago

Is your feature request related to a problem? Please describe. I'm running some benchmarks with Python SDK and profiling the code to understand more about its execution. Here's the profile report for my benchmark.

If you look at it, the _prepare_entities_to_read_from_online_store method and its sub-calls account for more than half of the execution time. For each call, it needs to resolve metadata from registry and this takes a lot of time in a relative comparison to other parts of the code.

However, in a ML inference scenario, we usually create a feature service for each ML model application when it's deployed. The ML application calls the method get_online_features using the same feature service, i.e. all calls use same metadata. The current SDK implementation creates unnecessary overhead since it resolves same metadata on every call.

Describe the solution you'd like I don't know whether it's possible to make metadata resolution more efficient. If not, a potential solution would be to cache the metadata in the SDK itself. There might be some configuration that turn this caching on/off.

Additional context I see many functions used to get online features have been moved to utils.py. This can make changes and optimizations more complex. There are over 500 lines of code in these util functions.

tokoko commented 3 weeks ago

thanks for digging in. _get_online_request_context is the function meant to be cached introduced by #4041 for that exact purpose. I simply never found time to implement actual caching afterwards.

P.S. As for a lot of relevant functions moving to utils.py, that was easy and dirty way that allowed me to move get_online_features logic from feature_store.py to online_store.py. Some of the functions online flow depended on was also used by get_historical_features, so I shoved everything in utils instead. Feel free to move anything online store-specific to online_store.py.

breno-costa commented 3 weeks ago

thanks for digging in. _get_online_request_context is the function meant to be cached introduced by #4041 for that exact purpose. I simply never found time to implement actual caching afterwards.

Have you thought about any implementation for this cache mechanism? We've already created a custom cache implementation here, but it's quite radical. We even cache the hash keys and parts of redis keys as they use feature names. I'm not sure if we'd want to implement this level of caching here, although it is very efficient.

tokoko commented 3 weeks ago

That's interesting. I'm not against radical, but I think the best first step would be to keep it implementation-agnostic that could work with all online stores, not just redis. (Although we can have redis-specific caching layer on top) I was thinking of caching _get_online_request_context call outputs by either feature service name or some sort of a concatenation of feature names if list of features is used instead.

The invalidation is the tricky part, I think. The easy solution would be to have some sort of time-based or cache-size based ttl. Better solution would be to somehow bind the cache invalidation/refresh to the registry cache, so online store metadata cache gets invalidated only when registry cache is updated and there are some changes in the underlying metadata.

breno-costa commented 3 weeks ago

That's interesting. I'm not against radical, but I think the best first step would be to keep it implementation-agnostic that could work with all online stores, not just redis.

Makes a lot of sense.

I was thinking of caching _get_online_request_context call outputs by either feature service name or some sort of a concatenation of feature names if list of features is used instead.

I was thinking of some similar approach to start with. We could use something like Python's lru cache or cachetools to manage the cache. With lru cache, we can set maxsize only as far as know, but with cachetools, we can set maxsize or ttl.

Better solution would be to somehow bind the cache invalidation/refresh to the registry cache, so online store metadata cache gets invalidated only when registry cache is updated and there are some changes in the underlying metadata.

This solution seems interesting, but I have to take a look in the implementation to see how to connect the dots here. How to know when the registry cache is updated from the online store module?

franciscojavierarceo commented 3 weeks ago

+1 to lru_cache

franco-bocci commented 3 weeks ago

We've struggled with this as well, and is similar to what I raised in Slack (here).

We use a wrapper that uses Feast to be able to extend it, and we use the remote registry. During online-serving, the methods that interact with the registry are:

remote_registry_methods_to_cache = (
    "get_feature_service",
    "list_feature_views",
    "list_stream_feature_views",
    "list_on_demand_feature_views",
    "list_entities",
    "get_entity",
)

Then we implement a cache on the RemoteRegistry which is the client that calls the registry server, with something like:

def cache_based_on_ttl(
    cache_max_size: int, cache_ttl_seconds: int
) -> cachetools.TTLCache | cachetools.LRUCache:
    if cache_ttl_seconds == 0:
        return cachetools.LRUCache(maxsize=cache_max_size)

    return cachetools.TTLCache(
        maxsize=cache_max_size,
        ttl=cache_ttl_seconds,
    )

so we enable a TTL Cache, or a LRU Cache. The registry config has a way of passing the ttl_seconds (or something like that), so the existing configuration can be used.

Then we override those specific methods so that the cache is used.

franco-bocci commented 3 weeks ago

Registries have a def refresh(self, project: Optional[str] = None): method. As the cache would be implemented at an instance level, cache can be invalidated during that call.

The downside of applying cache at the instance level is that instances might not be garbage collected, but that should not be a problem in this case because we're not instantiating a lot of registries.

tokoko commented 3 weeks ago

@franco-bocci I think this is a little different to what you're doing. I guess with your wrapper you're overcoming the fact that RemoteRegistry doesn't extend CachingRegistry. (and even if it was the way CachingRegistry maintains a cache is less that ideal, but that's a different story).

What Breno is suggesting here is to cache metadata wrangling work on a higher level, not just data retrieval from the registry, also the work that feast needs to do to stitch those metadata objects together. _get_online_request_context encapsulates all the methods you listed, plus lots of other work, so that's a better code point to cache imho.

@breno-costa sounds good, we can start w/o integration between two caches, I'm fine with that, I don't know how best to connect those dots either. I think it's important for the cache to be ttl-based as in a realistic scenario a single web server might just be serving a single feature service. Invalidation that's size-based only would mean that invalidation never happens in those cases. cachetools looks like a better fit.

privatedumbo commented 3 weeks ago

It would solve the problem, but with the current design, we would be applying a cache for a private method, that would be controlled separately than the cache's registry.

I agree on the point that we would benefit from caching also that wrangling happening under the hood which what I shared does not solve, but maybe we could model it better (somehow), to avoid spreading caches around?

breno-costa commented 3 weeks ago

I toke a quick look in the codebase. The method online_store.get_online_features receives a registry object as argument. In theory, we can use this registry object to access a "source of truth" method for cache invalidation.

    def get_online_features(
        self,
        config: RepoConfig,
        features: Union[List[str], FeatureService],
        entity_rows: Union[
            List[Dict[str, Any]],
            Mapping[str, Union[Sequence[Any], Sequence[ValueProto], RepeatedValue]],
        ],
        registry: BaseRegistry,
        project: str,
        full_feature_names: bool = False,
    ) -> OnlineResponse:

However, the problem is that every registry implementation manages its cache validation in a non standardized way. Some examples:

Caching Registry: https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/registry/caching_registry.py#L431-L466
Proto Registry: https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/registry/registry.py#L905-L924
Snowflake Registry: https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/registry/snowflake.py#L202-L221

The best solution seems to be creating a method in BaseRegistry interface that returns if the registry cache is expired (e.g. def is_cache_expired(self)) and every Registry sub-class must implement this method.

With that change in place, we could use the new method within online_store.get_online_features to invalidate cache for feature metadata. Wdyt?

ps: this proposed new registry cache interface can be handled in another issue.

feast-dev / feast

Performance bottleneck in get_online_features due to repeated metadata resolution from registry #4710