facebook / relay

Relay is a JavaScript framework for building data-driven React applications.
https://relay.dev
MIT License

Cache API: invalidation, expiration, refresh, clean, TTL, LRU... #720

Closed slorber closed 7 years ago

slorber commented 8 years ago

Hi,

After reading some docs and watching some Relay/GraphQL talks, I'm interested in the technology, but I'm hesitant right now because of the lack of fine-grained control over the Relay caching system (at least in the documentation).

I mean, everywhere it's mentioned that Relay caches data, but at no point does it mention how Relay removes entries from that cache. It makes me think that Relay is an ever-growing cache that will just make the browser or mobile app crash after a while...

So, can we have fine-grained control over Relay's caching system?

Features I would expect are:

- manual invalidation/eviction of cache entries or types
- expiration with TTLs, and LRU-based cleanup
- refreshing entries on demand
- being notified in a listener when a type or field gets updated

Also, it would be good to be able to tune the cache per type. For example, user data is unlikely to change, so I may want to cache it more aggressively than other, more collaborative types in my domain.

Being able to manually add/remove/listen to entries also means that we can implement whatever we want for real-time applications, without necessarily depending on Relay subscriptions, and that we can implement some persistence if we need it right now.
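For illustration, here is a minimal sketch of the kind of fine-grained control being asked for: per-entry TTL, manual eviction, and change listeners. The class and method names are invented for this sketch; nothing like this is part of Relay's API.

```javascript
// Hypothetical cache with the controls described above — not a Relay API.
class ControllableCache {
  constructor() {
    this.entries = new Map();   // key -> { value, expiresAt }
    this.listeners = new Map(); // key -> Set of callbacks
  }
  set(key, value, ttlMs = Infinity) {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlMs });
    (this.listeners.get(key) || []).forEach((fn) => fn(value));
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }
  evict(key) { this.entries.delete(key); }
  subscribe(key, fn) {
    if (!this.listeners.has(key)) this.listeners.set(key, new Set());
    this.listeners.get(key).add(fn);
    return () => this.listeners.get(key).delete(fn); // unsubscribe handle
  }
}
```

Even a toy version like this makes the trade-off visible: once listeners and TTLs exist, every write has to decide who to notify and when staleness counts as a miss.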

slorber commented 8 years ago

Ohhh, I really need some insights on this please :) How do the big players manage the memory of their caches? Do we simply assume that sessions don't last long enough, and that it's nearly impossible for a user to reach the browser's memory limits?

Whether we use Relay or Flux with "normalized stores" (gaearon/normalizr), I wonder how people manage cached-data eviction in practice.

slorber commented 8 years ago

@josephsavona any insights on this plzzz :)

josephsavona commented 8 years ago

This is a great question. Thanks for asking, and apologies for the delay in responding. You mention a lot of points, but it feels like they can largely be summarized into two questions:

Memory Management

Currently, Relay does not evict data from its (in-memory) cache under the default configuration. This means that memory usage may grow during the course of a session, and that all data is cleared e.g. when the user closes the browser tab.

In practice, we've found that this isn't a problem for most apps. Because of the cyclic nature of social graphs there's often a significant amount of overlap in content, and as a result memory usage can grow more slowly than one might intuitively expect.

We recognize that some apps may have different data access patterns and/or longer sessions, which would require some form of cache eviction. This raises the question of which data to evict, and when.

Refetching & Cache Eviction Control

Relay currently offers very coarse-grained control of when data is refetched: primeCache defaults to fulfilling queries using in-memory data, while forceFetch bypasses the cache and refetches data in-full from the server. Again, in practice this has worked well for most of our use cases.
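The two modes can be pictured with a small sketch (a synchronous toy with assumed shapes, not Relay's actual signatures):

```javascript
// Coarse-grained fetch control, reduced to its essence: fulfill from the
// in-memory cache when allowed, or bypass it entirely and hit the network.
// Hypothetical helper, not a Relay API.
function fetchQuery(query, cache, network, { force = false } = {}) {
  if (!force && cache.has(query)) {
    return cache.get(query);   // primeCache-style: serve from memory
  }
  const data = network(query); // forceFetch-style: full round trip
  cache.set(query, data);
  return data;
}
```

The important property is that the choice is all-or-nothing per request: there is no middle ground like "refetch only if older than N seconds", which is what the TTL discussion below is about.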

Regarding cache eviction, it's important to understand that Relay is fundamentally different from many typical caches. Whereas typical caches store independent key/value pairs, Relay caches a graph of interconnected objects. We cover the ramifications of this extensively in Thinking in GraphQL. In practice, this means that simple approaches to cache eviction such as TTL or LRU may have unintuitive consequences. For example, products typically care about queries while the cache stores normalized records. If even a single record is evicted from the cache, it could cause an entire query to be effectively "missing" and need to be refetched. Further, the data dependencies of discrete parts of the application may overlap, such that they disagree about the allowable staleness of data.
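To make the record-vs-query distinction concrete, here is a toy model (invented data and shapes, not Relay's store format) of why evicting one normalized record can invalidate a whole query:

```javascript
// A normalized store: records keyed by id, referencing each other by id.
const records = new Map([
  ['user:1', { id: 'user:1', name: 'Ada', bestFriend: 'user:2' }],
  ['user:2', { id: 'user:2', name: 'Grace' }],
]);

// Model a query as the set of record ids it depends on.
function canFulfill(queryDeps) {
  return queryDeps.every((id) => records.has(id));
}

const profileQueryDeps = ['user:1', 'user:2'];
const beforeEviction = canFulfill(profileQueryDeps); // both records cached

// A naive LRU/TTL policy evicts a single record...
records.delete('user:2');

// ...and the entire query now misses, even though most of its data survives.
const afterEviction = canFulfill(profileQueryDeps);
```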

What's Next

We're actively developing a "garbage collection" (cache eviction) mechanism for Relay. This is currently experimental, and we plan to make it part of the public API once we have sufficient experience using it in production.

The goal of this garbage collection would be to evict in-memory cached data (and associated metadata such as tracked queries) that is no longer depended on by any view. Products would be in control over when cache eviction was initiated (e.g. in response to a low memory warning), because automatically expiring data could negatively impact perceived performance of apps in the form of longer loading times.
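The retained-by-a-view idea can be sketched as a mark-and-sweep pass over the store, triggered explicitly by the product. This is a simplified model with assumed shapes, not Relay's implementation:

```javascript
// Evict every record that no currently mounted view depends on.
// Intended to be called explicitly, e.g. on a low-memory warning,
// rather than on a timer, to avoid surprise loading states.
function collectGarbage(records, mountedViews) {
  const retained = new Set();
  for (const view of mountedViews) {
    for (const id of view.dataIds) retained.add(id); // mark roots
  }
  for (const id of [...records.keys()]) {
    if (!retained.has(id)) records.delete(id);       // sweep the rest
  }
}
```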

Miscellaneous

Be notified in a listener when a type or field gets updated

Relay supports this internally - components re-render when their data changes. There are several proposals for observing data separately from containers, which we're tackling as part of #559.

Manual eviction of a type…

See the above for background; we're still exploring this area. I opened #756 to track work on public APIs for managing cache eviction.

slorber commented 8 years ago

thanks a lot @josephsavona

Currently, Relay does not evict data from its (in-memory) cache under the default configuration. This means that memory usage may grow during the course of a session, and that all data is cleared e.g. when the user closes the browser tab.

In practice, we've found that this isn't a problem for most apps. Because of the cyclic nature of social graphs there's often a significant amount of overlap in content, and as a result memory usage can grow more slowly than one might intuitively expect.

So I guess I could simply stop worrying and start coding :) My app is not particularly different from other social apps so it should be ok for my usecase too!

Also happy to hear about that GC system :)

AndrewIngram commented 8 years ago

Is there any mileage in a Relay directive on queries themselves to indicate how individual fields should be cached?

i.e.

query {
  currentUser {
    username
    avatar
  }
  product(id: 123) @relay(refetch: always) {
    # stuff
  }
}

This way we get to have the components declaring how up-to-date they want the data to be. If two components are requesting the same parent data, you'd push the directive down the tree to minimise excessive refetching.

josephsavona commented 8 years ago

@AndrewIngram Definitely: we're currently exploring the idea of query-level TTL. Field-level directives have some problems, as I described above:

the data dependencies of discrete parts of the application may overlap, such that they disagree about the allowable staleness of data.

This applies to container-level TTL as well: your container may be okay if fields A and B are stale, but some other container wants fields B and C on the same object to be fresh, so we refetch B and C and they don't match up with A (i.e. the user sees a partial update). Our goal is to balance performance, consistency, and predictability, which leads us to query-level TTL. The behavior here would be simple:

Feedback appreciated!
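One way query-level TTL could behave, sketched under assumed semantics (the actual design is not spelled out in the thread): a whole query result is fresh for a TTL window, and a stale query is refetched in full, so all of a query's fields stay mutually consistent.

```javascript
// Read a cached query result and report whether it is fresh or stale.
// Stale results can still be shown while a full refetch runs in the
// background, which keeps every field of the query consistent.
// Hypothetical helper, not a Relay API.
function readQuery(cache, queryKey, now, ttlMs) {
  const hit = cache.get(queryKey);
  if (hit && now - hit.fetchedAt < ttlMs) {
    return { data: hit.data, stale: false }; // fresh: no network needed
  }
  return { data: hit ? hit.data : null, stale: true }; // show stale + refetch
}
```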

evanderkoogh commented 8 years ago

@josephsavona I am about to start coding on a new project that is very different from a social app. (first Relay one!). Query level TTLs would solve almost all of my problems and make it workable. I can't think of any problem that wouldn't be solved completely or partially by that.

Showing stale data while refetching is perfect.

AndrewIngram commented 8 years ago

@josephsavona is there any update on this? The biggest issue I'm seeing in practice is mutations refetching everything in the cache. A better approach might be to do the fat query intersection twice: once with the "active" data (i.e. the store that matches currently rendered components) and once against the inactive data (everything else). The active match would be used to refetch as part of the mutation query, and everything in the inactive match would be evicted from the cache.

josephsavona commented 8 years ago

@AndrewIngram Thanks for asking. The approach you describe is definitely sound in theory, but turned out to be tricky to implement when we tried it. Instead, we are working on a more significant update to Relay core (see the meeting notes for mentions of a "prototype"). Among other things it makes mutations more explicit and gives developers more control over what is refetched - hopefully addressing the issue you describe as well as making mutations simpler. We'll be sharing more information in the coming weeks once we make a bit more progress.

rturk commented 8 years ago

Hi All. Anyone tested the experimental cache eviction code https://github.com/facebook/relay/blob/master/src/store/RelayGarbageCollection.js ? Any comments/feedback?

rturk commented 8 years ago

@slorber FYI, I'm giving up on normalizr. In the early stages of the project it sounded like a good idea; however, normalizr is now our main source of crashes and memory issues.

slorber commented 8 years ago

@rturk you mean it crashes because it takes too much memory after a while?

rturk commented 8 years ago

@slorber It 1) doesn't play well with async functions, so it was often leaking when asynchronous data arrived from the server (i.e. push notifications), and 2) we use a news feed of Products, so whenever something changed the whole app would re-render.

slorber commented 8 years ago

@rturk Sorry, but I don't understand, although I'd really like to :) If this is about memory, maybe this would be a more appropriate issue for discussion: https://github.com/paularmstrong/normalizr/issues/47

josephsavona commented 7 years ago

Just a note that a future release (see #1369) should address some of the use-cases discussed here.

wincent commented 7 years ago

Now that Relay Modern is out, let's reset the discussion on cache management and close this one. Relay Modern has a simple garbage collection implementation that retains data required to render the currently mounted views. We can look at building more nuanced collection strategies — involving TTLs etc — on top of the new core, but let's make a new issue (or issues) for the specific strategies to be explored.