`setQueryData()` should be persisted too, right?

frederikhors commented 1 year ago

Describe the bug

I'm using the new experimental_createPersister like this:

import { experimental_createPersister, type AsyncStorage } from "@tanstack/query-persist-client-core";
import { get, set, del, createStore, type UseStore } from "idb-keyval";

function newIdbStorage(idbStore: UseStore): AsyncStorage {
  return {
    getItem: async (key) => await get(key, idbStore),
    setItem: async (key, value) => await set(key, value, idbStore),
    removeItem: async (key) => await del(key, idbStore),
  };
}

export const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      gcTime: 1000 * 30, // 30 seconds
      persister: experimental_createPersister({
        storage: newIdbStorage(createStore("db_name", "store_name")),
        maxAge: 1000 * 60 * 60 * 12, // 12 hours
      }),
    },
  },
});

But I just found out that if I call

const SetPlayerTeam = (team: Team) => {
  queryClient.setQueryData<Player>(
    ['player', team.playerId],
    (old) => old && { ...old, team }
  );
};

the in memory cache works (the team is added to player) but there is not setItem() call for this last queryClient.setQueryData(). If I reload the page the team is not on player anymore (unless I invalidate the player manually).

I think this is wrong, because this change should be persisted too.

I'll create a reproduction if you think it might be useful.

How often does this bug happen?

Every time

Platform

Chrome.

Tanstack Query adapter

svelte-query

TanStack Query version

5.4.3

TypeScript version

5

Additional context

I'm using:

"@tanstack/eslint-plugin-query": "5.0.5",
"@tanstack/query-persist-client-core": "5.4.3",
"@tanstack/svelte-query": "5.4.3",
"@tanstack/svelte-query-devtools": "5.4.3",

TkDodo commented 1 year ago

hmm, good question. Right now, the persister is just a wrapper around the queryFn. If the queryFn doesn't run, we don't persist. And with setQueryData, it doesn't run ... 🤔

@DamianOsipiuk what are your thoughts here?

TheTimeWalker commented 1 year ago

I agree that it would be important if persistence can be somehow implemented with setQueryData. Right now, the flow breaks when you're using opportunistic updates with useMutation queries.

TkDodo commented 1 year ago

one problem I'm seeing is that getQueryData also doesn't read from the persisted storage, and it really can't, because persistence can be async, and getQueryData is sync.

why does the flow "break" with optimistic updates though?

TkDodo commented 1 year ago

Oh and what about if data comes into the cache via hydration 🤔

TheTimeWalker commented 1 year ago

My mistake about the useMutation flow. I didn't have onSettled invalidate queries to force a refetch in the background. 😅 Doing that makes it work fine 👍🏻

DamianOsipiuk commented 1 year ago

For storing the data, if we have a persister set up on the client level, we could easily hook it up. Or accept persister as a prop, but i would like to avoid that.

The problem is that createPersister currently returns a function. But we only need a small portion of it. We could introduce a flag to the returned function to control the flow and execute only a portion of it. Or return different structure from createPersister with few functions that would be picked up in different contexts.

As for getting the data, we would need to introduce another async version of it that will do similar checks as current persister fn. If cache is empty, try to get from storage if configured, otherwise get from cache (this includes hydrated data). Ofc this will work on-demand, and would not attempt to restore automatically on app load.

miafoo commented 10 months ago

Just some general input as I'm testing out the new createPersister in my app. Benchmarking before and after switching, the results are staggering - in some cases 8000x improvement in terms of speed! However I am experiencing some issues all tied to the fact that setQueryData not updating - here are some of them:

I'm using setQueryData to add local-only metadata to certain results. This metadata is NOT available on the server side, and so refetching it would essentially remove the metadata and mess up the state of my data.
My data (it's a timeline of sort) is stored "permanently" on disk (qcTime=Infinity, staleTime=Infinity) locally, and I can't just refetch several thousands queries to update the state. I patch things in the timeline as they come in using setQueryData. Additionally, it's not a guarantee that the original request would even work anymore if I were to invalidate or refetch several days, weeks or months old data.
I use setQueryData to clean up and optimize my timeline data, and this data is also no longer persisted.
I have some background jobs (outside of React context) that change data using setQueryData which correctly rerenders the components with the new data, but that data is no longer updated in the store which is problematic.

I suppose I'm also abusing the library a bit and it was not really meant for all these use cases though, but I do believe that some of them are valid reasons where setQueryData should also persist.

As for getting the data, we would need to introduce another async version of it that will do similar checks as current persister fn. If cache is empty, try to get from storage if configured, otherwise get from cache (this includes hydrated data). Ofc this will work on-demand, and would not attempt to restore automatically on app load.

I think this sounds like a fair solution although I also feel like that could cause issues/bugs if someone for example uses queryClient.getQueryCache().getAll() to "get all" cached queries but it doesn't actually "get all" because some might not be loaded yet with useQuery since they're persisted.. if it loaded everything when app was restored that would be avoided however. If an async function was added to work around this then people would have to know to use this over the other one when using the new persister.

TkDodo commented 10 months ago

There are a couple of things that lead me to think having setQueryData write to the persister as well is not a good idea:

we can't make reads work

reads from the cache are synchronous, so getQueryData will never be able to return data from the cache. If we are thinking about a separate getter that somehow can do this, why not a setter as well?

it will only work when the persister is defined globally

If you have useQuery with a persister passed to useQuery directly, setQueryData will not see it, so it won't write to the disk. That's an inconsistency I wouldn't expect.

the queryClient is just a wrapper around the queryCache

it doesn't do anything - you can just as well call queryClient.getQueryCache() and operate on the raw methods instead. By adding this special thing to setQueryData and ensureQueryData, we break that statement.

there are other ways to write to the cache

For example, you can use hydrate to write data to the cache. Would we need to cover them all, or is this just about queryClient.setQueryData()

Imo, being an "ad-hoc" persister was always part of the plan. Async reads (= running the queryFn) would try to read from the storage first, unless there is data already in the cache. It's like "fallback" data. If we get data into the in-memory cache by other means, we don't need to lookup the fallback.

Maybe it's conceptually wrong that the queryFn "writes data back" to the storage after it runs - maybe this should happen on a more global level, with a queryCache subscription ? Then, all writes would be synced back, but reads would only read lazily. @DamianOsipiuk what do you think ?

DamianOsipiuk commented 10 months ago

Well, the cool thing about current solution is that you can add persister directly to useQuery and it just works™️.

We could potentially hook to the cache to store it in persister, but I'm kinda worried about adding a listener for every query, if users decide to do it this way. queryFn is deduped, while listeners are not. It would work for global persister, but not for the local ones.

Maybe we should just force users to set up persister globally with filterFn for queryKey, or disabling persister for specific useQuery calls. Then we could hook listener on the cache. But ergonomics of this solution is kinda worse than the current one.

We could potentially expose async persistQuery and restoreQuery from createPersister that would take queryKey and queryClient and expect users to call those methods when needed. Ex after setQuery.

TkDodo commented 10 months ago

It would work for global persister, but not for the local ones.

can you elaborate why there's a difference between global and local ones? The global one is just a default value setting - it's functionally equivalent imo to passing persister manually to all useQuery calls

but I'm kinda worried about adding a listener for every query

PersistQueryClient does that, too, and I think it would still be properly deduped because we would listen only to events that put data into the cache. We could also throttle the writes if necessary.

We could potentially expose async persistQuery and restoreQuery from createPersister that would take queryKey and queryClient and expect users to call those methods when needed. Ex after setQuery.

yeah that's an option. We could also just expose the auto-write subscribe hook and then let users decide which way they want ?

DamianOsipiuk commented 10 months ago

can you elaborate why there's a difference between global and local ones

For global persister you call createPersister only once when instantiating queryClient so we could add only one listener.

For per-query users could createPersister on each render resulting in thousands of listeners. And I would like to prevent users shooting their foot by using it wrong.

I beleive that PersistQueryClient works differently as listener creation is out of user reach? I might be wrong though.

TkDodo commented 10 months ago

For per-query users could createPersister on each render resulting in thousands of listeners.

right, I see what you mean. It would have to become a hook like userPersister so that we could use component lifecycle, but then it becomes framework specific again. Even on global level, we would need a way to unsubscribe, which usually requires lifecycle methods. That's why the PersistQueryClientProvider exists, too 🤔

I beleive that PersistQueryClient works differently as listener creation is out of user reach? I might be wrong though.

It sets up a subscription for writes and eagerly restores data. We basically only want the writes, and keep the restores lazily.

okalil commented 9 months ago

In my opinion, the persister should only be an extra layer for performance optimization. If the way I store and fetch local data is too complex, I would just change queryFn implementation. For instance:

useQuery({ 
  queryKey: ['items'],
  queryFn() {
    const items = await localDb.getItems()
    if (!items) {
       const networkItems = await getItemsFromNetwork()
       await localDb.saveItems()
       return networkItems
    }
    return items
  }
})

In this case, instead of calling setQueryData after a mutation, I would direcly update the localDb and then invalidate the query to get the updated value.

TorbjornHoltmon commented 2 months ago

I would love to be able to setQueryData to the query cache one way or another. The query cache is so much more performant than the client cache.

For us who work on multiple-page applications, the client cache can get quite big in some cases and loads on every navigation. Having the cache only be loaded when needed would be great for us!

We are considering just using something like https://github.com/epicweb-dev/cachified , but it would defeat the point of using tanstack query.

frederikhors commented 2 months ago

@TorbjornHoltmon cachified seems an amazing idea!

TanStack / query