apollographql / apollo-client-nextjs

Apollo Client support for the Next.js App Router
https://www.npmjs.com/package/@apollo/experimental-nextjs-app-support
MIT License

Questions regarding the transportValues #30

Closed eknkc closed 2 months ago

eknkc commented 11 months ago

Hi,

I've been looking at the code and doing some experiments myself. While I do not have an issue, I'd like to learn about some details (sorry if I missed something in the code though, I'm not really familiar with the Apollo internals):

* Why do we need `useTransportValue` and keeping track of hook returns? It looks like as long as the cache is populated, on the client side, the return values will be consistent with the server. Is this for cases like a `useSuspenseQuery` running first, populating some cache and `useQuery` somewhere else running the same query?
* It looks like `transportedValues` and `incomingResults` will duplicate almost every result. Any chance to avoid this?
* How would this behave if there is more than one provider on the page? `transportedValues` have `useId` keys so they should be fine but `incomingResults` would not work (I assume). Any chance to keep track of multiple `ApolloRehydrationContext` instances also using `useId` or something like that? I guess `useRehydrationContext` could also receive an id parameter from `useTransportValue` and only restore the cache entries belonging to that id? That should also solve keeping track of transportValues, I guess? The cache would be consistent with whatever happened on the server if each `useTransportValue` call restored the specific cache entries belonging to that id.

Also, the beta client exports useSuspenseQuery without _experimental suffix now.

eknkc commented 11 months ago

Here's an experimental rehydration setup I have. I know this does not handle a lot of the cases you need to cover - it's just an experiment - but it seems to perform well and keeps the serialized data on SSR minimal. Am I missing anything here?

Basically I have a before and an after hook running each time useSuspenseQuery is executed. The after hook serializes whatever gets written to the cache and only runs on the server. The before hook checks if this operation id has serialized cache data on window; if there is, it restores that into the in-memory cache. I feel like this will fail miserably while streaming complex documents, though.

// Imports assumed for this snippet - exact export names depend on the Apollo
// Client 3.8 beta in use (SuspenseCache and useSuspenseQuery were exported from
// "@apollo/client" at the time); ServerInsertedHTMLContext comes from the
// Next.js app router.
import { createContext, useContext, useId, useMemo, type ReactNode } from "react";
import {
  ApolloClient,
  ApolloProvider as Provider,
  InMemoryCache,
  SuspenseCache,
  useApolloClient,
  useSuspenseQuery as _useSuspenseQuery,
  type Cache,
  type Reference,
} from "@apollo/client";
import { ServerInsertedHTMLContext } from "next/navigation";

const CacheSymbol = Symbol.for("UM_REHYDRATE");

declare global {
  interface Window {
    [CacheSymbol]?: { [key: string]: Cache.WriteOptions[] };
  }
}

type RehydrationContext = {
  cache: Cache.WriteOptions[];
  projectId?: string;
  injected: string[];
  injecting: boolean;
};

let RehydrationCtx = createContext<RehydrationContext>(null!);

// projectId was referenced but not defined in the original snippet; it is
// assumed to be an optional prop here.
export function ApolloProvider({ children, projectId }: { children: ReactNode; projectId?: string }) {
  let context = useMemo<RehydrationContext>(
    () => ({
      cache: [],
      projectId,
      injected: [],
      injecting: false,
    }),
    [projectId]
  );

  const client = useMemo(() => {
    let cache: InMemoryCache;

    if (typeof window === "undefined") {
      // On the server, record every cache write so it can be streamed to the browser.
      cache = new RehydratingCache(context);
    } else {
      cache = new InMemoryCache();
    }

    return new ApolloClient({
      link: ..., // terminating link (e.g. an HttpLink) omitted here
      cache: cache,
    });
  }, []);

  const suspenseCache = useMemo(() => new SuspenseCache(), []);

  return (
    <Provider client={client} suspenseCache={suspenseCache}>
      <RehydrationCtx.Provider value={context}>{children}</RehydrationCtx.Provider>
    </Provider>
  );
}

// Runs before useSuspenseQuery: in the browser, restore any cache writes that
// were streamed from the server for this hook id, then drop them so they are
// only applied once.
export function useBeforeRehydrationContext(id: string) {
  const { cache } = useApolloClient();

  if (typeof window !== "undefined") {
    const store = window[CacheSymbol];
    const item = store?.[id];

    if (item) {
      item.forEach((x) => cache.write(x));
      delete store[id];
    }
  }
}

// Runs after useSuspenseQuery: on the server, flush the cache writes collected
// since the last flush into the HTML stream via ServerInsertedHTMLContext.
export function useAfterRehydrationContext(id: string) {
  const rehydrationContext = useContext(RehydrationCtx);
  const insertHtml = useContext(ServerInsertedHTMLContext);

  // Nothing to restore here in the browser.
  if (typeof window !== "undefined") return;

  if (insertHtml && rehydrationContext.cache.length) {
    const cache = rehydrationContext.cache;
    insertHtml(() => <RehydratingData context={rehydrationContext} cache={cache} id={id} key={id} />);
    rehydrationContext.cache = [];
  }

  return rehydrationContext;
}

// Renders a <script> tag that pushes the collected cache writes into a
// window-level store keyed by the hook's useId, guarding against duplicate injection.
function RehydratingData({ cache, context, id }: { id: string; context: RehydrationContext; cache: RehydrationContext["cache"] }) {
  if (context.injected.includes(id)) {
    return null;
  }

  context.injected.push(id);

  return (
    <script
      key={id}
      dangerouslySetInnerHTML={{
        __html: `((window[Symbol.for("UM_REHYDRATE")] ??= {})["${id}"] ??= []).push(...${JSON.stringify(cache)});`,
      }}
    ></script>
  );
}

// An InMemoryCache that records every write into the rehydration context in
// addition to applying it, so the writes can be serialized for the browser.
class RehydratingCache extends InMemoryCache {
  context: RehydrationContext;

  constructor(context: RehydrationContext) {
    super();
    this.context = context;
  }

  write(options: Cache.WriteOptions): Reference | undefined {
    this.context.cache.push(options);
    return super.write(options);
  }
}

// Wrap the Apollo hook so the before/after rehydration hooks run around every call.
export const useSuspenseQuery = wrap(_useSuspenseQuery);

function wrap<T extends (...args: any[]) => any>(useFn: T): T {
  return ((...args: any[]) => {
    const id = useId();
    useBeforeRehydrationContext(id);
    let res = useFn(...args);
    useAfterRehydrationContext(id);
    return res;
  }) as T;
}
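
For reference, a hypothetical client component using the wrapped hook above would look something like this (the file paths and TodosDocument are illustrative, not part of the actual setup):

"use client";

// Illustrative usage of the wrapped hook; TodosDocument stands in for a
// codegen-generated query and is not defined in this issue.
import { useSuspenseQuery } from "./apollo-rehydration";
import { TodosDocument } from "./generated";

export function TodoList() {
  // Behaves like the plain Apollo hook, but cache writes made during SSR are
  // streamed to the browser and restored before this component hydrates.
  const { data } = useSuspenseQuery(TodosDocument);

  return (
    <ul>
      {data.todos.map((todo: { id: string; title: string }) => (
        <li key={todo.id}>{todo.title}</li>
      ))}
    </ul>
  );
}
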
phryneas commented 11 months ago

Hi @eknkc, great questions & good observations!

* Why do we need `useTransportValue` and keeping track of hook returns? It looks like as long as the cache is populated, on the client side, the return values will be consistent with the server. Is this for cases like a `useSuspenseQuery` running first, populating some cache and `useQuery` somewhere else running the same query?

This is shown in this diagram from the RFC:

sequenceDiagram
  participant GQL as Graphql Server

  box gray Server
  participant SSRCache as SSR Cache
  participant SSRA as SSR Component A
  end
  participant Stream

  box gray Browser
  participant BCache as Browser Cache
  participant BA as Browser Component A
  end

  participant Data as External Data Source

  SSRA ->> SSRA: render
  activate SSRA
  SSRA -) SSRCache: query
  activate SSRCache
  Note over SSRA: render started network request, suspend
  SSRCache -) GQL: query A
  GQL -) SSRCache: query A result
  SSRCache -) SSRA: query A result
  SSRCache -) Stream: serialized query A result
  deactivate SSRCache
  Stream -) BCache: add query A result to cache
  SSRA ->> SSRA: render
  Data -) BCache: cache update
  SSRA ->> SSRA: other children of the suspense boundary still need more time
  Note over SSRA: render successful, suspense finished
  SSRA -) Stream: transport
  deactivate SSRA
  Stream -) BA: restore DOM
  BA ->> BA: rehydration render
  Note over BA: ⚠️ rehydration mismatch, data changed in the meantime

Essentially, we have two time deltas here that can cause problems:

  1. A delay between the request finishing on the server and the data being transported to the client. Right now, this is the "big" delay - we can't flush data to the browser whenever we want. That happens only when a suspense boundary finishes.
  2. A delay between the result being added to the cache in the browser and the component rehydrating. This is usually a very short delay right now - but there are scenarios where this can also become a long delay. Imagine this sequence:
    • component fetches data
    • data is received on the server
    • another hook in component needs more data to be fetched, component stays suspended
    • another component somewhere else in the tree finishes suspending - data is transported over
    • data is added to the browser cache
    • something else modifies the data in the browser
    • the component finishes suspending and HTML with the outdated server data tries to rehydrate in the browser -> hydration mismatch

Right now, 1. is probably more common, and 2. is pretty uncommon. Once React adds that useStream hook we are really hoping for, that might change though - 1. will become short and 2. will get longer.
Generally, this is something that we want - we want results to be available on the client as soon as possible, so that they can be picked up by useQuery without triggering a request, or by useFragment, as early as possible.
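
As a minimal sketch of that goal (illustrative query and data, not the actual transport code in this repo): once a transported result has been written into the browser cache, a cache-first read - which is what useQuery does by default - resolves synchronously without a network request.

import { ApolloClient, InMemoryCache, gql } from "@apollo/client";

const GREETING_QUERY = gql`
  query GreetingQuery {
    greeting
  }
`;

const client = new ApolloClient({ cache: new InMemoryCache() });

// Pretend this write came from the transported SSR result.
client.writeQuery({ query: GREETING_QUERY, data: { greeting: "hello from SSR" } });

// A cache-first read is served synchronously from the cache - no request fires.
const cached = client.readQuery({ query: GREETING_QUERY });
console.log(cached?.greeting); // "hello from SSR"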

* It looks like `transportedValues` and `incomingResults` will duplicate almost every result. Any chance to avoid this?

We could add a lot of "deduplication" logic to the whole thing. Right now we use superjson, which will not deduplicate duplicate contents - but there are surely other libraries that do that kind of thing. Might be worth experimenting. On the other hand, once "result" and "hook data" end up in two different transport chunks, this might lose value again (or get a lot more complicated).
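
For illustration, a tiny sketch of that duplication with superjson (made-up data, not our actual transport shape):

import superjson from "superjson";

// The same object referenced from two places is serialized twice: superjson
// preserves types (Dates, Maps, ...) but does not deduplicate repeated content.
const result = { id: "1", title: "Hello", updatedAt: new Date(0) };
const transported = { queryResult: result, hookValue: result };

const serialized = superjson.stringify(transported);
console.log(serialized.split('"Hello"').length - 1); // 2 - the payload appears twice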

* How would this behave if there is more than one provider on the page? `transportedValues` have `useId` keys so they should be fine but `incomingResults` would not work (I assume). Any chance to keep track of multiple `ApolloRehydrationContext` instances also using `useId` or something like that? I guess `useRehydrationContext` could also receive an id parameter from `useTransportValue` and only restore the cache entries belonging to that id? That should also solve keeping track of transportValues, I guess? The cache would be consistent with whatever happened on the server if each `useTransportValue` call restored the specific cache entries belonging to that id.

Generally, we only support a single provider and a single client right now. If you create multiple NextSSRInMemoryCache instances on the browser, we will throw an exception.

Also, the beta client exports useSuspenseQuery without _experimental suffix now.

Good catch - I was at a conference last week and didn't get to that yet. Right now we still have a bug preventing that update, but you can follow https://github.com/apollographql/apollo-client-nextjs/pull/32 on this.

(I'll take a look at your second post in a second answer, I think that one might need additional thought :) )

eknkc commented 11 months ago

Thanks a lot for the detailed explanation. I also read through #28 and it all made more sense. Hopefully React core provides streaming functionality :)

BTW, out of curiosity, I was investigating whether this could be handled in a Link instead of the cache itself. As in, have a link before the http link in the chain, stream whatever passes through to the browser, and have the same link in the browser act as a terminating link if it can extract the results of the request from the streamed data. I'm almost positive it is not possible, because even if it responds from the streamed data with a sync result, the Observable behavior will cause an async render on the first try, causing a mismatch.

Thanks again for the great work! This will be fantastic once suspense queries work seamlessly while streaming.

phryneas commented 11 months ago

My initial thought on this was actually to go for a link, but in a perfect world I wanted to write data to the cache before a request was made in the first place - to have that data available for useFragment in other code parts. A link can only respond, but not write to the cache on its own. So we had to go with the cache.
We will be changing this quite significantly to make useBackgroundQuery work nicely, though - most of the logic will move into a NextSSRApolloClient that will also communicate the start of a request and simulate an ongoing request in the browser.
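
To make that trade-off concrete, here is a rough sketch of the "tap link" idea from the previous comment (illustrative only - such a link can observe results flowing through it, but it cannot seed the cache before a request starts):

import { ApolloLink } from "@apollo/client";

// Illustrative link placed before the terminating HttpLink. On the server it
// could hand every result to some streaming transport; in the browser it could
// short-circuit if a streamed result is already available. It only ever sees
// operations after a request has been started.
const tapLink = new ApolloLink((operation, forward) =>
  forward(operation).map((result) => {
    if (typeof window === "undefined") {
      // e.g. hand { name: operation.operationName, result } to a transport here
    }
    return result;
  })
);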

phryneas commented 11 months ago

As for your code snippet: I guess with the additional context I have given here you can probably see why it won't work with what we do (and want to additionally do in the future), but still, this is very cool experimentation!

Two things I want to point out from a "code review" perspective if you want to keep experimenting with this for personal use:

  • useMemo is not guaranteed to be stable - React might throw that away and recreate it as an optimization.
  • Your code has a risk of calling insertHtml multiple times if the component suspends (because of another hook) after your useAfterRehydrationContext hook - so I think you might be running risk of writing out that HTML multiple times. I'd just keep an eye out for that :)

eknkc commented 11 months ago
  • useMemo is not guaranteed to be stable - React might throw that away and recreate it as an optimization.

Oh, I did not realise useMemo would not be stable. Thanks a lot, switching those to manual useRef checks.
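
(For reference, a minimal sketch of that manual useRef pattern - the helper name is made up:)

import { useRef } from "react";

// Lazily create a value once per component instance. Unlike useMemo, a ref is
// never thrown away and recreated by React as an optimization.
function useStableValue<T>(create: () => T): T {
  const ref = useRef<T | null>(null);
  if (ref.current === null) {
    ref.current = create();
  }
  return ref.current;
}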

  • Your code has a risk of calling insertHtml multiple times if the component suspends (because of another hook) after your useAfterRehydrationContext hook - so I think you might be running risk of writing out that HTML multiple times. I'd just keep an eye out for that :)

Yeah, I tried to mitigate that by using an injected list, but this is too brittle because I don't perfectly understand how ServerInsertedHTMLContext behaves. I took a look at the Next.js source assuming it would be too much magic, but it is probably straightforward enough to decipher.

I will probably wait for you to finish working on this :)

Meanwhile, I decided to manually inject some cache entries coming from SSR as context values, something like the following:

The server side just calls apollo.query asynchronously and passes the resulting project to this context provider, which in turn writes it to the cache and switches to the watched result from useQuery. Primitive, but the best I could come up with. I wonder if I could wrap useSuspenseQuery to do that automatically - as in, just do useSuspenseQuery on the server side, save the result to be streamed using ServerInsertedHTMLContext, and on the client side restore the same thing into the cache from the streamed data.

// ProjectFragment and ProjectDocument come from the app's GraphQL codegen and
// are not shown here; the React/Apollo imports and the Ctx context are assumed
// for completeness.
import { createContext, useEffect, useState, type ReactNode } from "react";
import { useApolloClient, useQuery } from "@apollo/client";

const Ctx = createContext<ProjectFragment | null>(null);

export function ProjectProvider({ project, children }: { project: ProjectFragment; children: ReactNode }) {
  const [skip, setSkip] = useState(true);

  const apollo = useApolloClient();
  const projectQuery = useQuery(ProjectDocument, { skip });

  // Seed the cache with the server-fetched project once, then enable the watched
  // query so later cache updates flow into the context value.
  useEffect(() => {
    if (!skip || !apollo) return;

    apollo.writeQuery({
      query: ProjectDocument,
      data: {
        project,
      },
    });

    setSkip(false);
  }, [project, apollo, skip]);

  return <Ctx.Provider value={projectQuery.data?.project ?? project}>{children}</Ctx.Provider>;
}
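
A rough sketch of the server-side half described above - the file names, getClient() factory and layout component are illustrative, not part of the actual setup:

// app/project/layout.tsx (illustrative): fetch the project in a server
// component and hand it to the client-side ProjectProvider above.
import type { ReactNode } from "react";
import { ProjectProvider } from "./project-provider"; // hypothetical file
import { getClient } from "./apollo-rsc-client"; // hypothetical RSC client factory
import { ProjectDocument } from "./generated"; // the app's codegen, as above

export default async function ProjectLayout({ children }: { children: ReactNode }) {
  const { data } = await getClient().query({ query: ProjectDocument });

  return <ProjectProvider project={data.project}>{children}</ProjectProvider>;
}
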
eknkc commented 11 months ago

Here's what I mean by the last part of my latest comment:

// Imports assumed for this snippet - exact export names depend on the Apollo
// Client 3.8 beta in use. `usq` is the underlying Apollo useSuspenseQuery,
// renamed so it can be wrapped below.
import { useId, useRef } from "react";
import {
  useApolloClient,
  useSuspenseQuery as usq,
  type OperationVariables,
  type SuspenseQueryHookOptions,
  type TypedDocumentNode,
  type UseSuspenseQueryResult,
} from "@apollo/client";
import type { DocumentNode } from "graphql";
import { useServerInsertedHTML } from "next/navigation";

const SuspenseCacheSymbol = Symbol.for("UM_REHYDRATE");

declare global {
  interface Window {
    [SuspenseCacheSymbol]?: { [key: string]: { data: any; variables: any } };
  }
}

export function useSuspenseQuery<TData = any, TVariables extends OperationVariables = OperationVariables>(
  query: DocumentNode | TypedDocumentNode<TData, TVariables>,
  options?: SuspenseQueryHookOptions<TData, TVariables>
): UseSuspenseQueryResult<TData, TVariables> {
  let id = useId();

  // Server side: run the real hook, then stream the result into the HTML so the
  // browser copy of this hook can restore it before hydration. (Branching on
  // typeof window like this technically breaks the rules of hooks, but the branch
  // never changes within one environment.)
  if (typeof window === "undefined") {
    let res = usq(query, options);

    useServerInsertedHTML(() => (
      <script
        key={id}
        dangerouslySetInnerHTML={{
          __html: `(window[Symbol.for("UM_REHYDRATE")] ??= {})["${id}"] = ${JSON.stringify({ data: res.data, variables: options?.variables })};`,
        }}
      />
    ));

    return res;
  }

  const data = window[SuspenseCacheSymbol]?.[id];
  const apollo = useApolloClient();
  const dataInjectedRef = useRef(false);

  if (!dataInjectedRef.current && data) {
    // Restore the streamed server result into the cache exactly once.
    apollo.writeQuery({
      query,
      data: data.data,
      variables: data.variables,
    });
    dataInjectedRef.current = true;
  }

  const watcher = usq(query, options);

  // Fall back to the raw streamed result if the watched query has no data yet.
  return watcher.data ? (watcher as any) : ({ data: data?.data } as any);
}

This basically does the hacky thing I did, but for all suspense queries. It runs by itself without any other code, but I did not test it thoroughly. It will probably need a rehydration context to avoid duplicate HTML inserts. Any ideas whether this deserves more experimentation or is simply junk :)?

phryneas commented 11 months ago

I'd say, personally, I wouldn't go too far down this path of binding the result-cache hydration to the hook - the request and the hook are two independent things, and their values can be transported at different points in time (and the hook value could even change, as suspense might suspend multiple times due to other hooks, and the hook value changes over time).

Especially with our work going on over on this branch, we now transport the info "a request started" and "a request had a result" over the wire independently - both of these are pretty much out of the scope of the hooks and line up more "by accident".

phryneas commented 2 months ago

I'm doing some housekeeping so I'm closing some older issues that haven't seen activity in a while. If this is still relevant, please feel free to reopen the issue.

github-actions[bot] commented 2 months ago

Do you have any feedback for the maintainers? Please tell us by taking a one-minute survey. Your responses will help us understand Apollo Client usage and allow us to serve you better.