Closed: lancelafontaine closed this issue 2 years ago
My understanding is that there is no way to implement batching with hooks and suspense, at least not for hooks run in the same component. Asynchronous hooks signal to React that they need to activate the nearest suspense boundary by throwing a promise. This means that, in the actual execution of a component, only the first hook that suspends is ever "seen" — React can’t just move past the error to execute the rest of the component because it may depend on the (non-error) result of the first suspending hook.
I haven't checked, but my guess is that React will at least find all the suspending components that are siblings in a single pass, which should allow for batching across components, but it might be confusing if that same behavior doesn't work within a single component.
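For illustration, here's a toy sketch (not React's actual implementation) of why siblings can still suspend in a single pass even though a single component stops at its first suspending hook: the renderer catches each child's thrown Promise separately and keeps going.

```javascript
// Toy model of a single render pass over sibling components (not React's
// real renderer). Each child that suspends throws a pending Promise; the
// renderer catches it and moves on to the next sibling, so every sibling's
// fetch ends up in flight after one pass -- a natural batching point.
function renderSiblings(children) {
  const suspended = [];
  for (const child of children) {
    try {
      child(); // a component function; may throw a pending Promise
    } catch (thrown) {
      suspended.push(thrown); // sibling suspended; keep rendering the rest
    }
  }
  return suspended; // every suspended sibling's fetch is now started
}
```

Within one component this doesn't help: the first throw aborts the function body, so hooks below it never run on that attempt.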
Thanks for the deep-dive @lancelafontaine! Super interesting to get all that context. I really like the ergonomics you propose, too.
I think @lemonmade nails it with regard to the hooks run in the same component. Thankfully, it seems unlikely that a developer would query the same data source with different queries in the same component.
However, this raises a bigger issue with how Suspense data fetching works and our decision to use react-query.
As you've unpacked, we recently decided to use react-query instead of our own server-fetching hook. This is because we struggled with our implementation of a cache that persists across suspended render attempts.
Suspense works by throwing pending Promises to the nearest <Suspense> boundary. When those promises resolve, they indicate to the React renderer that it's time to try rendering again. This behavior is abstracted away from the developer and instead lives within data-fetching libraries (like react-query, react-fetch, react-fs, react-pg, etc). This is why asynchronous functions look synchronous when they're really suspending.
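A minimal sketch of that contract (the names here are ours, not any library's): the read function throws the pending Promise on the first attempt and returns the cached value once a retry finds the data resolved.

```javascript
// Minimal suspense-style read (illustrative only): throw the pending
// Promise so the nearest boundary can wait on it, then return the cached
// value on a later render attempt.
const suspenseCache = new Map();

function read(key, fetcher) {
  let entry = suspenseCache.get(key);
  if (!entry) {
    entry = {};
    entry.promise = fetcher().then((value) => {
      entry.value = value;
      entry.done = true;
    });
    suspenseCache.set(key, entry);
  }
  if (!entry.done) throw entry.promise; // "suspend" until it resolves
  return entry.value; // resolved: render proceeds synchronously
}
```

The cache is what makes the retry work: without it, every render attempt would kick off a fresh fetch and throw again forever.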
While I was struggling to find a good solution for an inter-render cache, react-query seemed to handle it perfectly. That's pretty much the only reason I chose to adopt it, beyond its really slick useQuery ergonomics. It sounds like you've been diving into the source code more than I have, so it's possible you know even more about that implementation than I do at this point 😄
It's becoming more apparent that our limited use of react-query for server-side data fetching may not be the best fit for Hydrogen. We'll probably end up writing a custom hook which uses React's new (experimental) built-in Suspense cache, or adopting Suspense-friendly IO helpers like react-fetch.
Gosh — maybe?
From what I understand, DataLoader manages an internal clock while it accepts keys to fetch from consumer code. After a certain time has passed (e.g. one tick of the Node.js event loop), it executes a single batched request and returns the results to the consumer(s).
The consumers, meanwhile, are waiting on a Promise to resolve. This is where we'd need to adjust our expectations for a Suspense world: instead of returning a Promise, DataLoader would need to throw the Promise.
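A hedged sketch of that adjustment (this is not DataLoader's real API): collect keys during one render pass, flush a single batched request on the next microtask, and throw the in-flight batch Promise rather than returning it.

```javascript
// Hypothetical suspense-flavored loader (invented API, inspired by
// DataLoader's batching): keys requested in the same tick share one
// batched request, and callers "suspend" by having the batch Promise
// thrown at them instead of returned.
function createSuspenseLoader(batchFn) {
  const cache = new Map(); // key -> resolved value
  let queue = [];          // keys collected during the current tick
  let batch = null;        // Promise for the in-flight batch, if any

  return function load(key) {
    if (cache.has(key)) return cache.get(key);
    queue.push(key);
    if (!batch) {
      // Schedule one flush on the next microtask, DataLoader-style.
      batch = Promise.resolve().then(async () => {
        const keys = queue;
        queue = [];
        batch = null;
        const values = await batchFn(keys); // one request for all keys
        keys.forEach((k, i) => cache.set(k, values[i]));
      });
    }
    throw batch; // suspend; a retry after resolution hits the cache
  };
}
```

Every consumer that suspends in the same tick receives the same thrown Promise, so a Suspense boundary naturally waits for the whole batch at once.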
Some notes, questions, caveats here:
Thank you both for the context! I'm still trying to wrap my mind around Suspense/React's execution model but this discussion is definitely making things a lot clearer 🙇
I think @lemonmade nails it with regard to the hooks run in the same component. Thankfully, it seems unlikely that a developer would query the same data source with different queries in the same component.
Suspense works by throwing pending Promises to the nearest <Suspense> boundary. When those promises resolve, they indicate to the React renderer that it's time to try rendering again.
Gotcha! This was a great explanation, thanks.
I could imagine one given component making two queries to two separate services within one component. With this existing throwing-promises mechanism, it seems like those couldn't even be run in parallel (unless I'm misunderstanding).
I haven't checked, but my guess is that React will at least find all the suspending components that are siblings in a single pass, which should allow for batching across components
Yep, this would be a good opportunity to batch in "two dimensions":

1. Batching queries (eg. useBatchedGraphQLQuery) across sibling components. This would allow for a future in which each component could be responsible for encapsulating the data it needs to be properly rendered, without incurring the cost of an additional query per component.
2. Batching records (eg. useBatchedRecordsQuery) across sibling components. I could imagine this being useful for many identical components with the same selection sets but for different records being rendered on the same page (eg. one card per article in a blog, one card per product in a collection, etc.)

This is where we'd need to adjust our expectations for a Suspense world: instead of returning a Promise, DataLoader would need to throw a promise.
I haven't had the chance to look into the mechanics of this yet, but I'd expect that for batching to occur, these hooks would need to throw a Promise wrapping a literal value that only describes the GraphQL request needed to fulfill that data (without actually making the fetch calls), and that all data loading would have to occur at the Suspense boundary, where all thrown promises are aggregated and combined into one batched request. Definitely sounds interesting, although I wouldn't be surprised if we needed to dig into React's internals to tweak this behaviour.
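To make that aggregation step concrete, here's a speculative sketch (all names invented, not an existing API): each suspended hook contributes a descriptor of the query it needs, and the boundary merges the descriptors into a single aliased GraphQL document.

```javascript
// Speculative sketch of boundary-level aggregation (not an existing API):
// each suspended hook throws a descriptor of what it needs, and the
// boundary assembles one aliased GraphQL document from all of them.
function combineQueryDescriptors(descriptors) {
  const fields = descriptors
    .map((d, i) => `q${i}: ${d.field}(id: "${d.id}") { ${d.selection} }`)
    .join('\n  ');
  return `query Batched {\n  ${fields}\n}`;
}
```

A real implementation would need to handle variable collisions, fragments, and per-service endpoints; the string assembly here is purely illustrative of the shape.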
E.g. if a server component loads a child server component, the child will not be included in the batch (because the parent will be suspended).
I'll spend more time thinking about this one, but my gut reaction is that without multiple passes, you're probably right that batched data fetching would only be applicable to sibling components and not child components 🤔 it could be worth investigating whether encouraging a pattern which can't be fully batched due to these constraints would result in a better experience than our current hoist-data-fetching-to-the-highest-level pattern.
Last thought: any idea whether any of this changes in an RSC world? Or any indication as to what data fetching might look like then? Mostly asking because the few resources I could find seem to indicate that this sort of Suspense-bound scheduling might not be necessary then, and that we could just rely on async libraries that might integrate dataloaders more simply (eg. db.notes.get(props.id) in the official RSC RFC; that RFC even mentions "Waterfalls are still not ideal on the server, so we will provide an API to preload data requests as an optimization.")
I could imagine one given component making two queries to two separate services within one component. With this existing throwing-promises mechanism, it seems like those couldn't even be run in parallel (unless I'm misunderstanding).
Yeah agreed that they can't run in parallel.
Last thought: any idea whether any of this changes in an RSC world?
Nothing really changes in an RSC world. React 18 introduces streaming SSR, and RSC is a somewhat separate concept of server/client component separation which still leverages the streaming SSR added in React 18.
FWIW, the db.notes.get(props.id) in the RSC RFC is likely alluding to something like react-pg, which is Suspense-based (it throws Promises until the data resolves).
I hadn't heard of the preload optimization you pulled out. I'll ask the React team about that!
Update: I asked about preloading and got an in-depth answer from Sebastian!
There's no automatic framework that does this for you, but all the React I/O APIs should expose a preload() method. E.g. react-fetch has a fetch(...) API and a preload(...) API.
https://github.com/facebook/react/blob/main/packages/react-fetch/src/ReactFetchNode.js#L226
The principle is that most data fetching should be done in two steps:
A preload just starts downloading the thing, whereas a read (e.g. fetch(...)) blocks and returns the actual result.
To achieve optimal performance it's often necessary to start the preload as early as you think you might need the data. I.e. it's sometimes better to fetch data you might not actually need, before you need it, to lower the time spent waiting on it. However, if you do that for everything you might flood the network and pay too high a cost. This is not a problem that can be directly automated, since it's a constraint-solving problem and the ideal solution likely involves statistics. So there's research/framework space to make this ideal.
However, in simple cases it's easy to just add a manual preload. Often that can go in a component closer to the root rather than deep in the tree: if you're always going to need some piece of data, you might as well start loading it as early as possible. However, it would be unnecessary to block on it until deeper in the tree, where you really need it.
Since preloads don't block, you can parallelize by calling multiple preloads.
function App() {
preload('a.json');
preload('b.json');
preload('c.json');
return <Stuff><Component /></Stuff>;
}
function Component() {
let a = fetch('a.json').json();
let b = fetch('b.json').json();
let c = fetch('c.json').json();
return <div>{a + b + c}</div>;
}
Sometimes it might not work and you have to put them in the same component but you should still use the same pattern:
function Component() {
let items = fetch('items.json').json();
for (let item of items) preload(item.url);
let children = items.map(item => {
let data = fetch(item.url).json();
return <Item data={data} />;
});
return <div>{children}</div>;
}
It might be tempting to try to unify them into some kind of Promise.all, but that pattern doesn't scale up to help you preload earlier in a completely different component. In fact, IMO, once you get your head around thinking this new way, Promise.all starts looking kind of ridiculous.
Even in the example I used it's often better to do the read within each item rather than in the parent. That allows React to also parallelize the rendering of children and not block on all of them loading before it can start work on either.
function Component() {
let items = fetch('items.json').json();
for (let item of items) preload(item.url);
let children = items.map(item => {
return <Item url={item.url} />;
});
return <div>{children}</div>;
}
function Item({url}) {
let data = fetch(url).json();
// ...
}
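The two-step contract described above could be implemented roughly like this (a hypothetical cache, not react-fetch's actual source): preload() starts the request and caches the Promise without blocking, while a read throws only when the data is still pending.

```javascript
// Hypothetical preload/read pair (illustrative; react-fetch's internals
// differ): preload() only kicks off and caches the request; readResource()
// throws the pending Promise (suspends) only when the data isn't ready yet.
const requestCache = new Map();

function preload(url, fetcher) {
  if (!requestCache.has(url)) {
    const entry = { status: 'pending' };
    entry.promise = fetcher(url).then((value) => {
      entry.status = 'resolved';
      entry.value = value;
    });
    requestCache.set(url, entry);
  }
  // Never blocks, never throws: it only warms the cache.
}

function readResource(url, fetcher) {
  preload(url, fetcher); // ensure the request has at least started
  const entry = requestCache.get(url);
  if (entry.status === 'pending') throw entry.promise; // suspend
  return entry.value;
}
```

Because both functions share one cache keyed by URL, an early preload near the root means a later read deep in the tree often finds the data already resolved and never suspends at all.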
I hadn't even considered preload in our world yet. I wonder if we can leverage this for both 3P queries and 1P. Unfortunately, tossing all of our 3P queries into a useQuery-type function makes this difficult.
There's also considerations for interacting with a subrequest cache and whether this method is compatible.
Pinging @igrigorik for visibility!
First, perhaps as an obvious disclaimer: preloading and batching are entirely different optimizations. I suspect we should spin out preload discussion into a separate thread. With that as context...
Eager-fetch of critical 1P+3P data is definitely something we should explore. Unlocking this requires either an explicit signal from the developer that a data fetch nested in a component tree should be done early, or automated smarts to learn or heuristically determine these optimizations.
Naively, using Suspense already gets you pretty far by eliminating blocking fetch-and-render-before-proceeding boundaries, allowing fetches to be kicked off and streamed in parallel. There might still be room for "start it earlier", but it's not clear how much that will unlock. As an idea, perhaps one thing we could experiment with is adding some form of a preload: true option to useQuery, which we could then leverage via static analysis? Plumbing this knowledge into the runtime is another matter, though.
In short, I propose we open a new RFC/discussion thread on preload to explore this space, but my intuition is that proper use of Suspense will already get you pretty far and we should first see & gather some telemetry here.
Update to interested parties: we shipped preloading support in https://github.com/Shopify/hydrogen/pull/700 using preload: true and preload: '*' 🎉
Additional work could be done to analyze and combine GraphQL queries in a way that doesn't create an even slower or more complex resulting query.
For now, we should see how this works for our use case.
👋 Hi! I've recently been looking into how we can best support third-party integrations in Hydrogen. While Hydrogen is typically focused on serving components from our own first-party storefront API, most headless merchants still end up needing to integrate with third-party services. I've been looking into how we can be as opinionated as possible about this integration so that we can offer much better performance & ergonomics than manually fetching an external API on the client and massaging all the data yourself.

We've been documenting some thoughts about how we can best achieve that over here, but ultimately we think we can add the most value to fetching third-party data with three things: caching, batching, and mapping. Why these three strategies?
This idea in the form of a code example illustrates what the ergonomics of all server fetches in Hydrogen could look like if you squinted:
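(The original example isn't reproduced here, but as a rough, hypothetical sketch of the cache + map portion of those ergonomics, with all names invented rather than taken from Hydrogen's API:)

```javascript
// Hypothetical server-fetch helper (invented names): caches by key so
// repeated calls share one request, and maps the raw response into
// component-ready data. Batching would layer on top of this.
function createQueryClient(fetchImpl) {
  const cache = new Map(); // key -> in-flight or resolved Promise
  return function query({ key, url, map }) {
    if (!cache.has(key)) {
      const promise = fetchImpl(url).then((raw) => (map ? map(raw) : raw));
      cache.set(key, promise); // dedupe: later callers reuse this Promise
    }
    return cache.get(key);
  };
}
```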
There's even a "working" prototype with comments for something similar if you want to just dive right in! But in trying to implement this, a few complications arose, and they mostly center around Hydrogen's use of the react-query library for the useQuery hook.

We're currently using react-query (and previously useServerFetch) to wrap & schedule promises for execution so that we can render React components with the data they need on the server. Rendering React components asynchronously isn't supported, so the need for these sorts of hooks to fetch data on the server before rendering definitely seems reasonable.

Some of the reasons we originally adopted react-query were that it provided a more robust implementation of server-side requests that work with Suspense, that it works on the client, and that it comes with a built-in cache. However, some of the now-known drawbacks of adopting it as a library are that it doesn't support async fetches, and that the headers and other information about the response aren't exposed.

One more drawback that the map+cache+batch prototype uncovered is that useQuery absolutely won't allow for batching GraphQL requests, or delaying any async operation across hook invocations for that matter. react-query includes its own scheduler to resolve promises as eagerly as possible and offers no configurability for deferring execution across hooks. If proper batching support were to be considered a priority, some of our options would be to try to introduce batching support upstream in a library that wasn't designed for it, maintain our own fork of react-query with batching support, or return to writing our own implementation à la useServerFetch.

Although things sound dire, being in control of our own custom hook for server fetches (again) could be beneficial for us: react-query has its own separate layer of caching that could stand to be consolidated with the rest of our caching framework for simplicity & debuggability.

I think it could be interesting to entertain the idea of building our own server-side data-fetching hook. The problem has been solved before: react-frontload is a tiny library that could serve as inspiration for building one of these pseudo-synchronous hooks. Looking for feedback on what folks think about something like this!

All this being said, as we try to converge with RSC, it's unclear whether building a hook would be necessary for server fetches in the future, or if a general data-fetching library will suffice. Although data fetching in an RSC world doesn't seem to have been finalized, all examples point to the ability to use asynchronous libraries for data fetching in server components (eg. db.notes.get(props.id) in the official RFC, and a new React server renderer that supports Suspense and waiting on asynchronous data on the server without double-rendering).

I'm relatively new to the world of React so I may be overlooking a couple of things. Looking for feedback on the general appetite for replacing useReactQuery, and also hoping this issue sparks some discussion about our server-side data-fetching patterns, both for now and in the future with RSC.