dhis2 / app-runtime

A singular runtime dependency for applications on the DHIS2 platform
https://runtime.dhis2.nu
15 stars 13 forks source link

Suggestions for improving data fetching #1353

Open Birkbjo opened 1 year ago

Birkbjo commented 1 year ago

Motivation

useDataQuery is great for simple use cases, especially for queries that are independent and fired only once. It's also incredibly easy to fire a request in an app-platform app, due to no setup required. However, it's not very flexible. We're using react-query under the hood, but do not expose most of the options.

I will try to explain some of the issues I've encountered while working with useDataQuery, and some suggestions for how we can improve it. There's a TLDR with suggestions at the end.

Query-structure is static

The query-objects passed to useDataQuery is static, and cannot be changed. To facilitate "dynamic" fields and parameters, we support having a callback in the object. However, this is not supported for resource, which in my opinion is quite an arbitrary limitation, and there's even an open issue from a third party dev about this. In many cases a static resource is fine, but not if you try to make generic components. There's actually a hacky workaround to this (passing resource: '/', and the actual resource endpoint as part of the dynamic id: id: (id) => `${resource}/id). Nevertheless, I think this callback in the query-object is confusing, especially to new devs. I understand why it's done, because useDataQuery was implemented first without react-query, and thus didn't have deterministic hashing of the query. But since then we've moved to use react-query under the hood, and I think we should leverage the functionality it provides. It's been argued that static-query-object makes it possible to statically analyse the code and potentially "hoist" requests to eg. the "page"-level. I think the theory behind this is interesting, but for one, I don't think it's something we will ever do in practice. Second, I think this is something the app should have more control over, and eg. do in the router (see react-router loader). Third, if we really wanted to do this - it should still be possible to identify queries that are indeed static (we would still need to do this today, due to dynamic params). Forth; in my opinion it sacrifices too much in terms of DX and limitations for a theoretical use-case.

Query-objects might be perfect as query-keys in react-query

react-query has a concept of query-keys, which is a way to uniquely identify a query. It can be any object, and it's deterministically hashed. Since our query-object uniquely identifies a query, it's perfect for a query-key. We could easily simplify our query-objects and remove the callbacks from params and ids - and by using these as keys directly, it would support dynamic parameters everywhere - and react-query would handle refetching if any of the keys in the object change.

Refetch is not a substitute for reactive query-keys

Another common use-case is to refetch a query whenever a dependent variable changes. Since the query-object is static, it's obviously not reactive - and thus the only way to send another request with different parameters is to use refetch. This forces consumers into fetching imperatively (call refetch from useEffect, instead of declaratively (refetch when dependencies change). In our API, refetch is responsible both for "force refetch identical data", and "refetch when variables change". refetch in react-query is intended to imperatively force an update of the data - not to be used as the main mechanism to refetch with new variables. With our API there's no way to differentiate between these use-cases, which cause duplicate requests in certain situations (see below).

In the latest react-docs updates, the team is discouraging the use of useEffect. See you might not need an effect and separating event from effects.

In the new docs, the team is adamant that we cannot ignore any dependencies in useEffect, which has been useful if you depend on a certain value, but don't want the effect to run every time that value changes. This is one of the problems by depending on useEffect for refetching - there's no way to choose which dependencies that should be reactive.

Of course, there has to be a useEffect somewhere, since a request is an effect. So react-query "hides" the useEffect, and thus you really don't need to use useEffect that often. This is one of the reasons why the react community and documentation encourages you think it through whenever you use an useEffect. It's not great DX to have to handle useEffect in user-land for fetching data.

From react-devs:

Keep in mind that modern frameworks provide more efficient built-in data fetching mechanisms than writing Effects directly in your components.

I think we should leverage this.

Refetch in useEffect causes duplicate requests

react-query's refetch ignores the usual query-key checking, and always does a refetch - which makes sense when used in react-query. However, since we're using refetch as a main way to refetch data - I've seen a lot of duplicate requests. This will happen unless you have a isFirstRender-check in the component that calls refetch from useEffect (or use lazy: true).

I will show an example of what I mean.

const query = { 
    result: {
        resource: `/dataElements`,
        params: ({ page, filter}) => ({
              page,
              filter,
          }),
    }
}
const MyComponent = () => {
    const [{ filter, page }, setQueryParams] = useQueryParams();
    const { data, refetch } = useDataQuery(query);

    useEffect(() => {
        refetch({ filter, page }); // will send a request even if filter and page are the same as previous request
    }, [refetch, filter, page]);

    return <div>{data}</div>;
};

The useEffect will be called on mount, and thus you will have a duplicate request. You could of course anticipate this, and pass lazy: true.

This totally works, but it's something you have to be aware of when using useDataQuery. There's also a fair amount of code, compared to using react-query directly:

const [{ filter, page }] = useQueryParams();
const query = { // assume simplified query-object
    resource: 'dataElements',
    params: { page, filter })
};
const { data } = useQuery({ queryKey: [query] });

Multiple queries in the same query-object

Multiple queries in the same query-object can be powerful, and it's incredibly easy to send requests in parallel. However, due to engine.query() using Promise.all, the resulting loading (and error-prop) are tied to the Promise.all-promise (from react-querys perspective it's really just one query). It's not really suited to eg. load a paginated list of two different resources. If you refetch one, the other list will also go in a loading state. See my example here: multiple resources in one query.

If we want to support multiple queries on the engine-level, I'd argue we should support engine.queries(queries: ResourceQuery[]). Or just use useQueries from react-query in user-land. This would let us get query-status for each query. With regard to this, I would also argue we should consider simplifying query-objects to be the ResourceQuery - eg. instead of

{
 const query = { // instead of this
    dataElements: {
      resource: 'resource'
      params: {
        fields: ['id', 'name']
      }
    }
  }
}
// do this
const dataElementQuery = {
  resource: 'resource'
  params: {
    fields: ['id', 'name']
  }
}

I realize this would be quite the breaking change. But it makes typing the query-objects easier, and it's more in line with how react-query works. I would also argue the vast majority of requests do not use multiple resources in the same query-object. And typing the resulting object data.dataElements.dataElements isn't very ergonomic - the amount of times I've forgotten the extra dataElements are not insignificant.

We could potentially support both for some time to be backwards compatible. Eg.

public query(query: Query | ResourceQuery, options: QueryExecuteOptions)

Or even make it a permanent feature - it would be easy to check if it's a Query or ResourceQuery (`typeof query.resource === 'string'), but it would complicate the implementation and would need function overloads.

We might not need useDataQuery

The actual DHIS2-specific fetching logic lives in engine.query. useDataQuery really doesn't do much; it's basically a thin wrapper around react-query useQuery, with engine.query as the fetcher-function. Again - the original implementation was not built with react-query in mind, so this is understandable. But in the current state I believe it would be beneficial to just use react-query directly, and use engine.query as the fetcher-function. We've done this quite successfully in eg. the aggregate-data-entry-app.

engine.query is great, and handles all the annoying parts about data-fetching; handling errors, constructing the url, encoding query-parameters, etc. I think that is the right level of abstraction for an API. The frontend-world moves too quickly for us to reliably keep up with all the modern tool that we and third-party developers want to adopt. And thus, I think it makes sense to provide a simple, agnostic API for data-fetching (which we already do), and let the client orchestrate state binding to whatever framework they use. This is not to say that we shouldn't provide common hooks, but allow the consumer some freedom of what to use.

There's nothing really stopping you from using react-query with engine.query right now - which has been proved in the aggregate-data-entry-app. The queryClient is also already setup in the app-shell, so it's really easy to use. A simple solution could be "ok, use react-query directly if you want". And we could keep using useDataQuery for simple use-cases. However, I want to verify what path we should take - so we agree that this is OK to do. Because right now the fact that we use react-query is an implementation detail, and from an API point of view, we shouldn't just assume it's there. The benefit of this is we would use the same cache as the header-bar and the query-client. But we would have to document the fact that there's a queryClient-provider in the tree, and it's available for use.

I've also worked on mapping query-objects to full model types (example here). By returning these types from engine.query, while using useQuery directly, the result would be correctly typed and inferred from the engine.query() call.

Combined with simplified queries as mentioned earlier, this allows us to use all the functionality of react-query, while still having a common, declarative representation of the query/request.

Depending on if we want to be "framework agnostic" or "make it as easy as possible", we could either drop react-query entirely from app-runtime, and just expose engine.query - and have examples of usages with react-query. Or we could continue to setup a default QueryClient and render the provider - so the client could use react-query without any setup. In the latter case, we could make this backwards compatible, by still supporting useDataQuery (and maybe deprecate it). But officially support the use of react-query directly. Another alternative would be to keep useDataQuery(query), but simply pass all options to react-query. Something like this:

type UseDataQueryOptions = Omit<UseQueryOptions, 'queryFn' | 'queryKey' >
function useDataQuery(query: ResourceQuery, options: UseDataQueryOptions) {
return useQuery({ queryKey: [query], ...options} )
}

Typing query-objects

I have managed to "parse" query-objects to types. However there will always be limitations of what we can type from an object like this. Currently, I'm just mapping eg. a resource like dataElements, to the matching model type. If we were to "simplify" our query-objects, we may have to think about the role of id-in the object. If we were to make everything inside the query "dynamic", there would be nothing stopping you from doing something like this: resource: `dataSets/${dataSetId}/organisationUnits/gist` - and we wouldn't be able to type that. However, you would probably only be able to pass in "known" resources to engine.query (when using typescript), and would get a type-error if doing that, which could be limiting. So I believe the right way would be doing something like this:

resource: 'dataSets'
id: `${dataSetId}/organisationUnits/gist` // are we ok with this?

Above example is quite unique to the gist-API though. Typing this would be pretty hard - but I think it would be ok to return type as unknown in cases like this - and let the app pass in the type. We could also support gist: boolean in ResourceQuery, that can be used to use the gist API.

I've lots of thoughts about this, but I will end it here so it doesn't grow more out of scope, but it's definitely something to keep in mind while we're revisiting this.

TLDR; Suggestions #

All in all, I think these changes would:

ismay commented 1 year ago

I agree with the gist of the proposal, especially:

Plus I completely agree with your assessment of the results:

I think there are nuances that we'll probably have to debate, but in my opinion this would be a welcome improvement.