Constrain prefetch to align with FHIR's DataRequirements

kpshek commented 7 years ago

In discussion with @brynrhodes and @isaacvetter regarding aligning Clinical Reasoning and CDS Hooks, one of the topics of discussion was around how both projects handle the concept of prefetching data.

Prefetching aims to address the following:

Allows the EHR to satisfy data requests with data it may already have (eg, in memory)
Allows the EHR to calculate an optimal set of queries necessary to satisfy data across multiple CDS Services

In Clinical Reasoning, this concept is defined in the ServiceDefinition.dataRequirement using the DataRequirement type. Compared to our prefetch definition, DataRequirement is a much more constrained model. Based upon @brynrhodes's implementation experience, the constraints of DataRequirements allow for the calculation of an optimal set of FHIR queries to satisfy multiple DataRequirements. Our current prefetch definition (any FHIR query) would not actually allow EHRs to achieve our second goal above.

To help with this discussion, it would be good to take an inventory on the current prefetch FHIR queries for a variety of CDS Services that have been prototyped at Connectathons. With their feedback, we can see if constraining prefetch to the functionality of DataRequirements would still meet their needs.

jmandel commented 7 years ago

@kpshek can you explain a bit about what you mean by "calculation of an optimal set of FHIR queries"? Is this about generating raw/internal database queries to retrieve the data you need? In general, the more capabilities you allow the harder this gets; but for simpler search strings (e.g. Observation?patient={{Patient.id}}) you could always optimize these directly (by recognizing them explicitly as having some structure that you know how to handle well).

Obviously this is more work on the EHR side, but I'd imagine a couple of common cases would handle a good fraction of the load, while still giving service developers the flexibility to pre-fetch on more expressive REST API when needed. And if you see some pre-fetch requests that are "too hard" (whatever that might mean), you can always leave them out and allow the client to fetch directly.
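To make the "recognize the simple cases, leave out the rest" idea concrete, here is a minimal sketch. It assumes the EHR only recognizes one shape of template (a resource type searched by patient); the regex, the function name, and the example templates are hypothetical and not part of the CDS Hooks specification.

```python
import re

# Hypothetical recognizer: only "<ResourceType>?patient={{Patient.id}}"
# counts as a "simple" template that the EHR knows how to handle well.
SIMPLE_PATIENT_SEARCH = re.compile(r"^([A-Za-z]+)\?patient=\{\{Patient\.id\}\}$")


def plan_prefetch(templates):
    """Split prefetch templates into those the EHR fills itself and those it leaves out.

    Returns (handled, left_out): handled maps a prefetch key to the resource
    type to load; left_out lists keys the CDS Service must fetch directly.
    """
    handled, left_out = {}, []
    for key, template in templates.items():
        match = SIMPLE_PATIENT_SEARCH.match(template)
        if match:
            handled[key] = match.group(1)
        else:
            left_out.append(key)
    return handled, left_out


# Example templates (illustrative only).
templates = {
    "observations": "Observation?patient={{Patient.id}}",
    "meds": "MedicationRequest?patient={{Patient.id}}&_include=MedicationRequest:medication",
}
handled, left_out = plan_prefetch(templates)
```

In this sketch the "meds" template is simply left out, so the service would fall back to querying the FHIR server itself.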

All that said: I agree that it would be good to take stock of what queries people are actually running. On my list so far are:

patient (read)
search for most recent set of bilirubin labs for a specific patient
search for all med orders + statements for a patient
search for most recent height + weight for a patient
search for all allergies for a patient
practitioner (read)

None of these is super complicated. But looking at https://www.hl7.org/fhir/metadatatypes.html#datarequirement, it's not so straightforward either. It actually appears to enable many kinds of queries that the FHIR REST API search protocol does not (e.g. arbitrary dotted-path filters like Observation.component[2].code, where the REST API enumerates a specific list of search parameters).

niquola commented 7 years ago

Isn't it similar to what GraphDefinition is trying to solve?

jmandel commented 7 years ago

In principle, yes. This is definitely worth keeping an eye on, and we can make sure to share the CDS Hooks use cases as examples for GraphDefinition. Some concerns for the moment include:

The STU3 GraphDefinition doesn't support backwards links yet (so starting from a patient there's no way to traverse outward to allergies, labs, etc).
There's not yet a clear story for filtering labs by code, or meds by drug, etc
Slicing and Profiling aren't readily supported in today's EHR FHIR implementations

kpshek commented 7 years ago

@jmandel - Yes, that is what I meant by "calculation of an optimal set of FHIR queries". Under our current model, we allow for any FHIR search string which, as you know, can result in some very complex queries (chained parameters, reverse chaining, contained, etc). I do not think any EHR can achieve our stated goal of calculating an optimal set of database queries given an arbitrary set of prefetch queries.

Certainly, this can be accomplished if the prefetch queries are sufficiently simple (as in the case of the Observation?patient={{Patient.id}} query you mentioned).

Obviously this is more work on the EHR side, but I'd imagine a couple of common cases would handle a good fraction of the load, while still giving service developers the flexibility to pre-fetch on more expressive REST API when needed. And if you see some pre-fetch requests that are "too hard" (whatever that might mean), you can always leave them out and allow the client to fetch directly.

I agree with this. However, if the purpose of prefetch is to allow the EHR to either return data it already holds cheaply (eg, memory) or optimally fetch it, why start with an API that allows for complex queries which, I contend, will always be 'left out' and left to the client to fetch directly?

None of these is super complicated. But looking at https://www.hl7.org/fhir/metadatatypes.html#datarequirement, it's not so straightforward either.

Agreed. Perhaps @brynrhodes can specify some examples to help us all understand this model better?

I've created this new wiki page to solicit feedback from the community on what prefetch queries are being used today (or are planned to be used). Hopefully this will provide us with some additional context that may help us decide if/what changes are made in this space.

jmandel commented 7 years ago

However, if the purpose of prefetch is to allow the EHR to either return data it already holds cheaply (eg, memory) or optimally fetch it

Even if the EHR does no better than to query its own database the same way it would for a rest call, it at least saves a round trip. That was my initial motivation here.

grahamegrieve commented 7 years ago

GraphDefinition will need to handle backwards links. It allows for slicing/profiling, but that's not mandatory (unless the problem makes it mandatory, which is a problem anyway). One of the advantages of graphdefinition is that the EHR can define a fixed number that they implement optimally, and let the client choose one.

jmandel commented 7 years ago

@grahamegrieve does GraphDefinition contemplate a way to filter which links are followed (e.g. from Patient --> "all MedicationRequests from the past 12 months prescribed by Dr. Jones")? From the current design, it looks like this kind of filtering would be profile-based; is the thought to add some other constraint specification language (like FHIRPath, search params, DataRequirement-style filters, or something new)?

brynrhodes commented 7 years ago

DataRequirement is more about defining the leaves of a graph, where the relationships between those leaves would be specified some other way (in CQL in a measure or decision support artifact). I've been thinking about a transformation between CQL and a GraphDefinition, seems like it would be straightforward, but haven't had time to get into the details. In general, the data requirements would still be computable from the leaves, and you could probably get some economy based on the relationships as well.

grahamegrieve commented 7 years ago

we weren't contemplating this... but it might be a good idea. And it would be good to not only be able to do it by profile. But there has to be a limit somewhere.

brynrhodes commented 7 years ago

It actually appears to enable many kinds of queries that the FHIR REST API search protocol does not (e.g. arbitrary dotted-path filters like Observation.component[2].code, where the REST API enumerates a specific list of search parameters).

Yes, DataRequirements allows you to describe queries in terms of general paths, that's definitely an area that needs to be worked out. What we do right now is map the path to a search param, in the best case that works fine because it's one-to-one; anything beyond that needs to be addressed by the data access layer (either by augmenting or subsetting the search results).
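A rough sketch of that path-to-search-param mapping, under stated assumptions: the lookup table below is hypothetical and hand-written (a real one would presumably be derived from the server's search parameter definitions), and filters are simplified to (path, value) pairs rather than full DataRequirement structures.

```python
from urllib.parse import urlencode

# Hypothetical lookup table: (resource type, element path) -> search parameter name.
PATH_TO_SEARCH_PARAM = {
    ("Observation", "code"): "code",
    ("Observation", "subject"): "patient",
}


def to_search(resource_type, filters):
    """Translate simplified (path, value) filters into a search query.

    Returns (query, unmapped). Filters with no one-to-one search parameter
    are returned in `unmapped` so the data access layer can apply them to
    the search results afterwards.
    """
    params, unmapped = [], []
    for path, value in filters:
        param = PATH_TO_SEARCH_PARAM.get((resource_type, path))
        if param is None:
            unmapped.append((path, value))
        else:
            params.append((param, value))
    query = resource_type + ("?" + urlencode(params) if params else "")
    return query, unmapped


# "component[2].code" has no entry in the table, so it is left for post-filtering.
query, unmapped = to_search(
    "Observation",
    [("code", "example-code"), ("component[2].code", "other-code")],
)
```

The one-to-one case goes straight into the query string; everything else is handed back for the data access layer to deal with.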

GraphDefinition works the same way, but has the advantage of being behind the service API. Where CQL can execute behind the API, it can use the paths directly, but when it's in front of it, there has to be a mapping.

brynrhodes commented 7 years ago

As far as examples, the Library examples have quite a few, here's a typical example.

kpshek commented 7 years ago

However, if the purpose of prefetch is to allow the EHR to either return data it already holds cheaply (eg, memory) or optimally fetch it

Even if the EHR does no better than to query its own database the same way it would for a rest call, it at least saves a round trip. That was my initial motivation here.

While we remove the latency between the CDS Service and the FHIR server (since we're going from the EHR to FHIR Server), the performance impact to the user is actually worse in many cases. This is because the EHR is blocking on calling the CDS Services until the prefetching is finished. As such, the user is waiting longer for the initial cards to be returned.

jmandel commented 7 years ago

In practice, there are lots of opportunities for optimization: e.g. the EHR can preemptively compile the details required for a MedicationPrescription hook as soon as the prescribing pad is open, even before the user has entered a drug (so they can be ready the instant the EHR needs to trigger the hook).

kpshek commented 7 years ago

Absolutely!

My apologies if my previous comments are being construed as questioning the value of prefetch as that is not my intention. I'm just trying to impress that I believe our current prefetch model is difficult to implement by EHRs in order to achieve our higher-order goals.

Conversely, our current prefetch model is very simple for CDS Service providers to implement which I love and would like to maintain with any changes that may be introduced.

jmandel commented 7 years ago

This discussion is spot-on @kpshek. I agree with the question, and I like the idea of constraints that enable :-) Just want to make sure they have reasonable ergonomics, too. Saying "we support search-style URLs but the only allowed parameters are patient, code, and date" might be worth considering, too.
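As a sketch of what such a constraint could look like in practice, the check below flags any query parameter outside an allow-list. The allow-list contents come from the sentence above; the function name and the treatment of modifiers (a name like "date:missing" is rejected) are assumptions for illustration.

```python
from urllib.parse import parse_qsl

# Allow-list taken from the suggestion above; everything else is rejected.
ALLOWED_PARAMS = {"patient", "code", "date"}


def disallowed_params(prefetch_template):
    """Return the query parameter names in a prefetch template that are not allowed.

    Templates without a query string (e.g. a plain read) yield an empty list.
    """
    _, _, query = prefetch_template.partition("?")
    names = [name for name, _ in parse_qsl(query, keep_blank_values=True)]
    return [name for name in names if name not in ALLOWED_PARAMS]


ok = disallowed_params("Observation?patient={{Patient.id}}&code=example-code")
bad = disallowed_params("Observation?patient={{Patient.id}}&_sort=-date")
```

An EHR could run a check like this when it discovers a CDS Service and reject or ignore templates that return a non-empty list.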

kpshek commented 7 years ago

Saying "we support search-style URLs but the only allowed parameters are patient, code, and date" might be worth considering, too.

👍

travisstenerson commented 7 years ago

Our intention for prefetch is to request all the procedures done on a patient for the purposes of the (cancer) disease in question, so we care about ProcedureRequest.reasonReference more so than particular codes or dates. We can't specify that reference in a DataRequirement, though we would like to. We also want the Observations/Reports that result from those procedures (basedOn or result). We could filter by codes and/or dates, but the code list would be pretty lengthy. I suppose I'm asking for either suggestions, or to suggest that implementers be able to specify references in a prefetch.

kpshek commented 7 years ago

Thanks for your use cases @travisstenerson! Would you mind documenting those use cases as concrete FHIR queries on this wiki page here?

travisstenerson commented 7 years ago

I will when I'm able to @kpshek, still encoding the source material, it's rather extensive. Although now I see that the search parameters for Procedure and ProcedureRequest don't actually allow for searching that reference. I'll have to read more to see how I hope to grab this data. I will definitely add to the prefetch page when I have a set of queries.

dmccallie commented 7 years ago

Given the complexity of data needed for some CDS services, it might be necessary to expand this notion of "prefetch" to include additional approaches. For example, some CDS services may need a more or less continuous feed of selected data elements, such as via a background HL7 feed or (someday) a FHIR Subscription service. And to avoid delays at crucial points in the workflow, perhaps some prefetch triggers could fire "upstream" of the critical decision? For example, perhaps a chart-open hook could allow the service to fetch the data expected to be needed later on during CPOE. Is it possible to use Hooks that way - as a "prefetch only" service, allowing for background fetching of complex data anticipated for later use?

jmandel commented 7 years ago

Triggering a hook just to obtain data (in its own right, or to apply at a near-future time when a subsequent hook fires) is certainly an interesting and valid pattern, @dmccallie (though I don't think it's something we've documented or thought about in any detail).

Some of the other ideas, like establishing access to an HL7v2 feed (or FHIR subscription) are quite powerful and, to my mind, belong in a more comprehensive writeup like "Applying CDS Hooks within a real-world enterprise environment" – this would ideally be a write-up of some actual experience, calling out helpful patterns for combining CDS Hooks triggers with other technologies (but ideally we wouldn't need to specify any new standards in CDS Hooks for, say, how to establish access to an HL7 v2 feed). I could totally see this being a very important stopgap, with CDS Hooks providing the "just-in-time" trigger on top of a rich set of other data interfaces.

dmccallie commented 7 years ago

@jmandel, thanks. I didn't mean to imply that HL7 v2 feeds or FHIR Subscriptions need to become part of the CDS Hooks specification. But I did mean to hint that in some cases, making the pre-fetch logic too complicated might be solving the problem at the wrong level. If a service needs lots of complex data, then perhaps pre-fetch is not the best way to get it. "Pre-hook," or consider a background feed instead.

vlindhol commented 7 years ago

I am using CDS Hooks for a CDS that takes "everything" about a Patient (observations, conditions, procedures, medications, allergies). I am using prefetch in a very simple way: just get the previously mentioned resources filtered by Patient.id and code systems we support. This can be done both by prefetch and (afaik) ServiceDefinition.dataRequirement.

One thing I am missing, however, is the ability to provide a "filter list". In the environments our CDS operates in, getting for example every Observation for a patient may result in data that is tens or even hundreds of megabytes. So to save bandwidth we need to constrain the results.

I am following the lastn operator with great interest, although I would like even more granularity. In our current non-FHIR interface we filter the results client-side for a lot of observation codes both by date and number, i.e. for a specific code we get e.g. the observations done within 2 years and constrain their number to five. These filters are defined per code, since the number of clinically relevant observations varies. There is also a fallback default filter.
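To illustrate the kind of filter list I mean, here is a minimal client-side sketch. The codes, the limits, and the flat dict shape of an observation are all made up for the example; they are not our real configuration or a FHIR structure.

```python
from datetime import date, timedelta

# Hypothetical filter list: code -> (max age in days, max number of results).
FILTERS = {"code-a": (730, 5)}
# Fallback applied to any code without its own entry.
DEFAULT_FILTER = (365, 1)


def apply_filters(observations, today):
    """Keep, per code, only the most recent observations inside the allowed window.

    observations: list of dicts with 'code' (str) and 'date' (datetime.date).
    """
    by_code = {}
    for obs in observations:
        by_code.setdefault(obs["code"], []).append(obs)

    kept = []
    for code, group in by_code.items():
        max_age_days, max_count = FILTERS.get(code, DEFAULT_FILTER)
        cutoff = today - timedelta(days=max_age_days)
        recent = [o for o in group if o["date"] >= cutoff]
        # Newest first, then truncate to the per-code limit.
        recent.sort(key=lambda o: o["date"], reverse=True)
        kept.extend(recent[:max_count])
    return kept
```

Doing the same thing server-side would need each (code, window, count) triple to be expressed per query, which is the part I don't see a formal representation for.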

I'm not sure how to formally represent such a filter list in FHIR, prefetch and dataRequirement both seem insufficient. I'm guessing it would be easier to add that kind of functionality to dataRequirement, while prefetch would require support for doing multiple searches that would, in the end, produce one bundle.

brynrhodes commented 7 years ago

For DataRequirement specifically, we've had that request before. The current design specifically excludes limiting by number of results because it adds another type of operation that the data access layer has to implement. However, given the number of times it's been requested, and the clear usefulness of limiting the number of results, it seems like it's probably time to add it.

I've submitted a tracker for this here.

cds-hooks / docs

Constrain prefetch to align with FHIR's DataRequirements #47