NIH-NCPI / ncpi-fhir-ig

🔥 NCPI FHIR Implementation Guide
https://nih-ncpi.github.io/ncpi-fhir-ig/
Creative Commons Zero v1.0 Universal

Search fields #38

Closed bwalsh closed 2 years ago

bwalsh commented 2 years ago

This PR addresses the following use cases:

FHIR does not support joins in the traditional RDBMS sense. When querying data entities that are related to each other, one way to simulate a join is to search one entity under another (i.e., a Bundle of related documents). This is similar to a materialized view in the relational database world. One weakness of FHIR is its verbosity: the number of calls one needs to make when searching for resources, examining their contents, and then querying again for related documents. This is compounded in the research setting, especially for cross-cohort builders, since FHIR repositories are segregated by accession and data use restriction, so queries must be re-run on multiple endpoints. In the AnVIL use case, we are likely to see 100+ endpoints.
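To make the multi-endpoint cost concrete, here is a minimal Python sketch of re-running one search against several FHIR endpoints and pooling the results. The function name `search_all`, the injected `fetch` callable, and the example URLs are all hypothetical and not part of this PR; paging and authentication are deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(endpoints, query, fetch):
    """Run the same FHIR search against every endpoint and pool the entries.

    fetch(url) is a caller-supplied function (hypothetical here) that returns
    a parsed Bundle as a dict, so HTTP and auth details stay out of the sketch.
    Only the first page of each Bundle is read; paging is not handled.
    """
    urls = [f"{endpoint.rstrip('/')}/{query}" for endpoint in endpoints]
    with ThreadPoolExecutor(max_workers=8) as pool:
        bundles = list(pool.map(fetch, urls))  # results keep the input order

    results = []
    for endpoint, bundle in zip(endpoints, bundles):
        for entry in bundle.get("entry", []):
            # Remember which endpoint each resource came from.
            results.append((endpoint, entry["resource"]))
    return results
```

With 100+ endpoints, every additional round trip per endpoint is multiplied accordingly, which is why reducing the number of calls per query matters.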

This seems to be more pronounced for early detection datasets, whose cardinalities differ from those of public datasets. Public datasets typically have a subject:specimen:file ratio of 1:1:f, where f < 10. Early detection datasets have relatively few subjects, but tens of samples and many assays with many files underneath them.

FHIR addresses this inefficiency in several ways, fundamentally using [search](https://www.hl7.org/fhir/search.html#revinclude) parameters.

Clients may request that the engine return resources related to the search results, in order to reduce the overall network delay of repeated retrievals of related resources.
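As an illustration of the search mechanism quoted above, the following Python sketch builds a single search URL that asks the server to return a resource together with the resources that point at it, using `_revinclude`. The base URL, the identifier value, and the helper name `build_search_url` are assumptions made for the example; `ResearchSubject:study` relies on the standard `study` search parameter of ResearchSubject.

```python
from urllib.parse import urlencode


def build_search_url(base_url, resource_type, identifier, revincludes):
    """Build a FHIR search URL with one _revinclude per related resource."""
    params = [("identifier", identifier)]
    # _revinclude may be repeated, so use a list of pairs rather than a dict.
    params += [("_revinclude", value) for value in revincludes]
    # Keep ':' unescaped so values like ResearchSubject:study stay readable.
    query = urlencode(params, safe=":")
    return f"{base_url.rstrip('/')}/{resource_type}?{query}"


# Hypothetical endpoint and identifier, for illustration only.
url = build_search_url(
    "https://fhir.example.org/r4",
    "ResearchStudy",
    "my-study-id",
    ["ResearchSubject:study"],
)
print(url)
```

One request of this shape replaces a search for the ResearchStudy followed by a second search for its ResearchSubjects.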

Ideally, the user should be able to specify a Resource type and an identifier (aka natural key, submitter id) and retrieve pertinent descendants and ancestors with a minimum of ambiguity and a maximum of efficiency. In addition, we should ensure the semantics of the relationships are as clear as possible. We should also strive to remain as close to the letter and intent of the base FHIR without introducing an excess of special cases.

If we have confidence that the search parameter fields are uniformly populated, we can build constructions such as Composition and GraphDefinitions.

This PR ensures that data submitters populate search parameter fields.

The following are identified as typical entry points into the FHIR graph and apply to ad-hoc, ETL, analysis and workflow use cases:

A tightly defined, dependable bundle of these core resources will enable “top down” queries from the ResearchStudy or “bottom up” queries from the Observation or other downstream resources such as Gene. For the Resources mentioned above, the solution is straightforward: ensure the search fields are populated by defining Profiles.
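A rough client-side analogue of what such Profiles enforce is sketched below in Python: a check that reports which linking elements are missing before a resource is submitted. The `REQUIRED_SEARCH_FIELDS` map is purely illustrative and is not the set of constraints defined in this PR; a real Profile would express these as cardinality constraints validated by the server.

```python
# Hypothetical map of resource type -> top-level elements that must be
# populated so that searches over the relationships are dependable.
REQUIRED_SEARCH_FIELDS = {
    "ResearchSubject": ["study", "individual"],
    "Specimen": ["subject"],
    "DocumentReference": ["subject"],
}


def missing_search_fields(resource, required=REQUIRED_SEARCH_FIELDS):
    """Return the required top-level elements that are absent or empty."""
    fields = required.get(resource.get("resourceType"), [])
    return [name for name in fields if not resource.get(name)]


# A Specimen without a subject cannot be reached from its Patient.
print(missing_search_fields({"resourceType": "Specimen", "id": "s1"}))
```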

One exception is the relationship between DocumentReference and Task. A ResearchTask links the specimen, specified in input, to the files, specified in output. Semantically, the natural field to link the DocumentReference to the Task is author; however, the base reference restricts that field to Reference(Practitioner | PractitionerRole | Organization). The base reference does include a context that we leverage, with some semantic ambiguity.
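The specimen-to-file link described above can be sketched in Python with plain dictionaries. The PR text does not say which sub-element of `context` carries the link, so the use of `context.related` here is an assumption, as are the ids, the attachment URL, and both helper names. The resources are trimmed for readability and have not been validated against any profile.

```python
# Task: specimen in `input`, produced file in `output`.
task = {
    "resourceType": "Task",
    "id": "t1",
    "status": "completed",
    "intent": "order",
    "input": [{"type": {"text": "specimen"},
               "valueReference": {"reference": "Specimen/s1"}}],
    "output": [{"type": {"text": "file"},
                "valueReference": {"reference": "DocumentReference/d1"}}],
}

# DocumentReference pointing back at the Task via context.related (assumed).
document = {
    "resourceType": "DocumentReference",
    "id": "d1",
    "status": "current",
    "content": [{"attachment": {"url": "https://example.org/files/d1.bam"}}],
    "context": {"related": [{"reference": "Task/t1"}]},
}


def files_for_specimen(specimen_ref, tasks):
    """Top down: files output by any task that takes specimen_ref as input."""
    files = []
    for t in tasks:
        inputs = [i.get("valueReference", {}).get("reference")
                  for i in t.get("input", [])]
        if specimen_ref in inputs:
            files += [o["valueReference"]["reference"]
                      for o in t.get("output", []) if "valueReference" in o]
    return files


def tasks_for_document(doc):
    """Bottom up: Task references recorded in the document's context."""
    related = doc.get("context", {}).get("related", [])
    return [r["reference"] for r in related
            if r.get("reference", "").startswith("Task/")]
```

Populating both directions lets a client walk from Specimen to files through the Task, or from a file back to the Task that produced it, without guessing which side holds the reference.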

This PR could be extended to ensure that each Patient has a managingOrganization corresponding 1:1 with a ResearchStudy. An enumerated Group has been mentioned as well.


bwalsh commented 2 years ago

@liberaliscomputing @RobertJCarroll @torstees

When you get a chance, can you comment on this PR? I've written some tests within the anvil project that reference this work and confirm the boundaries of Google's $validate method.