torstees commented 3 years ago

Problem to be resolved

How best should we represent key findings from sequence analyses? The solution should be comprehensive enough to support variations.

Ideal solution

Work is ongoing to finalize an official set of profiles, extensions, and guidelines to be integrated into the base FHIR model. That work is still in the early stages of active development, but the working group has published clear guidance on the model it currently envisions.

According to the current spec, several key profiles are of interest to us for representing CMG's Discovery data. I'll touch lightly on only those elements; the spec provides other pieces for needs that extend beyond this particular data set.

Variant profile (Observation)

The variant profile represents the bulk of the needs for CMG. At the topmost level, it points to the Specimen from which the sequence data was drawn. The majority of information, however, is stored within the component property array.

For CMG, the following data points will be stored as part of the component array:

Gene
Chromosome/Pos (stored as separate entries)
Ref/Alt Alleles (again, stored separately)
Zygosity
Genome Assembly
Transcript
HGVS.c/p (separate)
SV change type (insertion/deletion, inversion, etc.)
Significance
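
As a rough illustration of the component layout listed above, here is a minimal sketch that builds a base Observation as a plain Python dict. The helper names (concept, component, build_variant) are hypothetical, the gene ID, gene symbol, alleles, and HGVS string are placeholders, and the LOINC codes and code system URLs are my reading of the Genomics Reporting guidance, so they should be checked against the IG before use. Only a few of the listed data points are shown; the rest would follow the same pattern.

```python
LOINC = "http://loinc.org"


def concept(system, code, display):
    """Build a CodeableConcept holding a single coding."""
    return {"coding": [{"system": system, "code": code, "display": display}]}


def component(loinc_code, display, value_key, value):
    """One Observation.component entry: a LOINC-coded element plus its value."""
    return {"code": concept(LOINC, loinc_code, display), value_key: value}


def build_variant(obs_id, specimen_id, gene_id, gene_symbol, ref, alt, hgvs_c):
    """Assemble a variant Observation; codes below are assumptions to verify."""
    return {
        "resourceType": "Observation",
        "id": obs_id,
        "status": "final",
        "code": concept(LOINC, "69548-6", "Genetic variant assessment"),
        # Top level: the Specimen the sequence data was drawn from.
        "specimen": {"reference": f"Specimen/{specimen_id}"},
        # Everything else lives in the component array, one entry per data point.
        "component": [
            component("48018-6", "Gene studied [ID]", "valueCodeableConcept",
                      concept("http://www.genenames.org/geneId", gene_id, gene_symbol)),
            component("69547-8", "Genomic ref allele [ID]", "valueString", ref),
            component("69551-0", "Genomic alt allele [ID]", "valueString", alt),
            component("48004-6", "DNA change (c.HGVS)", "valueCodeableConcept",
                      concept("http://varnomen.hgvs.org", hgvs_c, hgvs_c)),
        ],
    }


# Placeholder values only; none of these describe a real variant.
variant = build_variant("variant-1", "specimen-1", "HGNC:0000", "GENE1",
                        "A", "G", "c.100A>G")
```

Storing ref and alt alleles (and chromosome and position) as separate components keeps each value individually searchable, at the cost of a longer component array.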

InheritedDiseasePathogenicity (Observation)

This profile is used to provide additional details regarding the pathogenicity associated with known variations. For our needs, this is mainly to capture the inheritance model associated with the variation (autosomal recessive/dominant, X-linked, etc.).

IDP entities point back to the relevant variant by way of the derivedFrom property, which is a reference to the respective Observation id.
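
A minimal sketch of that derivedFrom link, again as a plain Python dict. The function name build_implication is hypothetical, and the codes are deliberately left as text-only CodeableConcepts because the real codes depend on which version of the guidance is followed; the ids are placeholders.

```python
def build_implication(obs_id, variant_id, code, inheritance):
    """Assemble an IDP-style Observation that points back to its variant."""
    return {
        "resourceType": "Observation",
        "id": obs_id,
        "status": "final",
        "code": code,
        # The link back to the variant Observation this finding describes.
        "derivedFrom": [{"reference": f"Observation/{variant_id}"}],
        "component": [
            {
                # Text-only placeholder; a coded concept would go here in practice.
                "code": {"text": "Mode of inheritance"},
                "valueCodeableConcept": {"text": inheritance},
            }
        ],
    }


implication = build_implication(
    "implication-1",
    "variant-1",
    {"text": "Inherited disease pathogenicity"},  # placeholder code
    "Autosomal recessive",
)
```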

GenomicsReport (DiagnosticReport)

This report acts as a compilation of all relevant discoveries associated with a given patient. At the top level, it points to the related Patient, and all "discoveries" are stored as references in the result property array.
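
The shape described here could look something like the following sketch. build_report is a hypothetical helper, the ids are placeholders, and the LOINC report code is an assumption to confirm against the IG.

```python
def build_report(report_id, patient_id, observation_ids):
    """Assemble a base DiagnosticReport that aggregates a patient's findings."""
    return {
        "resourceType": "DiagnosticReport",
        "id": report_id,
        "status": "final",
        # Assumed report code; verify against the Genomics Reporting guidance.
        "code": {
            "coding": [
                {
                    "system": "http://loinc.org",
                    "code": "51969-4",
                    "display": "Genetic analysis report",
                }
            ]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        # One reference per finding (variants and their implications alike).
        "result": [{"reference": f"Observation/{oid}"} for oid in observation_ids],
    }


report = build_report("report-1", "patient-1", ["variant-1", "implication-1"])
```

Because result holds references rather than embedded resources, the same report can collect findings drawn from more than one specimen.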

Current Working Solution

Because the genomics-reporting work is in the early stages, I recommend employing the working group's guidance, but using base profiles rather than the profiles which represent the current work in progress and may change along the way. This reduces effort associated with maintaining the in-development profiles and extensions within our model, since those are quite complex and likely to change before final ratification.

torstees commented 3 years ago

An example of a variant can be seen here. Examples for IDP and the report to follow soon.

For the CMG data, I had to acquire gene IDs from the web using an API in order to conform to the recommended guidelines for code format. The module I wrote to do that will be included with the KF ingest plugin once I've pushed the changes up.

torstees commented 3 years ago

An example of a diagnostic implication can be seen here.

Please note that this relies on some pre-release LOINC codes, which are represented in a CodeSystem, tbd-codes. This CS and VS will have to be added to our model should we follow the guidelines for identifying some of the features that I am currently using.

torstees commented 3 years ago

An example of the report can be seen here.

Please note that there are two references in the "result" array, since there are two types of reports to be aggregated together. There actually could be quite a few, and because they are not limited to the sample, they could span multiple samples.

torstees commented 3 years ago

The relevant files for the kf ingest library fhir plugin can be found in my ingest repo. Relevant files are:

Variant Discovery Implication Discover Report

torstees commented 3 years ago

Something that may be worth discussing is whether or not there is a need to profile these. As it is, I was able to get by using only the base profiles along with a few terminologies provided in the current version of the Genomics Reporting IG but, as with a few of our others, we could create a simple profile to clarify how the different Observations/DiagnosticReports are being used.

bwalsh commented 3 years ago

I wonder how we might support provenance? Could GenomicsReport::DiagnosticReport.extension:SupportingInfo or another property contain references back to:

AnnotationTask
DocumentReference (cram,bam,vcf)

torstees commented 3 years ago

SupportingInfo would work, but those two aren't really supporting info, are they? Those seem more like SourceData or something like that. There is a ServiceRequest extension, which doesn't apply particularly well to us, but we could have something similar pointing to the task and possibly another extension, similar to SupportingInfo referencing where the source data for the finding can be found.

lizamos commented 3 years ago

Hi! Sorry to jump in and also be late to the game. My name is Liz Amos and I'm at NIH (NLM) but also am a member of the HL7 Clinical Genomics group. I'd be happy to help sort through the CG FHIR IG - it's changing significantly in the next version (see the current build: http://build.fhir.org/ig/HL7/genomics-reporting/index.html). To answer the provenance question, you might find derivedFrom to be useful for associating materials to the Observation (Variant). If you want to attach to the Report itself, I've seen other groups use the RelatedArtifact extension to do so. However, that being said, this is something we've been wrestling with too and could use input from your use case. Do you have examples of the reports you're representing?

Another group - eMERGE - has excellent documentation on their mapping to the CG FHIR IG: https://emerge-fhir-spec.readthedocs.io/en/latest/design.html#emerge-report-to-fhir-gr-ig-mapping-and-analysis. This also includes the shortcomings in the CG IG which we're working to incorporate (either into the profiles or as extensions). I think they're one of the groups using RelatedArtifact (as mentioned above).
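
For the derivedFrom option mentioned above, a minimal sketch of attaching source files (CRAM, BAM, VCF) to a variant Observation might look like this. The function name with_source_files and the ids are hypothetical; the only FHIR assumption is that Observation.derivedFrom may reference DocumentReference resources, which base R4 allows. The RelatedArtifact route on the report is not shown.

```python
import copy


def with_source_files(observation, document_ids):
    """Return a copy of the Observation with DocumentReference links added."""
    updated = copy.deepcopy(observation)
    links = updated.setdefault("derivedFrom", [])
    existing = {link["reference"] for link in links}
    for doc_id in document_ids:
        ref = f"DocumentReference/{doc_id}"
        # Skip references that are already present so repeated calls are safe.
        if ref not in existing:
            links.append({"reference": ref})
            existing.add(ref)
    return updated


# Placeholder ids; the Observation here is a bare stub for illustration.
stub = {"resourceType": "Observation", "id": "variant-1", "status": "final"}
linked = with_source_files(stub, ["vcf-1", "cram-1", "vcf-1"])
```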

liberaliscomputing commented 3 years ago

On top of the suggestions above, I found Sync 4 Genes, a relatively new initiative working on interoperable genomics standards and infrastructure development in FHIR. They are beginning Phase 4, which will be the last stage actually proposing a roadmap to infrastructure development. During Phases 1 and 2, they pilot-studied a list of different use cases in genomics using FHIR (https://www.healthit.gov/topic/sync-genes).

Here are the final reports from Phases 1 and 2:

They also provide sample files from the HL7 FHIR Connectathon:

NMDP: https://wiki.hl7.org/index.php?title=File:NMDP_Original_and_Final_Files.zip
Utah Newborn Screening Program: https://wiki.hl7.org/index.php?title=File:Final_S4GP2_Connectathon_Utah_NBS.zip
Weill Cornell Medical Center: https://wiki.hl7.org/index.php?title=File:Completed_Cornell_CaT_Files.zip

NIH-NCPI / ncpi-model-forge

✨Determine approach for representing genetic variants to fhir #50
