NIH-NCPI / ncpi-model-forge

🔥 The Project Forge FHIR model
Apache License 2.0

NCPI Family Relationship #21

Closed torstees closed 3 years ago

torstees commented 3 years ago

Requester information

Please provide the following information: Name: Eric Torstenson Affiliation: VUMC (Vanderbilt)

Request Details

Please provide the following information about what you want to accomplish with your model change request: Purpose: Our data will have maternal/paternal and twin IDs along with extended free text that will need to be captured in the data model.

Who it benefits: Any groups that work with Pedigree data

Use case:

RobertJCarroll commented 3 years ago

Hi all! I took a look at the solution in place for the KF model, which uses the relation extension. After our conversation about "citations" and linking resources, I'm not sure that extension makes sense given the mutability/extensibility of study information for families. I don't think there are any good Patient-Patient links out there that would allow us to specify the relationship, right? Is this something that merits a new resource, perhaps pushing this back up to HL7?

allisonheath commented 3 years ago

We've discussed this a few times and I think we came to similar conclusions: there is no natural fit for it among the current HL7 resources. That said, we need to store the information somehow, hence the current interim solution of the relation extension. See the gist here with an example of a participant and their mother and father. Note that right now we expect all relationships to go one way (child->parent in this case); in our previous model we actually kept both directions. We had a short discussion about a few other ways to represent this in this ticket.

I should also note that our use case is pretty similar. Most of our data follows a trio-based design, so we have genomic sequencing from the mother, father, and child. But there can be QC issues or missing data, so the trios are not always complete. There are a few cases of extended families, where we sometimes have genomic data and sometimes only clinical information. I've attached a CSV of a raw grouping of the relationships we get (please ignore the Mother/Father capitalization issue; it gets fixed downstream atm). Via the portal ETL process we roll up this information into proband-only, duo, duo+, trio, and trio+, which is what's actually used for searching purposes. The + means additional family members. We've also had a lot of requests for ped files or similar to use as input for bioinformatics pipelines, but haven't had a good solution yet. This includes our own genomic pipelines, for which we custom-generate ped files directly from these database fields, using a family identifier (FM_*) and biospecimen identifiers (BS_*), like so:

FM_XB3ZBW2G BS_ZW3N6S88 0   0   1   1
FM_XB3ZBW2G BS_K4S80TJW BS_ZW3N6S88 BS_W0VZPDQ5 2   2
FM_XB3ZBW2G BS_W0VZPDQ5 0   0   2   1

That's also why we're using the Group resource: it gives us an identifier/handle on the whole family as well.
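A minimal sketch of how rows like the ones above could be generated, assuming the standard PED column order (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) with `0` for a missing parent. The `Member` class and `ped_lines` function are hypothetical names for illustration, not part of the KF pipelines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    """One family member, keyed by biospecimen ID (hypothetical structure)."""
    specimen_id: str
    sex: int        # assumed PED coding: 1 = male, 2 = female, 0 = unknown
    affected: int   # assumed PED coding: 1 = unaffected, 2 = affected, 0 = unknown
    father_id: Optional[str] = None
    mother_id: Optional[str] = None


def ped_lines(family_id: str, members: List[Member]) -> List[str]:
    """Render one tab-separated PED row per member; missing parents become '0'."""
    rows = []
    for m in members:
        rows.append("\t".join([
            family_id,
            m.specimen_id,
            m.father_id or "0",
            m.mother_id or "0",
            str(m.sex),
            str(m.affected),
        ]))
    return rows


trio = [
    Member("BS_ZW3N6S88", sex=1, affected=1),
    Member("BS_K4S80TJW", sex=2, affected=2,
           father_id="BS_ZW3N6S88", mother_id="BS_W0VZPDQ5"),
    Member("BS_W0VZPDQ5", sex=2, affected=1),
]

for row in ped_lines("FM_XB3ZBW2G", trio):
    print(row)
```

Because only the child row carries parent IDs, this lines up with the one-way child->parent convention described earlier in this comment.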

One representation principle we did determine to be useful is that if we have a specimen for someone, or they've been surveyed directly, they should be represented as a Patient (versus just as part of someone else's medical history).

cc @liberaliscomputing

RobertJCarroll commented 3 years ago

Thank you for the extra background. I actually really like the idea of using Observation for modeling these relationships. It would allow us to model the twin status as well explicitly.

It does make me wonder about another basic question: Is there value in profiling Observation for a family relationship observation? It seems like it could be helpful in guiding use, but I don't know if it causes broader problems for interop. E.g., if someone were to come in with an Observation that has the right code, it could still work. It's an interesting contrast between asking "give me all Observation with this code" in all cases and people wanting "give me all Observation->Family Relationship" or whatnot.

liberaliscomputing commented 3 years ago

@RobertJCarroll, I may not be qualified enough to argue for or against using Observation to represent a family relationship, but I want to leave my thoughts, since we, the D3b KF Model FHIR team, had some discussion around it.

We developed relation as an extension to Patient and recently figured that remodeling it as a separate profile (not as an attribute extension) might be a more correct approach given:

  1. FHIR is very Observation-oriented; and
  2. The documentation's scope says "Observations are a central element in healthcare, (...) and even capture demographic characteristics."

GA4GH's Phenopackets also took an approach similar to our initial modeling:

Against this backdrop, we may remodel a family relationship as an extended profile off of Observation as follows:

{
  "resourceType": "Observation",
  "meta": {
    "profile": [
      "http://fhir.ncpi-project-forge.io/StructureDefinition/ncpi-family-relationship"
    ]
  },
  "code": {
    "coding": [
      {
        "system": "http://snomed.info/sct",
        "code": "35359004",
        "display": "Family"
      }
    ]
  },
  "subject": {
    "reference": "Patient/pt-001",
    "display": "Assume this is a mother"
  },
  "focus": [
    {
      "reference": "Patient/pt-002",
      "display": "Assume this is a child"
    }
  ],
  "valueCodeableConcept": {
    "coding": [
      {
        "code": "PRN",
        "display": "parent",
        "system": "http://terminology.hl7.org/CodeSystem/v3-RoleCode"
      },
      {
        "code": "MTH",
        "display": "mother",
        "system": "http://terminology.hl7.org/CodeSystem/v3-RoleCode"
      }
    ],
    "text": "Mother"
  }
}

In terms of querying family relationships in a server, we would be able to search by code (if a search parameter is properly implemented) or profile as follows:
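The original query examples did not survive in this copy of the thread. As a stand-in, here is a minimal sketch of what the two searches could look like, assuming the server supports the standard `code` token parameter on Observation and the common `_profile` parameter. The base URL and the helper names are placeholders, not anything from the KF servers.

```python
from urllib.parse import urlencode

BASE = "https://example.org/fhir"  # hypothetical server base URL
SNOMED = "http://snomed.info/sct"
PROFILE = "http://fhir.ncpi-project-forge.io/StructureDefinition/ncpi-family-relationship"


def search_by_code(base: str, system: str, code: str) -> str:
    """Build an Observation search URL using a system|code token."""
    return f"{base}/Observation?" + urlencode({"code": f"{system}|{code}"})


def search_by_profile(base: str, profile: str) -> str:
    """Build an Observation search URL restricted to a declared profile."""
    return f"{base}/Observation?" + urlencode({"_profile": profile})


print(search_by_code(BASE, SNOMED, "35359004"))
print(search_by_profile(BASE, PROFILE))
```

Searching by code would also match unprofiled Observations that happen to carry the same code, while `_profile` only matches resources that declare the profile in `meta.profile`.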

RobertJCarroll commented 3 years ago

Thanks for the detailed response! I think this makes sense.

It is somewhat unfortunate that FamilyMemberHistory is organized the way it is. I agree it doesn't align with what we are looking to do here, and I wonder if it really solves the Pedigree problem that well (though it's definitely fewer resources vs stubbing out patients/relationships/affected status).

Presuming we go with an Observation (or profiled Obs), what would we require to model? From the discussion, it doesn't seem necessary to be complete and bi-directional. Perhaps the child->parent relationships must be modeled in the one direction (as in the resource above) and then twins/trips/etc must have all links included both directions?

fiendish commented 3 years ago

> what would we require to model?

I think there might be two questions in there:

  1. which relationship role codes do we choose to use?
  2. when do we need to add them?

It might, e.g., make more sense to code as "parent" instead of "mother"/"father", since one assumes that mother/father status may be more precisely described via "parent"+<parent sex>. This does not, though, extend to non-immediate relations, where you would probably want to record the lineage sequence (e.g. maternal grandmother). But would you then need to mandate creating stub people for linking up those relations?
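A small sketch of the "parent"+<parent sex> idea, assuming the relationship is stored with the generic v3-RoleCode `PRN` and the more specific `MTH`/`FTH` code is derived on read from the parent's recorded gender. The function name and the use of FHIR-style gender strings are assumptions for illustration.

```python
# Assumed mapping from a FHIR-style gender string to a v3-RoleCode parent code.
ROLE_BY_PARENT_GENDER = {"female": "MTH", "male": "FTH"}


def specific_parent_code(parent_gender: str) -> str:
    """Return MTH or FTH when the gender is known, else the generic PRN."""
    return ROLE_BY_PARENT_GENDER.get(parent_gender, "PRN")


print(specific_parent_code("female"))   # MTH
print(specific_parent_code("unknown"))  # PRN
```

This only covers immediate parents; lineage such as "maternal grandmother" would still need either an explicit code or a chain of parent links.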

> From the discussion, it doesn't seem necessary to be complete and bi-directional.

I think so, but we do have to remember when searching to then look for both sides of the relationships. And if you have more than two people genetically related to each other (e.g. child, mother, mother's mother), only connecting the genetic lineage by minimum spanning tree (eschewing a direct relationship between child and mother's mother) makes mapping the relationships a recursive process instead of one-shot. These can both be somewhat mitigated by putting all family members into "groups" (I believe we do this) and then requesting relationships by family group rather than by the patients directly, though depending on what you're looking for in the data you may still want to then locally walk immediate relationships to reconnect distant ones.
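To illustrate the "locally walk immediate relationships" step, here is a minimal sketch that reconnects distant ancestors from child->parent links only. It assumes the links for a family group have already been fetched into a plain dict; `parents_of` and `ancestors` are hypothetical names.

```python
from collections import deque
from typing import Dict, List


def ancestors(person: str, parents_of: Dict[str, List[str]]) -> Dict[str, int]:
    """Breadth-first walk over child->parent links.

    Returns each ancestor mapped to its distance in generations.
    """
    found: Dict[str, int] = {}
    queue = deque([(person, 0)])
    while queue:
        current, depth = queue.popleft()
        for parent in parents_of.get(current, []):
            if parent not in found:
                found[parent] = depth + 1
                queue.append((parent, depth + 1))
    return found


# Only immediate relationships are recorded (no direct child -> grandmother link).
parents_of = {
    "child": ["mother", "father"],
    "mother": ["mothers_mother"],
}

print(ancestors("child", parents_of))
```

With sparse links, the child's relationship to `mothers_mother` is only recoverable through this kind of walk, which is the cost being traded against storing dense relationships.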

On the premise that inserting new data happens less frequently than requesting existing data, I think my preferred mechanism is to actually record as many directions and connections as possible. One could, and some servers do support this, only require submitting one direction and then have the server fill in the reverse or dense relationships via a post-insert hook.
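A rough sketch of what such a fill-in step might do, independent of any particular server's hook mechanism. It assumes relationships are held as `(subject, role, focus)` triples meaning "subject is <role> of focus", with v3-RoleCode values; the `INVERSE` table and function name are illustrative only.

```python
from typing import Set, Tuple

Relationship = Tuple[str, str, str]  # (subject, role, focus)

# Assumed inverse table: directional roles flip, symmetric roles map to themselves.
INVERSE = {"PRN": "CHILD", "CHILD": "PRN", "TWIN": "TWIN", "SIB": "SIB"}


def with_inverses(relationships: Set[Relationship]) -> Set[Relationship]:
    """Return the input plus the reverse direction of every known role."""
    dense = set(relationships)
    for subject, role, focus in relationships:
        inverse = INVERSE.get(role)
        if inverse is not None:
            dense.add((focus, inverse, subject))
    return dense


submitted = {("pt-001", "PRN", "pt-002"), ("pt-003", "TWIN", "pt-002")}
for rel in sorted(with_inverses(submitted)):
    print(rel)
```

Roles missing from the table are passed through unchanged, so a submitter-supplied relationship is never dropped.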

> Perhaps the child->parent relationships must be modeled in the one direction (as in the resource above) and then twins/trips/etc must have all links included both directions?

I don't understand why twin-twin would need to be bidirectional if child-parent doesn't. Can you explain that?

liberaliscomputing commented 3 years ago

> Perhaps the child->parent relationships must be modeled in the one direction (as in the resource above) and then twins/trips/etc must have all links included both directions?

Maybe we would want it to be proband-centric rather than relying on specific types of relationships? Say, given triplets A, B, and C, let's assume only A and C are probands. Then, create A >> B, A >> C, C >> A, and C >> B, not B >> A and B >> C? In general, I agree with Avi's idea of preferring to create as many directions and connections as possible unless we add a graph traversal layer.
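The triplet example can be written out as a short sketch, assuming a link `X >> Y` is simply an ordered pair created from each proband to every other family member. The function name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def proband_centric_links(members: Iterable[str], probands: Set[str]) -> List[Tuple[str, str]]:
    """Create a directed link from each proband to every other member."""
    members = list(members)
    return [(p, m) for p in members if p in probands for m in members if m != p]


links = proband_centric_links(["A", "B", "C"], probands={"A", "C"})
print(links)  # [('A', 'B'), ('A', 'C'), ('C', 'A'), ('C', 'B')]
```

B has no outgoing links here, so anything starting from B would need a reverse lookup or a traversal layer.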

fiendish commented 3 years ago

If the goal of the model is to always work consistently, I think that any explored example scenarios should involve complex families with multiple generations, at least first cousins, and multiple affected persons.

> Maybe we would want it to be proband-centric rather than relying on specific types of relationships? Say, given triplets

IMO the concept of proband is useful during diagnostic contact-tracing but is ultimately harmful to genomic research data collection because people often start accidentally believing (and building rigid systems around) things that categorically aren't true, like that an enrolled family group can only have one. There are affected people and non-affected people and they have genetic overlap with each other. Which one you see first or which one you designate as your "point of entry" has at best sociological relevance but not biological relevance.

> Say, given triplets

I'm glad that your triplet example includes two probands! But I likewise worry about using the concept of trios/triplets, to a lesser degree, for a reason that overlaps with the above. We periodically have studies enroll multiple three-person-groups who are then discovered to be related to each other. Common discourse around trios/triplets leads people to design systems that don't accommodate those relations which leads to sometimes relevant genetic connections being thrown away.
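One way a system could avoid throwing those connections away is to treat family groups as mergeable. Below is a minimal union-find sketch, assuming enrolled groups are sets of person IDs and that later-discovered relations arrive as pairs; all names are illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple


def merge_groups(groups: Dict[str, Set[str]],
                 discovered: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Merge enrolled groups that turn out to be connected by a relationship."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for members in groups.values():
        ordered = sorted(members)
        for person in ordered:
            find(person)
        for other in ordered[1:]:
            union(ordered[0], other)
    for a, b in discovered:
        union(a, b)

    merged: Dict[str, Set[str]] = {}
    for person in list(parent):
        merged.setdefault(find(person), set()).add(person)
    return list(merged.values())


groups = {
    "trio1": {"a", "b", "c"},
    "trio2": {"d", "e", "f"},
    "trio3": {"g", "h", "i"},
}
# Suppose c and d are later found to be related.
families = merge_groups(groups, [("c", "d")])
print(sorted(sorted(f) for f in families))
```

The first two three-person groups collapse into one six-person family, while the third stays separate.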

RobertJCarroll commented 3 years ago

Thanks for all of this commentary!

At a high level, we need to address the complexity of representing these data, and there are a few areas that have big impacts, e.g., whether to explicitly state a limited set of relationships or all relationships. Does that model traversal happen before submission, during integration, or on retrieval? It all depends on our requirements.

> I don't understand why twin-twin would need to be bidirectional if child-parent doesn't. Can you explain that?

Short version is that "Parent of" is a different relationship than "Child of", while "Twin of" is only one relationship. While the data would be modeled with only one "twin of", I'm suggesting we say all of each class of relationship must be modeled, i.e., all "child of" and "twin of".

fiendish commented 3 years ago

> Short version is that "Parent of" is a different relationship than "Child of", while "Twin of" is only one relationship.

I see, i.e., the relationship is really a bidirectional "are twins" rather than a unidirectional "is twin of" because of the symmetry, yes? That makes sense to me with the plural/bidirectional wording.

tlicht3 commented 3 years ago

Based on my question during the call today, I wanted to provide a comment here - I think this ticket is describing data (genomic or other) for one or more family members who have contributed data to the study, even though they are not the proband/primary patient. I asked today how this is related, if at all, to information that is gathered about the patient/proband's family history (e.g. proband's great-grandfather had prostate cancer). Someone noted that the FamilyMemberHistory FHIR resource would be used for general information about the patient/proband's family history, which seems reasonable. I'm curious if these data are linked in any way or if there are any potential issues having such similar data in different spaces within the model.

For example, if a patient knows that a grandfather and uncle had a specific disease and the patient's father participated in a trio study that also gathered information about the father's history of that same disease, I assume the father would have an observation (as described in this ticket), but in order for a user of the data to see this as part of the family history, would the father also need an entry using the FamilyMemberHistory, or is there another way to link these data?

Today we talked about having example data for both of these scenarios - the GDC, as well as several of the Gen3 data commons have data about family history (https://github.com/NCI-GDC/gdcdictionary/blob/master/gdcdictionary/schemas/family_history.yaml). If there is anything I can do to help with this (e.g. providing specific use cases), please let me know.