ga4gh / pedigree

Repository for the family history/pedigree project
https://pedigree.readthedocs.io/

affected - not recommended #10

Closed julesjacobsen closed 3 years ago

julesjacobsen commented 3 years ago

In the 2021-01/model.md the affected field on an individual is marked as "not recommended", but nonetheless included for backwards compatibility.

Individual

| Field | Type | Status | Definition |
|---|---|---|---|
| affected | boolean | not recommended | whether or not the individual is affected by the condition being investigated in this pedigree; included for PED backwards compatibility |

Pedigree

| Field | Type | Status | Definition |
|---|---|---|---|
| proband | ID | optional | id of Individual that is the index case for the family |
| consultand | ID | optional | id of Individual that is the focus of the current analysis |
| date | Date | optional | the date the pedigree was collected or last updated, as ISO full or partial date, i.e. YYYY, YYYY-MM, or YYYY-MM-DD |
| reason | Concept | optional | the reason for pedigree collection, especially a health condition of focus being investigated in the family; if any Individual has the affected property defined, it refers to this condition |

How are users supposed to indicate which individuals are affected with the Pedigree reason in a case like a rare-disease diagnosis? In these cases you don't know the condition, but a specialist has determined that individuals A and B are both affected with a common condition. You can't reliably use a collection of phenotype terms either, as these may differ between individuals, which makes it really hard for downstream software to ascertain which individuals in the pedigree are affected. Hence the use of a boolean flag or status codes.

I agree that this is annoying: it makes the pedigree, as with the proband and consultand, relative to a particular analysis, which limits reuse.

To remedy my first point, it might be simplest to make the field "optional". The Pedigree reason field is marked optional for this reason so it seems odd to not recommend it in the Individual.

To address the point of relative use of a pedigree, one option is to add another data structure that holds the 'unbiased' pedigree along with identifiers indicating the proband, consultand, reason and affected. This would make the Pedigree a simple statement of relationships, and the other structure would encode the data necessary for a particular use/analysis of the Pedigree.

e.g.

PedigreeContext

| Field | Type | Status | Definition |
|---|---|---|---|
| proband | ID | optional | id of Individual that is the index case for the family |
| consultand | ID | optional | id of Individual that is the focus of the current analysis |
| date | Date | optional | the date the pedigree was collected or last updated, as ISO full or partial date, i.e. YYYY, YYYY-MM, or YYYY-MM-DD |
| reason | Concept | optional | the reason for pedigree collection, especially a health condition of focus being investigated in the family; if any Individual has the affected property defined, it refers to this condition |
| pedigree | Pedigree | required | pedigree in which the individuals indicated in the proband, consultand, reason and affected fields are related |

Pedigree

| Field | Type | Status | Definition |
|---|---|---|---|
| individuals | repeated Individual | required | individuals belonging to this Pedigree |
| relations | repeated Relationship | required | relationships between the individuals in this pedigree |
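
To make the split above concrete, here is a rough Python sketch, not a proposed schema. The class and field names follow the two tables; everything else is an assumption for illustration: the shapes of `Concept` and `Relationship`, the `biological_parent_of` relation string, holding `affected` as a list of Individual ids on the context, and the `unknown_ids` helper. The point it tries to show is that one Pedigree can be reused unchanged by several contexts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    # Hypothetical shape: an ontology-style id plus an optional label.
    id: str
    label: str = ""


@dataclass
class Individual:
    id: str


@dataclass
class Relationship:
    # Field names are assumptions made for this sketch only.
    individual: str
    relation: str
    relative: str


@dataclass
class Pedigree:
    """Context-free statement of who is related to whom."""
    individuals: List[Individual]
    relations: List[Relationship]


@dataclass
class PedigreeContext:
    """Analysis-specific view over a reusable Pedigree."""
    pedigree: Pedigree
    proband: Optional[str] = None
    consultand: Optional[str] = None
    date: Optional[str] = None  # ISO full or partial date, e.g. "2021-01"
    reason: Optional[Concept] = None
    # Assumed representation: ids of Individuals affected by `reason`.
    affected: List[str] = field(default_factory=list)

    def unknown_ids(self) -> List[str]:
        """Return referenced ids that are not Individuals in the pedigree."""
        known = {i.id for i in self.pedigree.individuals}
        referenced = [self.proband, self.consultand, *self.affected]
        return [r for r in referenced if r is not None and r not in known]


# One pedigree: A is a parent of B and C (relation term is made up).
family = Pedigree(
    individuals=[Individual("A"), Individual("B"), Individual("C")],
    relations=[
        Relationship("A", "biological_parent_of", "B"),
        Relationship("A", "biological_parent_of", "C"),
    ],
)

# Two independent analyses reuse the same pedigree object.
rare_disease = PedigreeContext(
    pedigree=family, proband="B", date="2021-01", affected=["A", "B"]
)
carrier_screen = PedigreeContext(pedigree=family, consultand="C")
```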

buske commented 3 years ago

This was discussed during the last pedigree call, and we agreed that affected should be optional rather than not recommended. The "not recommended" status was a relic from the version in which there was an additional (preferred) method for annotating affected status.