broadinstitute / seqr

web-based analysis tool for rare disease genomics
GNU Affero General Public License v3.0

Map HPO qualifiers to HPO IDs in seqr data structure #3171

Open jxchong opened 1 year ago

jxchong commented 1 year ago

Seqr's data structure maps HPO terms and stores the corresponding HPO IDs. However, it seems to store the qualifiers only as plain-text labels, without corresponding HPO IDs.

For example, the data structure currently stores:

{"id": "HP:0000365",
 "qualifiers": [{"type": "age_of_onset", "label": "Juvenile onset"},
                {"type": "pace_of_progression", "label": "Rapidly progressive"},
                {"type": "severity", "label": "Profound"},
                {"type": "temporal_pattern", "label": "Chronic"},
                {"type": "spatial_pattern", "label": "Generalized"},
                {"type": "laterality", "label": "Left"}
               ]
},

This is inconsistent with the handling of the primary HPO terms: the primary term is stored as an ID, but the qualifiers are stored only as plain text. Ideally, the qualifiers would also be stored with IDs, since IDs are unambiguous, as in the example below:

{"id": "HP:0000365",
 "qualifiers": [{"type": "age_of_onset", "label": "Juvenile onset", "id": "HP:0003621"},
                {"type": "pace_of_progression", "label": "Rapidly progressive", "id": "HP:0003678"},
                {"type": "severity", "label": "Profound", "id": "HP:0012829"},
                {"type": "temporal_pattern", "label": "Chronic", "id": "xxxx"},
                {"type": "spatial_pattern", "label": "Generalized", "id": "xxxx"},
                {"type": "laterality", "label": "Left", "id": "xxxxxx"}
               ]
},

Describe alternatives you've considered

The GREGoR export seems to map the qualifier labels to IDs, but only there: if you do not go through the GREGoR export function, the qualifiers are left as plain text.
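
For anyone who needs to code around this outside the GREGoR export, a minimal sketch of a label-to-ID lookup might look like the following. This is not seqr's implementation: the function name `add_qualifier_ids` and the table `QUALIFIER_LABEL_TO_HPO_ID` are hypothetical, and the table contains only the three IDs quoted in this issue. Qualifiers with no known ID are left untouched and reported back rather than guessed.

```python
from copy import deepcopy

# Hypothetical lookup table keyed by (qualifier type, label).
# Only the three IDs quoted in this issue are included; a real table
# would have to be filled in from the HPO ontology itself.
QUALIFIER_LABEL_TO_HPO_ID = {
    ("age_of_onset", "Juvenile onset"): "HP:0003621",
    ("pace_of_progression", "Rapidly progressive"): "HP:0003678",
    ("severity", "Profound"): "HP:0012829",
}


def add_qualifier_ids(feature, lookup=QUALIFIER_LABEL_TO_HPO_ID):
    """Return (annotated_copy, unmapped_keys) for one feature dict.

    The input is not modified. Each qualifier whose (type, label) pair
    is in the lookup gets an "id" key; the rest are listed as unmapped.
    """
    annotated = deepcopy(feature)
    unmapped = []
    for qualifier in annotated.get("qualifiers") or []:
        key = (qualifier.get("type"), qualifier.get("label"))
        hpo_id = lookup.get(key)
        if hpo_id is None:
            unmapped.append(key)
        else:
            qualifier["id"] = hpo_id
    return annotated, unmapped


feature = {
    "id": "HP:0000365",
    "qualifiers": [
        {"type": "age_of_onset", "label": "Juvenile onset"},
        {"type": "pace_of_progression", "label": "Rapidly progressive"},
        {"type": "severity", "label": "Profound"},
        {"type": "laterality", "label": "Left"},  # no ID given in this issue
    ],
}

annotated, unmapped = add_qualifier_ids(feature)
```

Keying on the (type, label) pair rather than the label alone is a design choice made here so that the same label could map differently under different qualifier types; whether that ever happens in seqr's data is not something this thread establishes.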

jxchong commented 1 year ago

Hi @hanars sorry to bug you but we need to know if this will be fixed, or if there's a reason for this behavior, or if this is something we need to manually code around. Thanks!

hanars commented 1 year ago

Hi Jessica, this is low priority for us, as we are already reporting the data as needed by mapping to HPO IDs at the time of report generation. A full database migration and a corresponding redesign of the front end, which would make no practical difference to the user experience or the generated report, is unfortunately not something our team has the bandwidth to work on anytime soon.

jxchong commented 1 year ago

ok thanks. Is there a specific reason it's designed this way? (so we know whether to consider developing a fix ourselves)

hanars commented 1 year ago

It's designed this way because the data model predates these reports by years, and reports are often subject to change (or even have conflicting representations in different reports). So when adding reports, we generally try to format our existing data for the report instead of migrating our data to meet a new format. If you feel strongly and want to change the underlying data structure for yourselves, you should feel free, although note that the UI for editing features will need some updates.