broadinstitute / genetic-prevalence-estimator

https://genie.broadinstitute.org/
BSD 3-Clause "New" or "Revised" License

Improve error handling for unknown clinical significance terms #141

Open nawatts opened 1 year ago

nawatts commented 1 year ago

Currently, new clinical significance terms are only discovered when the import pipeline fails to map them to a category.

https://github.com/broadinstitute/genetic-prevalence-estimator/blob/121e5cd7cb18a2628fe3fdd0db9806ac6e648554/data-pipelines/import_clinvar.py#L195-L203

The pipeline fails on the first unknown term it encounters, so if there are several new terms, it has to be re-run repeatedly, revealing one new term per run, until it succeeds. Ideally, the pipeline would collect all unknown terms and list them in a single error message. This would make it much easier to update the categories.
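
A minimal sketch of the proposed behavior, in plain Python rather than the pipeline's actual code. The names `CATEGORY_BY_TERM`, `UnknownClinicalSignificanceError`, and `categorize_all`, as well as the example terms and categories, are hypothetical and not taken from `import_clinvar.py`. The idea is to check every term first and raise once with the full list:

```python
# Hypothetical mapping for illustration; the real categories are defined in import_clinvar.py.
CATEGORY_BY_TERM = {
    "pathogenic": "pathogenic_or_likely_pathogenic",
    "likely pathogenic": "pathogenic_or_likely_pathogenic",
    "uncertain significance": "uncertain_significance",
    "benign": "benign_or_likely_benign",
}


class UnknownClinicalSignificanceError(ValueError):
    """Raised once, listing every term that could not be mapped to a category."""

    def __init__(self, unknown_terms):
        self.unknown_terms = sorted(unknown_terms)
        super().__init__(
            "Unknown clinical significance terms: " + ", ".join(self.unknown_terms)
        )


def categorize_all(terms):
    """Map each term to its category, collecting all unknown terms before failing."""
    categories = {}
    unknown = set()
    for term in terms:
        key = term.strip().lower()
        if key in CATEGORY_BY_TERM:
            categories[term] = CATEGORY_BY_TERM[key]
        else:
            # Keep going instead of failing here, so one run reports every new term.
            unknown.add(term)
    if unknown:
        raise UnknownClinicalSignificanceError(unknown)
    return categories
```

With this shape, a single failed run would name every term that needs a category, so the mapping could be updated in one pass.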