biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

add additional clingen fields #107

Closed andrewsu closed 5 months ago

andrewsu commented 3 years ago

The current clingen parser (https://github.com/biothings/mygene.info/blob/master/src/hub/dataload/sources/clingen/parser.py#L65) specifies five columns to parse out of the downloaded clingen file:

key_list = ['DISEASE LABEL', 'DISEASE ID (MONDO)', 'SOP', 'CLASSIFICATION', 'ONLINE REPORT']

The header line of the clingen file includes the following columns: GENE SYMBOL, GENE ID (HGNC), DISEASE LABEL, DISEASE ID (MONDO), MOI, SOP, CLASSIFICATION, ONLINE REPORT, CLASSIFICATION DATE, GCEP

Three of the columns not currently parsed would also be useful to add: MOI, CLASSIFICATION DATE, and GCEP. The modified document should look something like this:

{
    "_id": "23676",
    "_score": 1.55,
    "clingen": {
        "_license": "https://www.clinicalgenome.org/docs/terms-of-use/",
        "clinical_validity": {
            "classification": "definitive",
            "classification_date": "2017-09-12T16:00:00.000Z",
            "disease_label": "nonsyndromic genetic deafness",
            "gcep": "Hearing Loss",
            "moi": "XL",
            "mondo": "MONDO:0019497",
            "online_report": "https://search.clinicalgenome.org/kb/gene-validity/29773bee-1f13-43f6-bda0-c5a646efccd7--2017-09-12T16:00:00",
            "sop": "SOP6"
        }
    },
    "name": "small muscle protein X-linked",
    "symbol": "SMPX"
}
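
As a rough illustration of the requested change, the sketch below maps the eight header columns onto the field names shown in the example document. It is not the actual mygene.info parser: `COLUMN_TO_FIELD`, `row_to_clinical_validity`, `parse_rows`, and `SAMPLE_CSV` are hypothetical names, the file is assumed to be plain comma-separated text with the header on its first line, and the sample row reuses the values from the JSON above with a placeholder for the HGNC ID.

```python
import csv
import io

# Hypothetical mapping from ClinGen header columns to document field names.
# The first five columns are the ones in the existing key_list; the last
# three are the additions proposed in this issue.
COLUMN_TO_FIELD = {
    "DISEASE LABEL": "disease_label",
    "DISEASE ID (MONDO)": "mondo",
    "SOP": "sop",
    "CLASSIFICATION": "classification",
    "ONLINE REPORT": "online_report",
    "MOI": "moi",
    "CLASSIFICATION DATE": "classification_date",
    "GCEP": "gcep",
}


def row_to_clinical_validity(row):
    """Keep only the mapped, non-empty columns of one row, renamed to field names."""
    return {
        field: row[column].strip()
        for column, field in COLUMN_TO_FIELD.items()
        if row.get(column)
    }


def parse_rows(csv_text):
    """Yield (gene symbol, clinical_validity dict) for each data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield row["GENE SYMBOL"], row_to_clinical_validity(row)


# Sample input built from the example document; the HGNC ID is a placeholder,
# not a real identifier.
SAMPLE_CSV = (
    "GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),MOI,SOP,"
    "CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE,GCEP\n"
    "SMPX,HGNC:PLACEHOLDER,nonsyndromic genetic deafness,MONDO:0019497,XL,SOP6,"
    "definitive,https://search.clinicalgenome.org/kb/gene-validity/"
    "29773bee-1f13-43f6-bda0-c5a646efccd7--2017-09-12T16:00:00,"
    "2017-09-12T16:00:00.000Z,Hearing Loss\n"
)

for symbol, clinical_validity in parse_rows(SAMPLE_CSV):
    print(symbol, clinical_validity)
```

An explicit column-to-field mapping is used here because the existing field names are not a mechanical transformation of the headers (for example, DISEASE ID (MONDO) becomes mondo).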

colleenXu commented 5 months ago

Closing because it looks like this issue has been addressed and the changes have been deployed: https://mygene.info/v3/query?q=_exists_:clingen.clinical_validity.gcep&fields=clingen.clinical_validity
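
For anyone who wants to repeat this check from a script, here is a minimal sketch using only the Python standard library. `build_query_url` and `fetch_hits` are hypothetical helper names, and the code assumes the response is a JSON object with the matching documents under a `hits` key.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://mygene.info/v3/query"


def build_query_url(field, fields):
    """Build a query URL matching documents in which `field` exists."""
    params = {"q": f"_exists_:{field}", "fields": fields}
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_hits(url):
    """Fetch the URL and return the list under 'hits' (needs network access)."""
    with urllib.request.urlopen(url) as response:
        # Assumption: the body is a JSON object with a "hits" list.
        return json.load(response).get("hits", [])


url = build_query_url(
    "clingen.clinical_validity.gcep", "clingen.clinical_validity"
)
print(url)
```

Calling `fetch_hits(url)` should then return documents whose `clingen.clinical_validity` entries include the new `gcep` field. Note that `urlencode` percent-encodes the colon in the query, which is equivalent to the URL written out above.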