The header line of the clingen file includes the following columns:
GENE SYMBOL
GENE ID (HGNC)
DISEASE LABEL
DISEASE ID (MONDO)
MOI
SOP
CLASSIFICATION
ONLINE REPORT
CLASSIFICATION DATE
GCEP
At https://github.com/biothings/mygene.info/blob/master/src/hub/dataload/sources/clingen/parser.py#L65 in the current clingen parser, we specify five columns to parse out of the downloaded clingen file:
key_list = ['DISEASE LABEL', 'DISEASE ID (MONDO)', 'SOP', 'CLASSIFICATION', 'ONLINE REPORT']
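For illustration, here is a minimal sketch of how a key list like this can pick columns out of each row. It is not the actual parser code: it assumes the file is comma-separated with the header as its first row, `select_columns` is a hypothetical helper, and the sample row is made up.

```python
import csv
import io

# The five columns currently listed in the parser's key_list.
KEY_LIST = ['DISEASE LABEL', 'DISEASE ID (MONDO)', 'SOP', 'CLASSIFICATION', 'ONLINE REPORT']


def select_columns(row, key_list):
    """Hypothetical helper: keep only the requested, non-empty columns of one row."""
    return {key: row[key] for key in key_list if row.get(key)}


# Made-up sample data for illustration only; not real ClinGen content.
SAMPLE = (
    "GENE SYMBOL,GENE ID (HGNC),DISEASE LABEL,DISEASE ID (MONDO),MOI,SOP,"
    "CLASSIFICATION,ONLINE REPORT,CLASSIFICATION DATE,GCEP\n"
    "GENE1,HGNC:0,example disease,MONDO:0000000,AD,SOP0,Definitive,"
    "https://example.org/report,2020-01-01,Example GCEP\n"
)

# DictReader maps each data row to a dict keyed by the header names.
reader = csv.DictReader(io.StringIO(SAMPLE))
records = [select_columns(row, KEY_LIST) for row in reader]
```

Because the helper takes the key list as an argument, selecting a different set of columns only requires changing the list, not the selection logic.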
Three more columns from the header above would also be useful to add: MOI, CLASSIFICATION DATE, and GCEP. The modified document should look something like this: