dcppc / metadata-matrix

Mapping of metadata elements common to TOPMed, GTEx, and the MODs (model organism databases).

# Metadata_matrix

### Extraction of MonDO-DO and Uberon-FMA mappings from the MonDO and Uberon OWL files and generation of JSON files

_**1. Extracting MonDO-DO and Uberon-FMA mappings**_

Many MonDO (Monarch Disease Ontology) classes in the mondo.owl file carry database_cross_reference (`oboInOwl:hasDbXref`) annotations that point to Disease Ontology (DO) classes. Similarly, Uberon classes in the uberon.owl file carry database_cross_reference annotations that point to FMA (Foundational Model of Anatomy) classes. These cross references mark cases in which the two classes have the same meaning. The SPARQL queries below extract the DOID or FMA cross references and return a table (CSV) with the label of each mapped class. Each query runs on the merged ontology (MonDO with DO, or Uberon with FMA): it finds each class's database_cross_reference IDs and then finds the DO (or FMA) class whose ID matches. The queries can be run in any triple store, but we recommend installing ROBOT, which can merge and then query the ontologies in a single command.
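To make the join explicit, the matching logic of the MonDO-DO query can be sketched in plain Python. This is an illustrative sketch, not part of the repository: `join_xrefs` and its dictionary inputs are hypothetical stand-ins for the triples in the merged ontology, and the example data is made up. For the Uberon-FMA case, the pattern would be `FMA:[0-9]{1,7}` instead.

```python
import re

# Mirrors FILTER regex(?ref, "DOID:[0-9]{1,7}"). SPARQL regex is unanchored,
# so re.search (not re.fullmatch) is the equivalent test.
DOID_XREF = re.compile(r"DOID:[0-9]{1,7}")


def join_xrefs(source_classes, target_labels):
    """Return (source_id, source_label, target_id, target_label) rows.

    source_classes: {source_id: (label, [xref, ...])}  (hypothetical layout)
    target_labels:  {target_id: label}, i.e. DO classes keyed by oboInOwl:id
    """
    rows = []
    for source_id, (label, xrefs) in source_classes.items():
        for ref in xrefs:
            if not DOID_XREF.search(ref):
                continue  # not a DO cross reference
            # FILTER (str(?ref) = str(?doid)): exact string equality.
            if ref in target_labels:
                rows.append((source_id, label, ref, target_labels[ref]))
    return rows


if __name__ == "__main__":
    # Toy, made-up identifiers; not taken from mondo.owl or doid.owl.
    mondo = {"MONDO:A": ("toy disease", ["DOID:123", "UMLS:C000"])}
    do = {"DOID:123": "toy DO disease", "DOID:456": "unmapped DO class"}
    print(join_xrefs(mondo, do))
    # [('MONDO:A', 'toy disease', 'DOID:123', 'toy DO disease')]
```

A triple store performs the same join over the merged graph; the dictionaries here only stand in for it so the filter-then-match order is easy to follow.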

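Save the MonDO-DO query as `mondo-do.rq`:

```sparql
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?id ?label ?doid ?dolabel WHERE {
    # class carrying a database_cross_reference, plus its label and ID
    ?s oboInOwl:hasDbXref ?ref .
    ?s rdfs:label ?label .
    OPTIONAL { ?s oboInOwl:id ?id }
    # keep only cross references that look like DO IDs
    FILTER regex(?ref, "DOID:[0-9]{1,7}")
    # DO class whose ID equals the cross reference, plus its label
    ?do oboInOwl:id ?doid .
    FILTER (str(?ref) = str(?doid))
    ?do rdfs:label ?dolabel .
}
```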

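Save the Uberon-FMA query as `uberon-fma.rq`. It needs the same PREFIX declarations to run on its own:

```sparql
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?id ?label ?fmaid ?fmalabel WHERE {
    ?s oboInOwl:hasDbXref ?ref .
    ?s rdfs:label ?label .
    OPTIONAL { ?s oboInOwl:id ?id }
    # keep only cross references that look like FMA IDs
    FILTER regex(?ref, "FMA:[0-9]{1,7}")
    ?fma oboInOwl:id ?fmaid .
    FILTER (str(?ref) = str(?fmaid))
    ?fma rdfs:label ?fmalabel .
}
```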

* After saving the SPARQL queries, run the following ROBOT command in a terminal from the directory where you saved them. The command first merges MonDO and DO and then runs the SPARQL query on the merged ontology to generate the mappings. Expect it to take some time: both MonDO and DO are large ontologies and must first be downloaded from the input IRIs.

```bash
robot merge --input-iri http://purl.obolibrary.org/obo/mondo.owl --input-iri http://purl.obolibrary.org/obo/doid.owl query --format csv --query mondo-do.rq mondo_doid.csv
```

A pre-generated MonDO-DO mapping file can be found here:
https://github.com/dcppc-phosphorous/Metadata_matrix/blob/master/mondo_doid.csv

* The following ROBOT command merges the Uberon and FMA OWL files, runs the SPARQL query on the result, and writes the mappings to a CSV (comma-separated values) file.

```bash
robot merge --input-iri http://purl.obolibrary.org/obo/uberon.owl --input-iri http://purl.obolibrary.org/obo/fma.owl query --format csv --query uberon-fma.rq uberon_fma.csv
```

A pre-generated Uberon-FMA mapping file (uberon_fma.csv) can be found here:
https://github.com/dcppc-phosphorous/Metadata_matrix/blob/master/uberon_fma.csv 
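If you prefer to drive both merges from one script, a small Python wrapper can assemble the same ROBOT arguments. This is a sketch, assuming ROBOT is installed and on your PATH; `robot_command`, `run_jobs` and `JOBS` are hypothetical names, and every argument is copied from the two ROBOT commands above. By default it only prints the commands.

```python
import shlex
import subprocess

OBO = "http://purl.obolibrary.org/obo/"

# (first ontology IRI, second ontology IRI, query file, output CSV),
# copied from the two ROBOT commands in step 1.
JOBS = [
    (OBO + "mondo.owl", OBO + "doid.owl", "mondo-do.rq", "mondo_doid.csv"),
    (OBO + "uberon.owl", OBO + "fma.owl", "uberon-fma.rq", "uberon_fma.csv"),
]


def robot_command(iri_a, iri_b, query_file, output_csv):
    """Build the chained `robot merge ... query ...` argument list."""
    return [
        "robot", "merge",
        "--input-iri", iri_a,
        "--input-iri", iri_b,
        "query", "--format", "csv",
        "--query", query_file, output_csv,
    ]


def run_jobs(jobs, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for job in jobs:
        cmd = robot_command(*job)
        print(shlex.join(cmd))
        if not dry_run:
            # Needs ROBOT on PATH; raises CalledProcessError if it fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_jobs(JOBS)  # dry run: prints the commands only
```

Call `run_jobs(JOBS, dry_run=False)` to actually run the merges. Passing an argument list (rather than a shell string) to `subprocess.run` avoids shell quoting issues.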

_**2. Generating JSON mapping files**_

The CSV files generated in step 1 are used as input to generate the JSON mapping files.
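The conversion can be sketched as grouping target IDs under each source ID. This is a minimal sketch, not the repository's script: it assumes the CSV header uses the query's SELECT variable names (`id`, `label`, `doid`, `dolabel`, or `fmaid`/`fmalabel` for Uberon), and the `{source_id: [target_id, ...]}` JSON layout and the `csv_to_mapping` name are hypothetical. The actual JSON structure is defined by the linked scripts.

```python
import csv
import io
import json
from collections import defaultdict


def csv_to_mapping(csv_text, source_col, target_col):
    """Map each source ID to a list of target IDs (assumed JSON layout)."""
    mapping = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = row.get(source_col) or ""
        target = row.get(target_col) or ""
        # ?id is OPTIONAL in the query, so rows may lack a source ID.
        if source and target and target not in mapping[source]:
            mapping[source].append(target)
    return dict(mapping)


if __name__ == "__main__":
    # Toy CSV with the MonDO-DO query's column names; the values are made up.
    sample = "id,label,doid,dolabel\nMONDO:A,toy disease,DOID:123,toy DO disease\n"
    print(json.dumps(csv_to_mapping(sample, "id", "doid"), indent=2))
```

For a real file, read it with `open("mondo_doid.csv", newline="")`, pass the text to `csv_to_mapping(text, "id", "doid")` (or `"id", "fmaid"` for uberon_fma.csv), and write the result with `json.dump`. Lists are used because one class can carry several cross references.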

The Python script that generates the MonDO-DO JSON mapping file from mondo_doid.csv is here:
https://github.com/dcppc-phosphorous/Metadata_matrix/blob/master/make_mapping_mondo_do.py

The Python script that generates the Uberon-FMA JSON mapping file from uberon_fma.csv is here:
https://github.com/dcppc-phosphorous/Metadata_matrix/blob/master/make_mapping_uberon_fma.py

The MonDO-DO JSON mapping file is here:
https://github.com/dcppc-phosphorous/Metadata_matrix/blob/master/mondo_do_mapping.json

The Uberon-FMA JSON mapping file is here:
https://github.com/dcppc-phosphorous/Metadata_matrix/blob/master/uberon_fma_mapping.json

### Extraction of HPO-MP mappings and generation of the JSON file
Human Phenotype Ontology (HPO) classes and their best-matching Mammalian Phenotype Ontology (MP) classes are extracted from the uPheno resource file found here:

https://github.com/obophenotype/upheno/blob/master/mappings/hp-to-mp-bestmatches.tsv

These mappings will be updated once a new HPO-MP mapping bridge file is made available.

The Python script that generates the HPO-MP JSON mapping file from the .tsv file can be found here:

https://github.com/dcppc/metadata-matrix/blob/master/make_mapping_hp_mp.py
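The same grouping idea applies to the tab-separated HPO-MP file. This is a sketch, not the linked script: the column layout of hp-to-mp-bestmatches.tsv is not described here, so the HP and MP column indices are parameters you must check against the file, and whether it has a header row is likewise an assumption (`has_header`). The names `tsv_to_mapping` and `to_json` are hypothetical.

```python
import csv
import io
import json
from collections import defaultdict


def tsv_to_mapping(tsv_text, hp_col, mp_col, has_header=True):
    """Group MP IDs under each HP ID from a tab-separated mapping file.

    hp_col and mp_col are 0-based column indices; verify them against the
    actual hp-to-mp-bestmatches.tsv before use.
    """
    # QUOTE_NONE: treat quote characters in labels as ordinary text.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # drop the header row, if the file has one
    mapping = defaultdict(list)
    for row in reader:
        if len(row) <= max(hp_col, mp_col):
            continue  # blank or truncated line
        hp_id, mp_id = row[hp_col].strip(), row[mp_col].strip()
        if hp_id and mp_id and mp_id not in mapping[hp_id]:
            mapping[hp_id].append(mp_id)
    return dict(mapping)


def to_json(mapping):
    """Serialize with sorted keys so regenerated files diff cleanly."""
    return json.dumps(mapping, indent=2, sort_keys=True)
```

Sorting keys keeps the output stable, which makes it easier to review changes when the mappings are regenerated from a newer bridge file.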

The HPO-MP JSON mapping file generated by the Python script can be found here:

https://github.com/dcppc/metadata-matrix/blob/master/hp_mp_mapping.json

### Extraction of Human-MOD gene orthology mappings and generating JSON file
The orthology mapping JSON file is generated from the following Alliance of Genome Resources (AGR) filtered orthology .tsv file:
https://reports.alliancegenome.org/alliance-orthology-july-19-2018-stable-1.6.0-v4.tsv

The Alliance generated this file with its ortholog-filtering parameter (which determines whether a MOD gene is called an ortholog of a human gene) set to “stringent”. Human genes are in column 1 and their MOD orthologs are in column 5. The Python script that uses this .tsv file to generate the JSON mapping of human genes to their MOD ortholog genes can be found here:

https://github.com/dcppc/metadata-matrix/blob/master/make_mapping_human_mods_orthology.py
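Given that human genes sit in column 1 and MOD orthologs in column 5, the mapping can be sketched as follows. This is a hedged sketch, not the linked script: it assumes lines starting with `#` are comments and that one header row follows them (`has_header`), and the `orthology_mapping` name and `{human_gene: [mod_gene, ...]}` layout are hypothetical.

```python
from collections import defaultdict

HUMAN_COL = 0  # column 1: human gene
MOD_COL = 4    # column 5: MOD ortholog gene


def orthology_mapping(tsv_text, has_header=True):
    """Map each human gene to the list of its MOD ortholog genes.

    Assumes '#'-prefixed lines are comments and, when has_header is True,
    that the first remaining line is a header row.
    """
    lines = [ln for ln in tsv_text.splitlines()
             if ln and not ln.startswith("#")]
    if has_header:
        lines = lines[1:]
    mapping = defaultdict(list)
    for line in lines:
        row = line.split("\t")
        if len(row) <= MOD_COL:
            continue  # truncated line
        human, mod = row[HUMAN_COL].strip(), row[MOD_COL].strip()
        if human and mod and mod not in mapping[human]:
            mapping[human].append(mod)
    return dict(mapping)
```

One human gene usually has orthologs in several MODs, so each value is a list; duplicates are dropped while preserving file order.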

The human-MOD ortholog gene JSON mapping file generated by the Python script can be found here:

https://github.com/dcppc/metadata-matrix/blob/master/human_mods_orthology.json