NCATS-Tangerine / beacon-aggregator

A web service that operates over the Beacon network to provide a single software interface over all the Beacons

Expand semantic typing of concepts #15

Open mbrush opened 7 years ago

mbrush commented 7 years ago

Add more categories to improve filtering capabilities. Make these more precise and consistent across sources (e.g. concepts from WD, Monarch, and other sources should all use the same semantic types).

Related to https://github.com/NCATS-Tangerine/translator-knowledge-beacon/issues/15

mbrush commented 7 years ago

Need to harmonize/map between MetaMap and Monarch categories.

Metamap semantic groups: see https://metamap.nlm.nih.gov/Docs/SemGroups_2013.txt

Monarch SciGraph/neo4j semantic categories (from here):

http://purl.obolibrary.org/obo/CL_0000000 : cell
http://purl.obolibrary.org/obo/UBERON_0001062 : anatomical entity
#http://purl.obolibrary.org/obo/UBERON_0000468 : multi-cellular organism
http://purl.obolibrary.org/obo/PATO_0000001 : quality
#http://purl.obolibrary.org/obo/GO_0005623 : cell
http://purl.obolibrary.org/obo/NCBITaxon_131567 : organism
http://purl.obolibrary.org/obo/CLO_0000031 : cell line
http://purl.obolibrary.org/obo/DOID_4 : disease
#http://purl.obolibrary.org/obo/PATO_0000003 : assay
#http://purl.obolibrary.org/obo/PATO_0000006 : process
#http://purl.obolibrary.org/obo/PATO_0000011 : age
#http://purl.obolibrary.org/obo/CHEBI_23367 : molecular entity
http://purl.obolibrary.org/obo/CHEBI_23888 : drug
http://purl.obolibrary.org/obo/UPHENO_0001001 : Phenotype
http://purl.obolibrary.org/obo/GO_0008150 : biological process
http://purl.obolibrary.org/obo/GO_0009987 : cellular process
http://purl.obolibrary.org/obo/GO_0005575 : cellular component
http://purl.obolibrary.org/obo/GO_0003674 : molecular function
http://purl.obolibrary.org/obo/SO_0000704 : gene
http://purl.obolibrary.org/obo/GENO_0000536 : genotype
http://purl.obolibrary.org/obo/GENO_0000504 : reagent targeted gene
#http://purl.obolibrary.org/obo/GENO_0000000 : intrinsic genotype
#http://purl.obolibrary.org/obo/GENO_0000524 : extrinsic genotype
#http://purl.obolibrary.org/obo/GENO_0000525 : effective genotype
http://purl.obolibrary.org/obo/GENO_0000002 : variant locus
http://purl.obolibrary.org/obo/SO_0001059 : sequence alteration
http://purl.obolibrary.org/obo/SO_0000110 : sequence feature
http://purl.obolibrary.org/obo/ECO_0000000 : evidence
http://purl.obolibrary.org/obo/PW_0000001 : pathway
http://purl.obolibrary.org/obo/IAO_0000310 : publication
http://xmlns.com/foaf/0.1/Person : case
http://purl.org/oban/association : association
# the following can be removed when VT/OBA is linked to UPheno
http://purl.obolibrary.org/obo/VT_0000001 : Phenotype
http://purl.obolibrary.org/obo/OBA_0000001 : Phenotype
http://purl.obolibrary.org/obo/SO_0001483 : snv
http://purl.obolibrary.org/obo/GENO_0000871 : haplotype
http://purl.obolibrary.org/obo/SO_0000340 : chromosome
http://purl.obolibrary.org/obo/SO_0000104 : protein
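
As a starting point for the harmonization, here is a minimal sketch in Python. It parses a few entries in the `IRI : label` format of the list above (skipping the `#`-commented lines) and looks up Monarch IRIs for a MetaMap semantic group abbreviation. The function names and, in particular, the group-to-category pairings in `METAMAP_TO_MONARCH` are illustrative assumptions, not an agreed mapping; the real pairings would need to come out of the community discussion.

```python
# Illustrative sketch only: the pairings below are assumptions, not an agreed standard.

# A few entries copied from the Monarch list, in its "IRI : label" format.
MONARCH_CATEGORIES = """\
http://purl.obolibrary.org/obo/UBERON_0001062 : anatomical entity
#http://purl.obolibrary.org/obo/GO_0005623 : cell
http://purl.obolibrary.org/obo/NCBITaxon_131567 : organism
http://purl.obolibrary.org/obo/DOID_4 : disease
http://purl.obolibrary.org/obo/CHEBI_23888 : drug
http://purl.obolibrary.org/obo/GO_0008150 : biological process
http://purl.obolibrary.org/obo/SO_0000704 : gene
"""

# Hypothetical pairing of MetaMap semantic group abbreviations
# (from SemGroups_2013.txt) with Monarch category labels.
METAMAP_TO_MONARCH = {
    "ANAT": "anatomical entity",
    "CHEM": "drug",
    "DISO": "disease",
    "GENE": "gene",
    "LIVB": "organism",
    "PHYS": "biological process",
}


def parse_categories(text):
    """Return {label: [IRI, ...]}, skipping blank lines and '#' comments."""
    categories = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last " : " so the "http:" inside the IRI is left alone.
        iri, label = line.rsplit(" : ", 1)
        # Labels are lower-cased so "Phenotype" and "phenotype" collapse together.
        categories.setdefault(label.strip().lower(), []).append(iri.strip())
    return categories


def monarch_iris_for_group(group, categories):
    """Return the Monarch IRIs paired with a MetaMap group, or [] if unmapped."""
    label = METAMAP_TO_MONARCH.get(group.upper())
    if label is None:
        return []
    return categories.get(label, [])


categories = parse_categories(MONARCH_CATEGORIES)
print(monarch_iris_for_group("DISO", categories))
```

Keeping the result as a list of IRIs per label matters because some labels in the full list (e.g. Phenotype) are attached to more than one IRI.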

RichardBruskiewich commented 7 years ago

Matt, as it happens, the underlying semantic encoding of data is currently hard-coded to UMLS Semantic Groups, which are generally acknowledged to be inadequate for the full scope of data types we encounter. I guess we'll need a community discussion (e.g. at the hackathon?) to consider specifying a common, enhanced "Translator" semantic group / data type ontology. I may spend some time on the plane thinking about this.

Separately, I should note that when you posted this issue, there were some key deficiencies in the Knowledge Beacon API with respect to Semantic Group encoding. I have since made great progress in fixing them and am applying the patches to the beacon wrappers implemented at my end (and hopefully Greg Stuppie will update the WikiData wrapper too).

The upside of this is that filtering and display of semantic types in the TKBio web client (and beacon aggregator filtering of those, obviously) will work much better now.

mbrush commented 6 years ago

The list of semantic/entity types that we end up using here should be used across all Translator efforts, to improve interoperability. Note that we are developing a similar list as part of efforts to catalog Translator knowledge sources - see https://docs.google.com/document/d/1SCbNFu29wWVBO2OsR0qbWl9KXusV3fWmgrnQFyglIFc/edit#

Be sure to sync up here.