GenEpiO / genepio

The Genomic Epidemiology Application Ontology describes the genomics, laboratory, clinical and epidemiological contextual information required to support data sharing and integration for foodborne infectious disease surveillance and outbreak investigations.

Design of GenEpiO interface to 3rd party databases like NCBI BioSample, for comment #4

Closed Public-Health-Bioinformatics closed 5 years ago

Public-Health-Bioinformatics commented 8 years ago

The goal is to facilitate data exchange between the ideally generic epidemiological, laboratory and other biomedical concepts that GenEpiO contains and the specific entities that projects like NCBI BioSample use to import or export data. For GenEpiO to operate as a clearinghouse of field and entity specifications, it needs to record the specifications of 3rd party databases on a field-to-field basis.

Our current approach documents 3rd party field information using the hasDbXref database cross-reference relation, which is commonly used to point to specific identifiers in 3rd party databases, e.g. "'Coregonus reighardi' hasDbXref ITIS:161947". It can also be used to point to metadata, e.g. field names. To organize this explicitly we have added a "centrally registered identifier symbol" -> "source database", which is a place to list all the database acronyms used as prefixes in hasDbXref values. Details of common taxonomic and geographic cross-reference databases are now listed here, for example Geonames, SNOMEDCT, ITIS and FAO ASFIS.

[Screenshot: 2016-09-14, 10:16:36 am]
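As a rough illustration of the prefix convention described above, here is a minimal Python sketch that splits a hasDbXref value such as `ITIS:161947` into its database prefix and local identifier, and checks the prefix against a list of source databases. The `SOURCE_DATABASES` table, the `DbXref` type and the `parse_dbxref` function are hypothetical names made up for this example; they are not part of GenEpiO or any existing tool.

```python
from typing import NamedTuple

# Hypothetical registry of source-database prefixes. The acronyms come from
# the examples in this issue; the structure is an assumption for illustration.
SOURCE_DATABASES = {
    "ITIS": "ITIS",
    "GEONAMES": "Geonames",
    "SNOMEDCT": "SNOMEDCT",
    "ASFIS": "FAO ASFIS",
}


class DbXref(NamedTuple):
    prefix: str    # database acronym, normalized to upper case
    local_id: str  # identifier or field name within that database


def parse_dbxref(value: str) -> DbXref:
    """Split 'PREFIX:local_id' and verify that the prefix is registered."""
    prefix, sep, local_id = value.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a PREFIX:id cross reference: {value!r}")
    key = prefix.upper()
    if key not in SOURCE_DATABASES:
        raise ValueError(f"unregistered source database prefix: {prefix!r}")
    return DbXref(key, local_id)


if __name__ == "__main__":
    print(parse_dbxref("ITIS:161947"))
```

Keeping the registry in one place mirrors the idea of a single "source database" list: a cross reference with an unknown prefix is rejected rather than silently accepted.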

Cross-references to other OBO Foundry ontology terms would likely be handled more directly via relations or equivalencies. There are a number of more complex issues to address (for example, date format conversion and mapping an age to a bucketed age range field), which will require annotations on the hasDbXref annotations themselves.
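To make the two conversion issues mentioned above concrete, here is a small Python sketch of what such a transformation might look like once its parameters are known. The source date format, the ISO 8601 target and the age bucket boundaries are all assumptions chosen for illustration; in practice they would come from the annotations on the cross reference for a given 3rd party field.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical bucket boundaries (lower bound of each bucket after the first)
# and their labels; a real target database would define its own ranges.
AGE_EDGES = [5, 18, 50, 65]
AGE_LABELS = ["0-4", "5-17", "18-49", "50-64", "65+"]


def convert_date(value: str, source_format: str = "%m/%d/%Y") -> str:
    """Re-express a date string in ISO 8601 (YYYY-MM-DD).

    source_format is an assumed strptime pattern for the incoming field.
    """
    return datetime.strptime(value, source_format).date().isoformat()


def bucket_age(age_years: float) -> str:
    """Map an age in years to the label of the bucket that contains it."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    return AGE_LABELS[bisect_right(AGE_EDGES, age_years)]


if __name__ == "__main__":
    print(convert_date("09/14/2016"))
    print(bucket_age(37))
```

Note that bucketing is lossy: an exported age range cannot be converted back to an exact age, which is one reason the mapping direction would need to be recorded alongside the field.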

ddooley commented 5 years ago

The source database list remains useful, and instances of each class can document versions of the given database. However, we now use a "field specification" class to document details of each field in a given 3rd party table, such as the example PulseNet specification below.

[Image: example PulseNet field specification]

The hasDbXref relation is no longer used to point to these field specs; instead, specification fields reference the field instances:

[Image: specification fields referencing field instances]
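For readers without access to the screenshots, the following Python sketch shows one possible reading of this arrangement: a versioned source database instance, plus field specification records that each describe one field of a 3rd party table and point at the field they correspond to. All class names, attribute names, field names and identifiers here are hypothetical and chosen only for illustration; the actual GenEpiO modelling may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceDatabase:
    """An instance documenting one version of a 3rd party database."""
    name: str
    version: str  # hypothetical version label


@dataclass(frozen=True)
class FieldSpecification:
    """Details of one field in a 3rd party table (illustrative attributes)."""
    database: SourceDatabase
    field_name: str     # name of the field as the 3rd party defines it
    datatype: str       # assumed attribute, e.g. "string" or "date"
    referenced_field: str  # placeholder id of the field this spec points at


def field_names(specs: List[FieldSpecification], db: SourceDatabase) -> List[str]:
    """List the 3rd party field names documented for one database version."""
    return [s.field_name for s in specs if s.database == db]


if __name__ == "__main__":
    # Placeholder values only; these are not real PulseNet fields or term ids.
    pulsenet = SourceDatabase(name="PulseNet", version="example-v1")
    specs = [
        FieldSpecification(pulsenet, "ExampleAgeField", "string", "EX:0000001"),
        FieldSpecification(pulsenet, "ExampleDateField", "date", "EX:0000002"),
    ]
    print(field_names(specs, pulsenet))
```

The point of the sketch is the direction of the reference: the specification record carries the pointer, so the ontology's own field terms need no per-database cross-reference annotations.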