Closed Public-Health-Bioinformatics closed 5 years ago
The source database list remains useful, and instances of each class can document versions of the given database. However, we now use a "field specification" class to document the details of each field in a given third-party table, such as the example PulseNet specification below.
The hasDbXref relation is no longer used to point to these field specs; instead, specification fields reference the field instances:
The goal is to facilitate data exchange between the ideally generic epidemiological, laboratory and other biomedical concepts that GenEpiO contains and the specific entities that projects like NCBI BioSample use to import or export data. For GenEpiO to operate as a clearinghouse of field and entity specifications, it needs to record the specifications of third-party databases on a field-to-field basis.
Our current approach involves documenting third-party field information using the hasDbXref (database cross-reference) relation, which is commonly used to point to specific identifiers in third-party databases, e.g. "'Coregonus reighardi' hasDbXref ITIS:161947". It can also be used to point to metadata, e.g. field names. To organize this explicitly we have added "centrally registered identifier symbol" -> "source database", which is a place to list all the database acronyms used as prefixes in hasDbXref values. Details of common taxonomic and geographic cross-reference databases are now listed there, for example Geonames, SNOMEDCT, ITIS and FAO ASFIS.
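As a rough illustration of how the "source database" list could be used to check hasDbXref values, here is a minimal Python sketch. The function name `parse_db_xref` and the `SOURCE_DATABASES` set are hypothetical and not part of GenEpiO; the prefixes are just the examples named above, and the `PREFIX:identifier` layout is assumed from the ITIS example.

```python
from typing import Tuple

# Hypothetical stand-in for the "source database" list: the acronyms that are
# allowed as prefixes in hasDbXref values (taken from the examples in the text).
SOURCE_DATABASES = {"Geonames", "SNOMEDCT", "ITIS", "FAO ASFIS"}


def parse_db_xref(xref: str) -> Tuple[str, str]:
    """Split a 'PREFIX:identifier' cross-reference and check its prefix."""
    prefix, sep, local_id = xref.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a PREFIX:identifier cross-reference: {xref!r}")
    if prefix not in SOURCE_DATABASES:
        raise KeyError(f"unregistered source database prefix: {prefix!r}")
    return prefix, local_id


# The example from the text: 'Coregonus reighardi' hasDbXref ITIS:161947
print(parse_db_xref("ITIS:161947"))
```

A cross-reference whose prefix is missing from the list raises an error, so new prefixes would have to be registered before they are used.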
Cross-references to other OBOFoundry ontology terms would likely be handled more directly via relations or equivalencies. There are a number of more complex issues to address (for example, date format conversion and mapping an age to bucketed age-range fields), which will require annotations on the hasDbXref annotations themselves.
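To make the two conversion issues concrete, here is a small Python sketch of what such mapping rules might compute. Everything in it is an assumption for illustration: the function names, the date formats, and the age bucket boundaries are hypothetical and do not come from GenEpiO or any third-party specification.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical age buckets: the lower bound of every bucket after the first,
# and one label per bucket.
AGE_EDGES = [5, 18, 50, 65]
AGE_LABELS = ["0-4", "5-17", "18-49", "50-64", "65+"]


def convert_date(value: str, source_format: str, target_format: str) -> str:
    """Re-express a date string from one strftime-style format in another."""
    return datetime.strptime(value, source_format).strftime(target_format)


def age_to_bucket(age_years: int) -> str:
    """Map an age in years to the label of the bucket that contains it."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    return AGE_LABELS[bisect_right(AGE_EDGES, age_years)]


print(convert_date("03/15/2017", "%m/%d/%Y", "%Y-%m-%d"))  # 2017-03-15
print(age_to_bucket(17))  # 5-17
```

In practice the formats and bucket boundaries would be supplied by the annotations on each cross-reference rather than hard-coded as they are here.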