NCATSTranslator / ReasonerAPI

NCATS Biomedical Translator Reasoners Standard API

Representing Supporting Publications in TRAPI #409

Closed mbrush closed 1 year ago

mbrush commented 1 year ago

Biolink provides edge properties and classes that can be used to describe supporting publications for Edges in TRAPI messages. The document here outlines a proposed specification for how these components should be used in TRAPI messages to provide a standard and computable representation of supporting publication information.

The key rules of the specification are listed below. The Google document here provides additional details, examples, and implementation guidance.

  1. The ‘publications’ edge property will be used to capture any published documents that report the statement expressed in an Edge, or provide information used as evidence supporting this statement. These will most often be scientific journal articles, but may include books, patents, pre-prints, product inserts, web pages, etc.

  2. Publications MUST be referenced using identifiers in CURIE form when available (e.g. "PMID:1593752", "doi:10.1177/00928615010300134").

  3. KPs SHOULD attempt to create CURIE form representations of URLs whenever possible, by re-writing the URL using an established prefix, or creating and registering a new prefix in the Biolink model prefix map (e.g. ‘wikipedia:imatinib’ for a Wikipedia page URL). For guidance, see below (TO DO)

  4. When a PMID, DOI, or other CURIE-form identifier is not available or possible to create from the URL of a supporting publication (e.g. because the URL pattern does not lend itself to the CURIE prefix paradigm), it SHOULD be referenced by a single valid URL. If neither a CURIE nor a URL is available, the publication MAY be referenced by an informal title or description.

  5. Where multiple types of publication identifiers exist for a single Publication (e.g. a PMID, PMCID, and DOI for the same journal article):
     a. KPs MUST provide only one identifier per publication.
     b. PMIDs SHOULD be used when available.

  6. Where multiple publications support a single Edge, these MUST be reported in a single Attribute object as a list of CURIEs and/or URLs in the Attribute.value field, e.g.:

    "value": ["PMID:31737390", "PMID:6815562", "wikipedia:Imatinib", "dailymed:1debf3a0-7587-47d7-8ea6-e739698d7297", "http://info.gov.hk/gia/general/201011/02/P201011020204.htm"]

  7. For reporting metadata about a publication (e.g. title, journal, abstract, dates), KPs SHOULD rely on the Text Mining Knowledge Provider’s Publication Metadata API as needed. However, the Attribute description and value_url fields MAY be used to provide additional metadata in the TRAPI message itself as desired.
     a. Note that this Publication Metadata API only serves information about journal articles and other formal publications indexed in PubMed. It will not be able to return metadata about other types of documents (e.g. Wikipedia pages, DailyMed records, patents, pre-prints, etc.).
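
As an illustration of rules 3 through 6, here is a minimal Python sketch of how a KP might assemble the publications Attribute. It is not part of the specification. The helper names (`url_to_curie`, `pick_identifier`, `publications_attribute`), the `URL_PREFIXES` map, and the `ID_PREFERENCE` order after PMID are all hypothetical, and the `attribute_type_id` value of "biolink:publications" is an assumption based on the edge property named in rule 1.

```python
# Hypothetical URL-prefix to CURIE-prefix map, for illustration only.
# A real KP would draw prefixes from the Biolink model prefix map (rule 3).
URL_PREFIXES = {
    "https://en.wikipedia.org/wiki/": "wikipedia",
}

# Rule 5b says PMIDs SHOULD be preferred; ranking "doi" second is an assumption.
ID_PREFERENCE = ["PMID", "doi"]


def url_to_curie(identifier):
    """Rewrite a URL as a CURIE when a known prefix matches (rule 3).

    Anything else, including URLs with no matching prefix, is returned
    unchanged so it can still be reported as a plain URL (rule 4).
    """
    for url_prefix, curie_prefix in URL_PREFIXES.items():
        if identifier.startswith(url_prefix):
            local_id = identifier[len(url_prefix):].strip("/")
            if local_id:
                return f"{curie_prefix}:{local_id}"
    return identifier


def pick_identifier(identifiers):
    """Choose exactly one identifier for a single publication (rule 5)."""
    if not identifiers:
        raise ValueError("a publication needs at least one identifier")
    normalized = [url_to_curie(i) for i in identifiers]
    for prefix in ID_PREFERENCE:
        for ident in normalized:
            if ident.startswith(prefix + ":"):
                return ident
    # No preferred prefix found: fall back to the first identifier given.
    return normalized[0]


def publications_attribute(publications):
    """Build one Attribute holding all supporting publications (rule 6).

    `publications` is a list in which each item lists the known
    identifiers for one publication.
    """
    return {
        "attribute_type_id": "biolink:publications",  # assumed type id
        "value": [pick_identifier(ids) for ids in publications],
    }


attribute = publications_attribute([
    ["doi:10.1177/00928615010300134", "PMID:1593752"],
    ["https://en.wikipedia.org/wiki/Imatinib"],
    ["http://info.gov.hk/gia/general/201011/02/P201011020204.htm"],
])
```

In this sketch the first publication is reported by its PMID only, the Wikipedia URL is rewritten to a CURIE, and the last URL is passed through unchanged because no prefix in the hypothetical map matches it.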


Please review and provide feedback here or in the linked google document.

edeutsch commented 1 year ago

This is excellent, many thanks. A few thoughts:

1) I'm thinking it would be useful to create a PR against branch 1.4 with a Markdown document with the above guidelines. It would then be processed like a regular PR.

2) "single valid http URL". Is the "http" helpful here? Does that mean that https is not desired? I'm thinking that just "valid URL" is enough.

3) It seems that example "https://en.wikipedia.org/wiki/Imatinib" conflicts with the guidelines that CURIEs should be used, especially when "wiki:imatinib" is explicitly shown in the document as well.

mbrush commented 1 year ago

Thanks Eric - good points all. I updated the text above.

edeutsch commented 1 year ago

This is addressed here: https://github.com/NCATSTranslator/ReasonerAPI/blob/1.4/ImplementationGuidance/Specifications/supporting_publications_specification.md