Closed colleenXu closed 1 year ago
Issues we'll want to deal with at some point:
- biolink / monarch API (it still uses the `pubmed` response-mapping key). I think this is tricky to work with because of the special api-response-transform that happens to it. It's not clear to me if the transformed publications field holds only PMIDs or if it can sometimes have non-PMID publications the way the raw API responses do...

Known issue, set aside and out of scope for now: we won't fully follow the spec because of some limitations with the current x-bte annotation (this may improve with JQ-related processing):
Finally: the spec says a second `biolink:publications` edge-attribute can be made when the reference info is a free-text string. I don't think we need to do that in this issue, because I didn't notice any strong examples of these in the SmartAPI yamls...
Sometimes a field's value will already have a prefix (it may or may not be formatted exactly the way it should be for biolink-model) and sometimes it won't. So sometimes we'll be adding a prefix, sometimes replacing it, and sometimes we may not need to do anything (it's already formatted the way we want).
My current plan is to check if the prefix [with ":" so like "PMID:"] is there (in any casing), and if so, strip the prefix. Then just add the prefix. Are there any other cases that should be handled?
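The strip-then-add plan above could be sketched roughly like this. It's a minimal Python sketch, not the actual api-response-transform code: the function name `with_prefix` and the whitespace trimming are assumptions made for illustration.

```python
def with_prefix(value: str, prefix: str) -> str:
    """Return `value` carrying exactly one correctly-cased `prefix:`.

    Hypothetical helper: strips the prefix if it is already present
    (in any casing), then adds it back in the wanted casing.
    """
    text = value.strip()
    marker = prefix + ":"
    # Case-insensitive check, so "pmid:123" and "PMID:123" are treated alike.
    if text.lower().startswith(marker.lower()):
        text = text[len(marker):].strip()
    return marker + text


print(with_prefix("pmid:123", "PMID"))
print(with_prefix("123", "PMID"))
```

Stripping before adding means the "add", "replace" and "already fine" cases all go through the same path, so there's no separate branch for each one.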
I didn't modify biolink / monarch API but I'll need to (it still uses the pubmed response-mapping key). I think this is tricky to work with because of the special api-response-transform that happens to it. It's not clear to me if the transformed publications field holds only PMIDs or if it can sometimes have non-PMID publications the way the raw API responses do
It seems like the current code is attempting to filter out only PMID IDs. But if they are using the same/known prefixes for the other ID types, then we have two options
PharmGKB (not added to config list yet): ref_url: data.literature._sameAs seems to be the best way to get 1 value per unique reference. However, sometimes this field's value is an expanded url for a PMID/PMCID (clinical guidelines, variant annotation). There shouldn't be any issues with "reporting the same publication more than once"
If we wanted to use the CURIEs when possible, we could parse the URL looking for the URLs that identify a PMID/PMCID (i.e. http://www.ncbi.nlm.nih.gov/pmc/, http://www.ncbi.nlm.nih.gov/pubmed), and then translate it to a PMID/PMCID/etc.
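A rough Python sketch of that URL parsing, in case it helps the discussion. The function name `url_to_curie`, the exact URL shapes matched (e.g. the `/pmc/articles/PMC.../` path), and the `PMCID:PMC...` output form are all assumptions here; anything that doesn't match is returned unchanged as a url.

```python
import re

# Assumed URL shapes; http vs https and a leading "www." are both tolerated.
_PUBMED_RE = re.compile(
    r"^https?://(?:www\.)?ncbi\.nlm\.nih\.gov/pubmed/(\d+)/?$", re.IGNORECASE
)
_PMC_RE = re.compile(
    r"^https?://(?:www\.)?ncbi\.nlm\.nih\.gov/pmc/(?:articles/)?(PMC\d+)/?$",
    re.IGNORECASE,
)


def url_to_curie(url: str) -> str:
    """Translate a recognised PubMed / PMC url into a CURIE, else return the url."""
    cleaned = url.strip()
    match = _PUBMED_RE.match(cleaned)
    if match:
        return "PMID:" + match.group(1)
    match = _PMC_RE.match(cleaned)
    if match:
        return "PMCID:" + match.group(1).upper()
    # Not a url we know how to translate: keep it as a plain url reference.
    return cleaned


print(url_to_curie("http://www.ncbi.nlm.nih.gov/pubmed/12345"))
```

Falling back to the original url keeps the reference usable even when no CURIE can be derived.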
Remember, you can search the PR / branch for SmartAPI yamls that contain the keywords you're testing
@rjawesome
Converting urls to CURIEs may also not be possible in all cases:
- `10.1158/1538-7445.AM2013-DDT02-01` vs url is http://cancerres.aacrjournals.org/content/73/8_Supplement/DDT02-01
- https://clinicaltrials.gov/search?id=%22NCT00485888%22 and ID is `NCT00485888`. The problem is the trailing `%22`
- `http` vs `https`, or having the `www.` vs not

See PRs (description of behavior on api-response-transform PR)
The corresponding SmartAPI updates have been done, and the registrations have been refreshed. https://github.com/NCATS-Tangerine/translator-api-registry/pull/128 This means all instances with this code deployed (dev/ci/test) should begin working with this feature within minutes (after they pull the latest registry info).
This update was needed for this code to work properly. The code isn't back-compatible, so the old behavior (using the `pubmed` keyword in response-mapping) wasn't working on the instances that had a deployment with this code.
EDIT: until the code from this issue is deployed on Prod, Prod will have wonkiness with how it handles publication info - since it doesn't have the code to process the new response-mapping keywords. Jackson has already made a post in Translator Slack (general channel) informing the consortium of this.
And info from Aug 9-10th from UI team (Translator slack links):
@colleenXu can this be closed as completed?
Yep let's close this as complete since it's been deployed.
The limitations are:
`sources` part of the TRAPI edge, rather than putting it into the publications edge-attribute. We haven't implemented this at all.
The Translator UI is supposed to be able to handle more kinds of "references" (publications) for an edge - not just the PMIDs we provide in the `biolink:publications` edge-attribute right now. In Translator Slack comms, the UI team has confirmed that they plan to support the specification here.

For now, we don't have to worry about "free-text description"-style references (we don't really have any of these).
And I'll explain the spec below...
Implementation
We'd like to adjust / expand our behavior to match this spec and provide more reference info to users, by taking the values from (sometimes multiple) fields, replacing/appending proper prefixes, and putting them into 1 edge-attribute.
Here's what's involved:

The response-mapping keys that feed `biolink:publications`'s `value`, and how they should be processed:
- `ref_pmid` (previously `pubmed`): we want the output-strings to have the prefix `PMID`
- `ref_url` (previously `biolink:source_web_page`): no processing needed. The strings are urls
- `ref_pmcid`: we want the output-strings to have the prefix `PMCID` (however, I made a biolink-model issue because which prefix to use was confusing https://github.com/biolink/biolink-model/issues/1366)
- `ref_clinicaltrials`: we want the output-strings to have the prefix `clinicaltrials`. However, the spec said putting this data in this edge-attribute was temporary / in-flux...
- `ref_doi`: we want the output-strings to have the prefix `doi` (biolink-model spelling ref)
- `ref_isbn`: we want the output-strings to have the prefix `isbn` (biolink-model spelling ref)

SmartAPI overrides
Note: [PharmGKB](https://github.com/NCATS-Tangerine/translator-api-registry/blob/publication-keywords/pharmgkb/smartapi.yaml) is excluded here because it isn't added to the config file yet. It can be added here if you want, but the stuff listed here should be plenty to test the 6 response-mapping keys above...

```
{
  "conf": {
    "only_overrides": false
  },
  "apis": {
    "0212611d1c670f9107baf00b77f0889a": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/CTD/smartapi.yaml",
    "1f47552dabd67351d4c625adb0a10d00": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/EBIgene2phenotype/smartapi.yaml",
    "77ed27f111262d0289ed4f4071faa619": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/MGIgene2phenotype/smartapi.yaml",
    "38e9e5169a72aee3659c9ddba956790d": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/bindingdb/smartapi.yaml",
    "e3edd325c76f2992a111b43a907a4870": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/dgidb/openapi.yml",
    "316eab811fd9ef1097df98bcaa9f7361": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/gtrx/gtrx.yaml",
    "dca415f2d792976af9d642b7e73f7a41": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/litvar/smartapi.yaml",
    "8f08d1446e0bb9c2b323713ce83e2bd3": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/mychem.info/openapi_full.yml",
    "671b45c0301c8624abbd26ae78449ca2": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/mydisease.info/smartapi.yaml",
    "59dce17363dce279d389100834e43648": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/mygene.info/openapi_full.yml",
    "09c8782d9f4027712e65b95424adba79": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/myvariant.info/openapi_full.yml",
    "b772ebfbfa536bba37764d7fddb11d6f": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/ncats_rare_source/smartapi.yaml",
    "edeb26858bd27d0322af93e7a9e08761": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/pfocr/smartapi.yaml",
    "03283cc2b21c077be6794e1704b1d230": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/rhea/smartapi.yaml",
    "1d288b3a3caf75d541ffaae3aab386c8": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/semmeddb/smartapi.yaml",
    "d22b657426375a5295e7da8a303b9893": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/publication-keywords/biolink/openapi.yml"
  }
}
```

`biolink:publications` edge-attribute (many-to-1). It should have this format:

Potentially-helpful implementation notes:
- `null` or an empty string (ignore, don't add to output?)
- `value` array, after the array has been assembled (and after records are merged into edges...)
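
To make the many-to-1 idea concrete, here's a minimal Python sketch of collecting the response-mapping keys into one edge-attribute. Several things in it are assumptions rather than the real implementation: the function name `build_publications_attribute`, the flat `record` dict as input, showing only the `attribute_type_id` and `value` fields of the attribute, and treating the `value`-array note as "remove duplicates after assembling".

```python
from typing import Any, Dict, List, Optional

# Response-mapping key -> prefix to enforce (None = leave the string as-is).
PREFIX_BY_KEY: Dict[str, Optional[str]] = {
    "ref_pmid": "PMID",
    "ref_url": None,
    "ref_pmcid": "PMCID",
    "ref_clinicaltrials": "clinicaltrials",
    "ref_doi": "doi",
    "ref_isbn": "isbn",
}


def build_publications_attribute(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Merge all reference fields of a (hypothetical) record into one attribute."""
    values: List[str] = []
    for key, prefix in PREFIX_BY_KEY.items():
        raw = record.get(key)
        items = raw if isinstance(raw, list) else [raw]
        for item in items:
            if item is None:
                continue  # ignore nulls
            text = str(item).strip()
            if prefix is not None:
                marker = prefix + ":"
                # Strip an existing prefix (any casing) before re-adding it.
                if text.lower().startswith(marker.lower()):
                    text = text[len(marker):].strip()
                if text:
                    text = marker + text
            if not text:
                continue  # ignore empty strings
            values.append(text)
    # Assumed reading of the note above: de-duplicate once the array is assembled.
    unique = list(dict.fromkeys(values))
    if not unique:
        return None
    return {"attribute_type_id": "biolink:publications", "value": unique}


example = {"ref_pmid": ["123", "pmid:123", None, ""], "ref_doi": "10.1/x"}
print(build_publications_attribute(example))
```

Returning `None` when nothing survives the filtering lets the caller skip the attribute entirely instead of emitting an empty `value` array.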