AlexanderPico opened this issue 2 months ago.
To avoid confusion: this is distinct from the `analyses` returns from PFOCR-based enrichment, which were recently upgraded to include `pfocrUrl` and have a nice structure. Basically, can we do the same upgrade for the edge-level returns?
@everaldorodrigo, putting this on your plate. Feel free to create a new issue in the pending.api repo and point to this one in the bte repo.
Wait... I'm seeing multiple confusing points. Maybe some clarification would be useful?

`sources` section. We should be able to get this done later this week, after the Translator Eel Prod deployment.

`pfocrUrl` or any updated structure. This result augmentation has nothing to do with x-bte annotation and the PFOCR TRAPI-edge format, which were discussed recently (point 2). I think there have been some crossed wires/confusion here...

Thanks @colleenXu. I will be the first to admit confusion. This is still not totally clear to me, so I'll rephrase my ask from scratch based on what I see today and what I hope to see.
I see these two types of JSON snippets in TRAPI results containing PFOCR content, which I'm going to label Edge and Analyses to distinguish the two distinct parts of the TRAPI result. And I'll include Current and Suggested examples with a Summary of the diff...
1. Analyses

Current:

```json
{
  "pfocr": [
    {
      "matchedCuries": [
        "NCBIGene:8445",
        "NCBIGene:1859",
        "NCBIGene:2932",
        "NCBIGene:2735"
      ],
      "score": 0.2352941176470588,
      "pmc": "PMC2743241",
      "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg"
    }
  ]
}
```
Suggested:

```json
{
  "pfocr": [
    {
      "matchedCuries": [
        "NCBIGene:8445",
        "NCBIGene:1859",
        "NCBIGene:2932",
        "NCBIGene:2735"
      ],
      "score": 0.2352941176470588,
      "pmc": "PMC2743241",
      "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg",
      "pfocrUrl": "https://pfocr.wikipathways.org/figures/PMC2743241__nihms-104435-f0001.html"
    }
  ]
}
```
Summary: add a link to the PFOCR website, called `pfocrUrl` or whatever you like. I thought this was what we've been discussing for the past few months; maybe it's already done?
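The `pfocrUrl` can likely be derived from fields each analyses entry already has. In the `source_record_urls` that BTE returns for PFOCR edges (example later in this thread), each page URL is the PMCID, two underscores, and the figure's image file name without `.jpg`. A minimal Python sketch, assuming that naming pattern holds for every figure (inferred from those examples, not from a documented PFOCR rule); `pfocr_url` and `PFOCR_FIGURE_BASE` are hypothetical names:

```python
import posixpath
from urllib.parse import urlparse

# Assumed base URL, taken from the PFOCR page links quoted in this thread.
PFOCR_FIGURE_BASE = "https://pfocr.wikipathways.org/figures/"


def pfocr_url(figure_url: str, pmc: str) -> str:
    """Build a PFOCR figure-page URL from a PMC figure image URL (hypothetical helper).

    Assumes pages are named "<PMCID>__<image name without extension>.html".
    """
    image_name = posixpath.basename(urlparse(figure_url).path)  # e.g. "nihms-104435-f0001.jpg"
    stem, _ext = posixpath.splitext(image_name)                  # drop ".jpg"
    pmcid = pmc.removeprefix("PMCID:")                           # accept "PMC..." or "PMCID:PMC..."
    return f"{PFOCR_FIGURE_BASE}{pmcid}__{stem}.html"


entry = {
    "matchedCuries": ["NCBIGene:8445", "NCBIGene:1859", "NCBIGene:2932", "NCBIGene:2735"],
    "score": 0.2352941176470588,
    "pmc": "PMC2743241",
    "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg",
}
entry["pfocrUrl"] = pfocr_url(entry["figureUrl"], entry["pmc"])
print(entry["pfocrUrl"])
# https://pfocr.wikipathways.org/figures/PMC2743241__nihms-104435-f0001.html
```

Deriving the URL client-side avoids a lookup, but it would silently produce a broken link if PFOCR ever named a page differently, which is an argument for having BTE return the URL directly.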
2. Edge

Current:

```json
{
  "predicate": "biolink:occurs_together_in_literature_with",
  "subject": "CHEBI:173421",
  "object": "NCBIGene:55869",
  "attributes": [
    {
      "attribute_type_id": "biolink:publications",
      "value": [
        "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
        "PMCID:PMC6765066",
        "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5463358/bin/fnagi-09-00176-g003.jpg",
        "PMCID:PMC5463358"
      ]
    }
  ]
}
```
Suggested:

```json
{
  "predicate": "biolink:occurs_together_in_literature_with",
  "subject": "CHEBI:173421",
  "object": "NCBIGene:55869",
  "attributes": [
    {
      "attribute_type_id": "biolink:publications",
      "value": [
        {
          "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5463358/bin/fnagi-09-00176-g003.jpg",
          "pmc": "PMC5463358",
          "pfocrUrl": "https://pfocr.wikipathways.org/figures/PMC5463358__fnagi-09-00176-g003.html"
        },
        ...
      ]
    }
  ]
}
```
Summary: add structure to separate the results, either as entries in `value` or at the level of `attributes`. Also add `pfocrUrl`, just like in Analyses.
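For the edge-level ask, the current flat `value` list appears to alternate between a figure image URL and its `PMCID:` CURIE, so per-figure objects can be rebuilt by pairing consecutive entries. A minimal Python sketch, assuming that strict URL-then-PMCID alternation and the same inferred PFOCR page-naming pattern (`<PMCID>__<image name>.html`); `group_publications` is a hypothetical helper, and its output keys simply mirror the suggestion above:

```python
import posixpath
from urllib.parse import urlparse


def group_publications(values: list[str]) -> list[dict[str, str]]:
    """Regroup a flat [figureUrl, PMCID, figureUrl, PMCID, ...] list into per-figure objects.

    Assumes strict alternation of image URL then "PMCID:" CURIE; raises otherwise.
    """
    if len(values) % 2:
        raise ValueError("expected an even number of values (URL/PMCID pairs)")
    figures = []
    for figure_url, curie in zip(values[0::2], values[1::2]):
        if not curie.startswith("PMCID:"):
            raise ValueError(f"expected a PMCID CURIE after {figure_url!r}, got {curie!r}")
        pmc = curie.removeprefix("PMCID:")
        # Assumed PFOCR page pattern: <PMCID>__<image name without extension>.html
        stem = posixpath.splitext(posixpath.basename(urlparse(figure_url).path))[0]
        figures.append({
            "figureUrl": figure_url,
            "pmc": pmc,
            "pfocrUrl": f"https://pfocr.wikipathways.org/figures/{pmc}__{stem}.html",
        })
    return figures


flat = [
    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
    "PMCID:PMC6765066",
    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5463358/bin/fnagi-09-00176-g003.jpg",
    "PMCID:PMC5463358",
]
for fig in group_publications(flat):
    print(fig["pmc"], fig["pfocrUrl"])
```

This shows the desired shape only; whether objects like these may appear inside `biolink:publications` is a separate modeling question.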
I think some of the prior confusion was caused by the fact that PFOCR result augmentation (or "analyses", as you call it) is completely separate from edge lookup and doesn't involve x-bte annotation; perhaps the two were unintentionally conflated in prior discussion. Either way, I've added your Part 1 ask to #837.
BTW, result augmentation is handled by this code.
Thanks. Yes, I thought the result augmentation was done (or decided) already and was referring to it as an example of the structure and fields we'd like to see in the edge lookup as well.
Thanks @AlexanderPico, your post clarifies a lot!
So "Part 1 Analyses" will be tracked/handled in the other issue since it's also "result augmentation".
As for "Part 2 Edges", let's discuss and track it in this issue. I had thought we were discussing this over the past few months... oops. Based on those discussions, I was planning to make a change after the Translator Eel deployment to add `pfocrUrl` to the TRAPI edge `sources` section.
```json
{
  "db7467ffffbf54f21fbe335c46b06303": {
    "predicate": "biolink:occurs_together_in_literature_with",
    "subject": "CHEBI:4021",
    "object": "NCBIGene:208",
    "attributes": [
      {
        "attribute_type_id": "biolink:publications",
        "value": [
          "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg", "PMCID:PMC2743241",
          "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6412134/bin/bgy171f0001.jpg", "PMCID:PMC6412134",
          "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218933/bin/bcr2876-2.jpg", "PMCID:PMC3218933",
          "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7464279/bin/cells-09-01817-g007.jpg", "PMCID:PMC7464279",
          "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6209965/bin/cancers-10-00346-g003.jpg", "PMCID:PMC6209965",
          "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3828572/bin/srep03230-f8.jpg", "PMCID:PMC3828572",
          "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4863577/bin/jep-4-173Fig1.jpg", "PMCID:PMC4863577",
          "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7876385/bin/fphar-11-599965-g004.jpg", "PMCID:PMC7876385",
          "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5962346/bin/nihms923919f1.jpg", "PMCID:PMC5962346"
        ],
        "value_type_id": "linkml:Uriorcurie"
      },
      { "attribute_type_id": "biolink:knowledge_level", "value": "not_provided" },
      { "attribute_type_id": "biolink:agent_type", "value": "image_processing_agent" }
    ],
    "sources": [
      {
        "resource_id": "infores:pfocr",
        "resource_role": "primary_knowledge_source",
        "source_record_urls": [
          "https://pfocr.wikipathways.org/figures/PMC2743241__nihms-104435-f0001.html",
          "https://pfocr.wikipathways.org/figures/PMC6412134__bgy171f0001.html",
          "https://pfocr.wikipathways.org/figures/PMC3218933__bcr2876-2.html",
          "https://pfocr.wikipathways.org/figures/PMC7464279__cells-09-01817-g007.html",
          "https://pfocr.wikipathways.org/figures/PMC6209965__cancers-10-00346-g003.html",
          "https://pfocr.wikipathways.org/figures/PMC3828572__srep03230-f8.html",
          "https://pfocr.wikipathways.org/figures/PMC4863577__jep-4-173Fig1.html",
          "https://pfocr.wikipathways.org/figures/PMC7876385__fphar-11-599965-g004.html",
          "https://pfocr.wikipathways.org/figures/PMC5962346__nihms923919f1.html"
        ]
      },
      {
        "resource_id": "infores:biothings-pfocr",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:pfocr"]
      },
      {
        "resource_id": "infores:service-provider-trapi",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:biothings-pfocr"]
      }
    ]
  }
}
```
Notes to self: scattered notes in https://github.com/NCATS-Tangerine/translator-api-registry/issues/132#issuecomment-2148254833, https://github.com/biothings/biothings_explorer/issues/803#issuecomment-2148096825, https://github.com/biothings/biothings_explorer/issues/811#issuecomment-2167365852
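With an edge like the one above, a client could already re-associate each figure with its PFOCR page by looking it up in the `source_record_urls` of the `infores:pfocr` source, instead of constructing URLs. A minimal Python sketch, assuming the flat URL/PMCID `value` layout and page file names of the form `<PMCID>__<image name>.html`; `figures_with_pfocr_urls` is hypothetical and not part of BTE:

```python
import posixpath
from urllib.parse import urlparse


def figures_with_pfocr_urls(edge: dict) -> list[dict]:
    """Pair each figure in an edge's biolink:publications with the PFOCR page listed in sources.

    Assumes a flat [figureUrl, "PMCID:...", ...] value list and infores:pfocr
    source_record_urls named "<PMCID>__<image name>.html".
    """
    values = next(
        (attr["value"] for attr in edge["attributes"]
         if attr["attribute_type_id"] == "biolink:publications"),
        [],
    )
    # Index the PFOCR page URLs already present on the edge by their file name.
    pages = {
        posixpath.basename(urlparse(url).path): url
        for source in edge["sources"]
        if source["resource_id"] == "infores:pfocr"
        for url in source.get("source_record_urls", [])
    }
    figures = []
    for figure_url, curie in zip(values[0::2], values[1::2]):
        pmc = curie.removeprefix("PMCID:")
        stem = posixpath.splitext(posixpath.basename(urlparse(figure_url).path))[0]
        figures.append({
            "figureUrl": figure_url,
            "pmc": pmc,
            # None when no matching page was returned, rather than a guessed URL.
            "pfocrUrl": pages.get(f"{pmc}__{stem}.html"),
        })
    return figures


edge = {
    "attributes": [{
        "attribute_type_id": "biolink:publications",
        "value": [
            "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg",
            "PMCID:PMC2743241",
        ],
    }],
    "sources": [{
        "resource_id": "infores:pfocr",
        "resource_role": "primary_knowledge_source",
        "source_record_urls": [
            "https://pfocr.wikipathways.org/figures/PMC2743241__nihms-104435-f0001.html",
        ],
    }],
}
print(figures_with_pfocr_urls(edge))
```

Looking URLs up rather than building them means a figure without a returned page gets `None` instead of a possibly broken link.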
However, I agree with your suggestion that it'd be more useful, UI-friendly, and organized to have a list of figure-info objects, with each object holding all the info for one figure.
The problem is that your suggestion isn't valid TRAPI/Biolink modeling. The `biolink:publications` edge attribute has a specific format: it can be a string or an array of strings, and those strings are publication CURIEs.
So we'll need to figure out a format that is TRAPI/Biolink-model compliant, which may involve discussions with the UI, data-modeling, and TRAPI teams.
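To make the constraint concrete, here is a minimal Python check of the shape described above: a `biolink:publications` value must be a string or a list of strings. It is a sketch under a stated simplification, not the actual Biolink/TRAPI validator: the hypothetical `CURIE_LIKE` pattern only checks for a `prefix:local` shape, so it also accepts full URLs and mainly catches structural problems such as nested objects. `is_valid_publications_value` is likewise a hypothetical name.

```python
import re
from typing import Any

# Simplified "prefix:local" shape; note it also accepts full URLs such as "https://...".
CURIE_LIKE = re.compile(r"^[A-Za-z][A-Za-z0-9_.-]*:\S+$")


def is_valid_publications_value(value: Any) -> bool:
    """Return True if value is a CURIE-like string or a list of CURIE-like strings."""
    items = [value] if isinstance(value, str) else value
    if not isinstance(items, list):
        return False
    return all(isinstance(item, str) and CURIE_LIKE.match(item) is not None for item in items)


current = ["https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg", "PMCID:PMC6765066"]
suggested = [{"figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
              "pmc": "PMC6765066"}]
print(is_valid_publications_value(current))    # True
print(is_valid_publications_value(suggested))  # False
```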
EDIT: some Slack convos are happening (our lab Slack).
@AlexanderPico, now that Translator Eel is in Prod, I can make the change mentioned above to add `pfocrUrl` to the TRAPI edge `sources` section. Would you like me to do this, or should I pause/drop this effort?
Yes, please! I think we'll want that long-term. Short-term, we might stuff this edge info into a support graph section so that the UI team can access it right away (i.e., before alt edge types are allowed).
The UI team is eager to work with the edge-level pathway information returned by BTE via PFOCR as a KP. Currently, we just have a flat list of `value`s, including the figureUrl and PMCID. Ideally, these would be labeled more clearly, or at least returned in sets per hit. We should also include the pfocrUrl, e.g., https://pfocr.wikipathways.org/figures/PMC5463358__fnagi-09-00176-g003.html