biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0

Values returned by PFOCR as KP #838

Open AlexanderPico opened 2 months ago

AlexanderPico commented 2 months ago

UI team is eager to work with the edge-level pathway information being returned by BTE via PFOCR as a KP. Currently, we just have a flat list of values including the figureUrl and PMCID. Ideally, these would be labeled more clearly or at least returned in sets per hit. And we should also include the pfocrUrl, e.g., https://pfocr.wikipathways.org/figures/PMC5463358__fnagi-09-00176-g003.html

        "predicate": "biolink:occurs_together_in_literature_with",
        "subject": "CHEBI:173421",
        "object": "NCBIGene:55869",
        "attributes": [
            {
                "attribute_type_id": "biolink:publications",
                "value": [
                    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
                    "PMCID:PMC6765066",
                    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5463358/bin/fnagi-09-00176-g003.jpg",
                    "PMCID:PMC5463358"
                ]
            }
        ]
AlexanderPico commented 2 months ago

To avoid confusion: this is distinct from the analyses returned from PFOCR-based enrichment, which were recently upgraded to include pfocrUrl and have a nice structure. Basically, can we do the same upgrade for the edge-level returns?

newgene commented 2 months ago

@everaldorodrigo, putting this on your plate. Feel free to create a new issue in the pending.api repo and point it to this one in the bte repo.

colleenXu commented 2 months ago

Wait....I'm seeing multiple confusing points. Maybe some clarification would be useful?

  1. The first ask seems to be "adjust the info format in edges". This sounds to me like a x-bte annotation/BTE-post-subquery-processing task, not a Pending BioThings API task...
    • I'm also unclear on what the desired format is. It sounds like you'd like all the info for one figure kept together in 1 object (figureUrl, PMC, pfocrUrl), with separate objects for each figure? (I also need to think about how to put this in TRAPI format in edge-attributes/sources.)
  2. The second ask is "adding pfocrUrl to TRAPI edge info". I was covering this in https://github.com/NCATS-Tangerine/translator-api-registry/issues/132#issuecomment-2148254833 and adding pfocrUrl to the TRAPI edge sources section. We should be able to get this done later this week, after Translator Eel Prod deployment.
  3. I don't think PFOCR-based enrichment (result augmentation) has been updated recently and I don't think it has pfocrUrl or any updated structure. This result augmentation has nothing to do with x-bte annotation and PFOCR TRAPI-edge format - which were discussed recently (point 2). I think there's been some crossed-wires/confusion here...
AlexanderPico commented 2 months ago

Thanks @colleenXu. I will be the first to admit confusion. This is still not totally clear to me, so I'll rephrase my ask from scratch based on what I see today and what I hope to see.

I see these two types of JSON snippets in TRAPI results containing PFOCR content, which I'm going to label Edge and Analyses to distinguish the two distinct parts of the TRAPI result. And I'll include Current and Suggested examples with a Summary of the diff...

1. Analyses Current:

"pfocr": [
    {
        "matchedCuries": [
            "NCBIGene:8445",
            "NCBIGene:1859",
            "NCBIGene:2932",
            "NCBIGene:2735"
        ],
        "score": 0.2352941176470588,
        "pmc": "PMC2743241",
        "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg"
    },

Suggested:

"pfocr": [
    {
        "matchedCuries": [
            "NCBIGene:8445",
            "NCBIGene:1859",
            "NCBIGene:2932",
            "NCBIGene:2735"
        ],
        "score": 0.2352941176470588,
        "pmc": "PMC2743241",
        "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg",
        "pfocrUrl": "https://pfocr.wikipathways.org/figures/PMC5463358__fnagi-09-00176-g003.html"
    },

Summary: add a link to the PFOCR website, called "pfocrUrl" or whatever you like. I thought this was what we've been discussing for the past few months, and maybe it's already done?
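For illustration, here is a minimal Python sketch of how a "pfocrUrl" could be derived for an entry like the one above. It assumes the URL pattern seen in the examples in this thread (`<PMC ID>__<figure file name without extension>.html` under `https://pfocr.wikipathways.org/figures/`) holds in general, which is not confirmed here. The function names `derive_pfocr_url` and `with_pfocr_url` are hypothetical and not part of BTE.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Base URL taken from the examples in this thread (assumed to be stable).
PFOCR_FIGURE_BASE = "https://pfocr.wikipathways.org/figures"


def derive_pfocr_url(pmc: str, figure_url: str) -> str:
    """Build a PFOCR page URL from a PMC ID and a figure image URL.

    Assumption: the PFOCR page is named "<pmc>__<image file stem>.html".
    """
    stem = PurePosixPath(urlparse(figure_url).path).stem
    return f"{PFOCR_FIGURE_BASE}/{pmc}__{stem}.html"


def with_pfocr_url(entry: dict) -> dict:
    """Return a copy of a pfocr entry with a "pfocrUrl" field added."""
    return {**entry, "pfocrUrl": derive_pfocr_url(entry["pmc"], entry["figureUrl"])}


entry = {
    "pmc": "PMC2743241",
    "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg",
}
print(with_pfocr_url(entry)["pfocrUrl"])
```

If PFOCR already stores its own page URL per figure, reading that field directly would be safer than rebuilding it from the image URL.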

2. Edge Current:

        "predicate": "biolink:occurs_together_in_literature_with",
        "subject": "CHEBI:173421",
        "object": "NCBIGene:55869",
        "attributes": [
            {
                "attribute_type_id": "biolink:publications",
                "value": [
                    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
                    "PMCID:PMC6765066",
                    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5463358/bin/fnagi-09-00176-g003.jpg",
                    "PMCID:PMC5463358"
                ]
            }
        ]

Suggested:

        "predicate": "biolink:occurs_together_in_literature_with",
        "subject": "CHEBI:173421",
        "object": "NCBIGene:55869",
        "attributes": [
            {
                "attribute_type_id": "biolink:publications",
                "value": [
                    {
                        "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
                        "pmc": "PMC6765066",
                        "pfocrUrl": "https://pfocr.wikipathways.org/figures/PMC5463358__fnagi-09-00176-g003.html"
                    },
                    ...
                ]
            }
        ]

Summary: Add structure to separate results as values or at the level of attributes. Also add "pfocrUrl".... Just like "Analyses".
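To make the "sets per hit" idea concrete, here is a rough Python sketch that regroups the current flat value list into one object per figure. It assumes each figure URL is followed by its "PMCID:" entry, as in the Current example. The function name `group_figures` is made up for illustration.

```python
def group_figures(values: list[str]) -> list[dict]:
    """Pair each figure URL with the "PMCID:..." string that follows it.

    Assumption: the flat list alternates figure URL, PMCID. Entries that
    break the pattern are kept with the missing field set to None.
    """
    figures = []
    current = None  # figure object still waiting for its PMCID
    for value in values:
        if value.startswith("PMCID:"):
            pmc = value.removeprefix("PMCID:")
            if current is None:
                figures.append({"figureUrl": None, "pmc": pmc})
            else:
                current["pmc"] = pmc
                figures.append(current)
                current = None
        else:
            if current is not None:
                figures.append(current)  # previous URL had no PMCID
            current = {"figureUrl": value, "pmc": None}
    if current is not None:
        figures.append(current)
    return figures


flat = [
    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
    "PMCID:PMC6765066",
    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5463358/bin/fnagi-09-00176-g003.jpg",
    "PMCID:PMC5463358",
]
for figure in group_figures(flat):
    print(figure)
```

This needs Python 3.9 or later for `str.removeprefix`.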

tokebe commented 2 months ago

I think some of the prior confusion was likely caused by the fact that PFOCR result augmentation (or analyses, as you call it) is completely separate from edge lookup and doesn't involve x-bte annotation. Maybe there was some unintended conflation of the two in prior discussion? Either way, I've added your part 1 ask to #837.

BTW, result augmentation is handled by this code.

AlexanderPico commented 2 months ago

Thanks. Yes, I thought the result augmentation was done (or decided) already and was referring to it as an example of the structure and fields we'd like to see in the edge lookup as well.

colleenXu commented 2 months ago

Thanks @AlexanderPico, your post clarifies a lot!

So "Part 1 Analyses" will be tracked/handled in the other issue since it's also "result augmentation".

As for "Part 2 Edges"...let's discuss and track this in this issue. I had thought we were discussing this the past few months...oops. And based on these discussions, I was planning to make a change after the Translator Eel deployment to add pfocrUrl to the TRAPI edge sources section.

It'd then look like this:

```
    "db7467ffffbf54f21fbe335c46b06303": {
      "predicate": "biolink:occurs_together_in_literature_with",
      "subject": "CHEBI:4021",
      "object": "NCBIGene:208",
      "attributes": [
        {
          "attribute_type_id": "biolink:publications",
          "value": [
            "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743241/bin/nihms-104435-f0001.jpg",
            "PMCID:PMC2743241",
            "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6412134/bin/bgy171f0001.jpg",
            "PMCID:PMC6412134",
            "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218933/bin/bcr2876-2.jpg",
            "PMCID:PMC3218933",
            "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7464279/bin/cells-09-01817-g007.jpg",
            "PMCID:PMC7464279",
            "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6209965/bin/cancers-10-00346-g003.jpg",
            "PMCID:PMC6209965",
            "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3828572/bin/srep03230-f8.jpg",
            "PMCID:PMC3828572",
            "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4863577/bin/jep-4-173Fig1.jpg",
            "PMCID:PMC4863577",
            "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7876385/bin/fphar-11-599965-g004.jpg",
            "PMCID:PMC7876385",
            "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5962346/bin/nihms923919f1.jpg",
            "PMCID:PMC5962346"
          ],
          "value_type_id": "linkml:Uriorcurie"
        },
        {
          "attribute_type_id": "biolink:knowledge_level",
          "value": "not_provided"
        },
        {
          "attribute_type_id": "biolink:agent_type",
          "value": "image_processing_agent"
        }
      ],
      "sources": [
        {
          "resource_id": "infores:pfocr",
          "resource_role": "primary_knowledge_source",
          "source_record_urls": [
            "https://pfocr.wikipathways.org/figures/PMC2743241__nihms-104435-f0001.html",
            "https://pfocr.wikipathways.org/figures/PMC6412134__bgy171f0001.html",
            "https://pfocr.wikipathways.org/figures/PMC3218933__bcr2876-2.html",
            "https://pfocr.wikipathways.org/figures/PMC7464279__cells-09-01817-g007.html",
            "https://pfocr.wikipathways.org/figures/PMC6209965__cancers-10-00346-g003.html",
            "https://pfocr.wikipathways.org/figures/PMC3828572__srep03230-f8.html",
            "https://pfocr.wikipathways.org/figures/PMC4863577__jep-4-173Fig1.html",
            "https://pfocr.wikipathways.org/figures/PMC7876385__fphar-11-599965-g004.html",
            "https://pfocr.wikipathways.org/figures/PMC5962346__nihms923919f1.html"
          ]
        },
        {
          "resource_id": "infores:biothings-pfocr",
          "resource_role": "aggregator_knowledge_source",
          "upstream_resource_ids": [
            "infores:pfocr"
          ]
        },
        {
          "resource_id": "infores:service-provider-trapi",
          "resource_role": "aggregator_knowledge_source",
          "upstream_resource_ids": [
            "infores:biothings-pfocr"
          ]
        }
      ]
    }
  }
},
```
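With this layout, a consumer such as the UI could still rebuild per-figure records on its own side by matching each figure URL in the publications attribute to an entry in `source_record_urls`. The Python sketch below is only an illustration: it assumes the PFOCR page file name is always `<PMC ID>__<image file stem>`, as in the example edge, and the function name `figures_for_edge` is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def figures_for_edge(edge: dict) -> list[dict]:
    """Join figure URLs with their PFOCR page URLs for one TRAPI edge."""
    # Index PFOCR page URLs by file stem, e.g. "PMC2743241__nihms-104435-f0001".
    pfocr_by_key = {}
    for source in edge.get("sources", []):
        for url in source.get("source_record_urls", []):
            pfocr_by_key[PurePosixPath(urlparse(url).path).stem] = url

    figures = []
    for attribute in edge.get("attributes", []):
        if attribute.get("attribute_type_id") != "biolink:publications":
            continue
        values = attribute["value"]
        if isinstance(values, str):
            values = [values]
        for value in values:
            if value.startswith("PMCID:"):
                continue  # the PMC ID is recovered from the figure URL path
            path = PurePosixPath(urlparse(value).path)
            pmc = path.parts[-3]  # assumed path: .../articles/<PMC ID>/bin/<file>
            figures.append({
                "figureUrl": value,
                "pmc": pmc,
                "pfocrUrl": pfocr_by_key.get(f"{pmc}__{path.stem}"),
            })
    return figures
```

Figures with no matching record URL come back with "pfocrUrl" set to None instead of raising an error.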

Notes to self: scattered notes in https://github.com/NCATS-Tangerine/translator-api-registry/issues/132#issuecomment-2148254833, https://github.com/biothings/biothings_explorer/issues/803#issuecomment-2148096825, https://github.com/biothings/biothings_explorer/issues/811#issuecomment-2167365852


However, I agree with your suggestion that it'd be more useful/UI-friendly/organized to have a list of figure info objects, with each object including all info for 1 figure.

The problem is that your suggestion isn't valid TRAPI/biolink-modeling. The biolink:publications edge-attribute has a specific format: it can be a string or an array of strings, and those strings are publication CURIEs.

So we'll need to figure out a format that is TRAPI/biolink-model compliant...which may involve discussions with UI/data-modeling/TRAPI teams.
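As a rough illustration of the constraint, the Python sketch below checks only the shape described above (a string or an array of strings). It is a simplified stand-in, not the official TRAPI or biolink-model validation, and the function name `is_publications_value_shape_ok` is made up for this example.

```python
def is_publications_value_shape_ok(value) -> bool:
    """Check that a biolink:publications value is a string or a list of strings.

    Simplified: it does not verify that each string is a real publication CURIE.
    """
    if isinstance(value, str):
        return True
    if isinstance(value, list):
        return all(isinstance(item, str) for item in value)
    return False


current_value = [
    "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
    "PMCID:PMC6765066",
]
suggested_value = [
    {
        "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765066/bin/fig-13.jpg",
        "pmc": "PMC6765066",
    },
]
print(is_publications_value_shape_ok(current_value))
print(is_publications_value_shape_ok(suggested_value))
```

The flat list passes this check while the list of objects does not, which is why the per-figure structure would need a different home than this attribute.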

EDIT: some Slack convos are happening in our lab Slack.

colleenXu commented 2 months ago

@AlexanderPico

I can make the change mentioned above to add pfocrUrl to the TRAPI edge sources section, now that Translator Eel is in Prod. Would you like me to do this? Or pause/drop this effort?

AlexanderPico commented 2 months ago

Yes, please! I think we'll want that long-term. Short-term, we might be stuffing this edge info into a support graph section so that the UI team can access it right away (i.e., before alt edge types are allowed).

colleenXu commented 2 months ago

Okay, the minor change mentioned above should be live tomorrow (8/1). See the merged PR.