ExposuresProvider / cam-pipeline

Data loading pipeline for CAM database
https://exposuresprovider.github.io/cam-pipeline/
MIT License

Update the URL for the wiki page returned by cam-kp #118

Closed: karafecho closed this issue 6 months ago

karafecho commented 9 months ago

Per discussion between Kara and Jim, this issue requests that the URL for the wiki page returned by cam-kp be replaced with this one: https://github.com/NCATSTranslator/Translator-All/wiki/CAM-Provider-KG. This change will require some code changes and a new deployment, but it will improve consistency, adhere to the NCATS/UI specs for wiki pages, and better serve end users.

gaurav commented 8 months ago

Am I correct in thinking that this change will need to be made in the infores catalog? We currently follow the convention of all the Automat KGs, which set their xrefs to https://github.com/NCATSTranslator/Translator-All/wiki/Automat, but I think we can change ours to https://github.com/NCATSTranslator/Translator-All/wiki/CAM-Provider-KG without much trouble.

https://github.com/biolink/biolink-model/blob/569ecf63ae59bfd200dda8dd871ed50c2dff4345/infores_catalog.yaml#L167-L174

Our /metadata endpoint at https://automat.renci.org/cam-kp/metadata returns two distinct URLs across these three fields:

      "source_data_url": "https://github.com/ExposuresProvider/cam-kp-api",
      "license": "https://github.com/ExposuresProvider/cam-kp-api/blob/master/LICENSE",
      "attribution": "https://github.com/ExposuresProvider/cam-kp-api",

AFAIK nobody uses these anywhere, but we could consider changing attribution to the new CAM-Provider-KG wiki URL as well.

karafecho commented 8 months ago

To clarify, we (icees-kg) also follow the convention of deferring to Automat as an aggregator knowledge source (infores:automat-icees-kg) and point to https://github.com/NCATSTranslator/Translator-All/wiki/Automat, but we then refer to infores:icees-kg as the primary knowledge source and point to https://github.com/NCATSTranslator/Translator-All/wiki/ICEES. For cam-kp, I'm suggesting that you also refer to infores:automat-cam-kp as the aggregator knowledge source pointing to https://github.com/NCATSTranslator/Translator-All/wiki/Automat and infores:cam-kp as the primary knowledge source, pointing to https://github.com/NCATSTranslator/Translator-All/wiki/CAM-Provider-KG.

Yes, the change will need to be made in both cam-kp and the infores catalog. However, I already have them flagged as part of the infores/wiki effort, so I can create a PR for both icees-kg and cam-kp after the cam-kp URLs have been updated.

gaurav commented 8 months ago

Ah, got it, I understand what you mean now! So instead of our current sources, which look like this:

https://github.com/ExposuresProvider/cam-pipeline/blob/c539c7463eb8b882145e2e4725f99ee8ed6e4287/tests/test_api.py#L2433-L2444

You are proposing that we add infores:cam-kp as the primary_knowledge_source below infores:automat-cam-kp as the aggregator_knowledge_source, and then demote infores:go-cam to a supporting_data_source.

I think we can do that, but I would argue that infores:cam-kp should also be an aggregator_knowledge_source -- we don't really provide any primary knowledge, and all the information we have should be sourced to one of our primary knowledge sources (infores:go-cam, infores:aop-cam and infores:ctd):

https://github.com/ExposuresProvider/cam-pipeline/blob/c539c7463eb8b882145e2e4725f99ee8ed6e4287/tests/test_api.py#L101-L106

So I would propose that we change our sources so they look like this:

[
  {
    "resource_id": "infores:go-cam",
    "resource_role": "primary_knowledge_source"
  },
  {
    "resource_id": "infores:cam-kp",
    "resource_role": "aggregator_knowledge_source",
    "upstream_resource_ids": ["infores:go-cam"]
  },
  {
    "resource_id": "infores:automat-cam-kp",
    "resource_role": "aggregator_knowledge_source",
    "upstream_resource_ids": ["infores:cam-kp"]
  }
]

Does that make sense?
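To make the proposed structure concrete, here is a minimal Python sketch that checks a sources list against the shape proposed above: exactly one primary_knowledge_source, every aggregator_knowledge_source lists at least one upstream resource, and every upstream resource appears in the same list. The function name check_sources and these exact rules are assumptions for illustration; this is not the official TRAPI validator.

```python
from typing import Any

# The sources list proposed above, transcribed as Python data.
PROPOSED_SOURCES: list[dict[str, Any]] = [
    {"resource_id": "infores:go-cam", "resource_role": "primary_knowledge_source"},
    {
        "resource_id": "infores:cam-kp",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:go-cam"],
    },
    {
        "resource_id": "infores:automat-cam-kp",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:cam-kp"],
    },
]


def check_sources(sources: list[dict[str, Any]]) -> list[str]:
    """Return the problems found in a TRAPI-style sources list (empty if none).

    Hypothetical helper: the rules below are the ones implied by this thread.
    """
    problems: list[str] = []
    ids = {s["resource_id"] for s in sources}

    # Rule 1: exactly one primary knowledge source per edge.
    primaries = [s for s in sources if s["resource_role"] == "primary_knowledge_source"]
    if len(primaries) != 1:
        problems.append(f"expected exactly one primary_knowledge_source, found {len(primaries)}")

    for s in sources:
        upstream_ids = s.get("upstream_resource_ids", [])
        # Rule 2: an aggregator must say what it aggregates.
        if s["resource_role"] == "aggregator_knowledge_source" and not upstream_ids:
            problems.append(f"aggregator {s['resource_id']} lists no upstream_resource_ids")
        # Rule 3: every upstream reference must resolve within the same list.
        for upstream in upstream_ids:
            if upstream not in ids:
                problems.append(f"{s['resource_id']} cites unknown upstream {upstream}")
    return problems


print(check_sources(PROPOSED_SOURCES))  # → []
```

Dropping the infores:go-cam entry, for example, would be reported twice: once for the missing primary source and once for infores:cam-kp pointing at an upstream resource that is no longer listed.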

karafecho commented 8 months ago

Yes, let's go with your suggestion, but note that you may want to cross-check against the InfoRes catalog (https://github.com/biolink/biolink-model/blob/master/infores_catalog.yaml).

gaurav commented 6 months ago

@EvanDietzMorris updated Automat-CAM-KP; if you run the following query on https://automat.renci.org/#/cam-kp/reasoner_api_1_4_query_post_cam-kp_trapi:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:15481"]
                },
                "n1": {"categories": ["biolink:AnatomicalEntity"]}
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:active_in"]
                }
            }
        }
    }
}
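For anyone who wants to rerun this outside the Swagger UI, here is a minimal Python sketch that builds the same one-hop query and POSTs it using only the standard library. The endpoint URL in CAM_KP_QUERY_URL and the helper names are assumptions: the link above points at the Swagger page, not necessarily the raw query path, so confirm the path before relying on it.

```python
import json
import urllib.request

# ASSUMPTION: the TRAPI query path behind the Swagger page above; verify before use.
CAM_KP_QUERY_URL = "https://automat.renci.org/cam-kp/query"


def build_one_hop_query(curie: str, object_category: str, predicate: str) -> dict:
    """Build a one-hop TRAPI query graph: (curie) -[predicate]-> (any object_category)."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [curie]},
                    "n1": {"categories": [object_category]},
                },
                "edges": {
                    "e0": {"subject": "n0", "object": "n1", "predicates": [predicate]}
                },
            }
        }
    }


def post_query(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST a TRAPI payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# The same query as the JSON above.
query = build_one_hop_query("NCBIGene:15481", "biolink:AnatomicalEntity", "biolink:active_in")

# Network call, not executed here:
# result = post_query(CAM_KP_QUERY_URL, query)
# for edge_id, edge in result["message"]["knowledge_graph"]["edges"].items():
#     print(edge_id, json.dumps(edge["sources"], indent=2))
```

If the URL is right, each edge under message.knowledge_graph.edges in the response should carry a sources list like the one shown below.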

... you get back provenance that looks like this:

"390329": {
          "predicate": "biolink:active_in",
          "sources": [
            {
              "resource_id": "infores:go-cam",
              "resource_role": "primary_knowledge_source"
            },
            {
              "resource_id": "infores:cam-kp",
              "resource_role": "aggregator_knowledge_source",
              "upstream_resource_ids": [
                "infores:go-cam"
              ]
            },
            {
              "resource_id": "infores:automat-cam-kp",
              "resource_role": "aggregator_knowledge_source",
              "upstream_resource_ids": [
                "infores:cam-kp"
              ]
            }
          ],
          "subject": "NCBIGene:15481",
          "attributes": [
            {
              "attribute_type_id": "biolink:xref",
              "original_attribute_name": "xref",
              "value": [
                "http://model.geneontology.org/SYNGO_2867"
              ],
              "value_type_id": "xsd:anyURI"
            }
          ],
          "object": "UBERON:0002894"
        }

So infores:go-cam is the primary knowledge source, which is aggregated by the aggregator knowledge source infores:cam-kp, which is itself aggregated by the aggregator knowledge source infores:automat-cam-kp.
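The same chain can be read mechanically from an edge's sources list. Here is a minimal Python sketch that assumes the chain is linear, meaning each aggregator lists one upstream resource and exactly one resource is cited by no other. The helper provenance_chain is hypothetical, and the data is the sources list from the edge above.

```python
def provenance_chain(sources: list[dict]) -> list[str]:
    """Order resource_ids from the outermost aggregator down to the primary source.

    Assumes a single linear chain; raises ValueError if that assumption fails.
    """
    by_id = {s["resource_id"]: s for s in sources}
    cited = {u for s in sources for u in s.get("upstream_resource_ids", [])}

    # The outermost aggregator is the one resource nobody else cites as upstream.
    outermost = [rid for rid in by_id if rid not in cited]
    if len(outermost) != 1:
        raise ValueError(f"expected one outermost source, found {outermost}")

    chain = [outermost[0]]
    while True:
        upstream = by_id[chain[-1]].get("upstream_resource_ids", [])
        if not upstream:
            return chain  # reached a source with nothing upstream (the primary)
        nxt = upstream[0]  # follow the first upstream only (linear-chain assumption)
        if nxt not in by_id or nxt in chain:
            raise ValueError(f"broken or cyclic chain at {nxt}")
        chain.append(nxt)


EDGE_SOURCES = [
    {"resource_id": "infores:go-cam", "resource_role": "primary_knowledge_source"},
    {"resource_id": "infores:cam-kp", "resource_role": "aggregator_knowledge_source",
     "upstream_resource_ids": ["infores:go-cam"]},
    {"resource_id": "infores:automat-cam-kp", "resource_role": "aggregator_knowledge_source",
     "upstream_resource_ids": ["infores:cam-kp"]},
]

print(" -> ".join(provenance_chain(EDGE_SOURCES)))
# → infores:automat-cam-kp -> infores:cam-kp -> infores:go-cam
```

Reversing the result gives the order used in the sentence above, from the primary knowledge source out to the outermost aggregator.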

@karafecho: Is this sufficient to close this issue, at least on the Automat-CAM-KP side? We'll have to make sure that all of those inforeses point to the right URLs, but according to https://github.com/biolink/information-resource-registry/blob/e592279814e723ca16b922111037568171b87668/infores_catalog.yaml:

So we should be good there.

karafecho commented 6 months ago

This all looks good to me! Sierra, Tursynay, and I just updated the infores catalog (we created and merged a large PR with many changes), so your xrefs are up to date and look good to me. As such, I'll close this ticket.

karafecho commented 6 months ago

Oh, wait. I don't see a GO-CAM wiki page?

gaurav commented 6 months ago

Hmm, the correct wiki URL for GO-CAM should be https://github.com/NCATSTranslator/Translator-All/wiki/GO-CAM. I think the hyphen may have been turned into an em-dash in https://github.com/biolink/information-resource-registry/blob/e592279814e723ca16b922111037568171b87668/infores_catalog.yaml. I can fix that in a bit.
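A quick way to catch this class of problem is to scan the catalog text for URLs that contain a dash look-alike such as U+2014 EM DASH or U+2013 EN DASH instead of the ASCII hyphen-minus. Here is a minimal Python sketch that assumes plain-text scanning is good enough (it does not parse the YAML). The function find_suspicious_urls and the sample text in the usage note are hypothetical.

```python
import re

# Characters that can silently replace an ASCII hyphen-minus (U+002D)
# when text passes through an editor or word processor.
LOOKALIKE_DASHES = {
    "\u2010": "hyphen",
    "\u2011": "non-breaking hyphen",
    "\u2012": "figure dash",
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2212": "minus sign",
}

# A URL runs until the next whitespace character; dash look-alikes are not whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def find_suspicious_urls(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, url, fixed_url) for each URL containing a dash look-alike."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for url in URL_PATTERN.findall(line):
            if any(ch in LOOKALIKE_DASHES for ch in url):
                # Swap every look-alike for a plain ASCII hyphen.
                fixed = "".join("-" if ch in LOOKALIKE_DASHES else ch for ch in url)
                hits.append((line_number, url, fixed))
    return hits
```

Running it over a local copy, for example find_suspicious_urls(Path("infores_catalog.yaml").read_text(encoding="utf-8")) with pathlib.Path imported, would list each offending line together with its ASCII-hyphen fix.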

karafecho commented 6 months ago

Yes, that is the correct xref URL for GO-CAM. I'd suggest creating a PR to fix the incorrect xref URL in the infores catalog.

gaurav commented 6 months ago

PR created! https://github.com/biolink/information-resource-registry/pull/7

We can close this ticket once that's been merged.