Formalize "allowlist" and "denylist"

uhbrar commented 2 years ago

I've noticed that a few ARAs allow "allowlist" and "denylist" to be passed in as part of a query. "allowlist" tends to indicate the exclusive group of KPs from which knowledge can be retrieved, so that only those KPs are queried, while "denylist" indicates a list of KPs to which no requests should be made, so that all KPs except those listed are queried. However, as far as I can tell, the usage of these keys is not formalized anywhere in TRAPI, or at least not anywhere I can find.

The workflow runner will also soon implement this same concept: an optional "allowlist" or "denylist" parameter can be passed in for each operation in a workflow, so that only certain services (ARAs and/or KPs, depending on the operation) are queried. It would be a good idea to formalize this as part of the schema, for consistency's sake.

edeutsch commented 2 years ago

Can you provide detailed examples of exactly how a few ARAs are using allowlist and denylist? That is, which ARAs, and exactly where in the TRAPI message is it being used?

edeutsch commented 2 years ago

ARAX does not currently recognize an allowlist or denylist in TRAPI. The closest functionality is in the ARAXi DSL, where it is possible to specify one KP each time the expand() step is called. We encourage adding common functionality like this, but we do not really support it at present.

uhbrar commented 2 years ago

Aragorn uses the allowlist and denylist as part of another query edge property, provided_by. It's meant to specify a list of KPs to either use or not use when querying edges: the allowlist holds the KPs that can be used, and the denylist holds the KPs that should not be.

This is an example query to show how it works.

{
  "message": {
    "query_graph": {
      "edges": {
        "e01": {
          "object": "n0",
          "subject": "n1",
          "predicates": [
            "biolink:entity_negatively_regulates_entity"
          ],
          "provided_by":{
            "denylist": []
          }
        }
      },
      "nodes": {
        "n0": {
          "ids": [
            "NCBIGene:23221"
          ],
          "categories": [
            "biolink:Gene"
          ]
        },
        "n1": {
          "categories": [
            "biolink:Gene"
          ]
        }
      }
    }
  }
}

vdancik commented 2 years ago

No need for "allowlist" and "denylist", since TRAPI already supports constraints.

edeutsch commented 2 years ago

It is my understanding that an allowlist and denylist would be encoded with a constraint, as Vlado says. Here is what I think your query that does NOT want any information from SemMedDB should look like:

{
  "message": {
    "query_graph": {
      "edges": {
        "e01": {
          "object": "n0",
          "subject": "n1",
          "predicates": [
            "biolink:entity_negatively_regulates_entity"
          ],
          "constraints": [
            {
              "id": "biolink:knowledge_source",
              "name": "knowledge source",
              "value": "infores:semmeddb",
              "not": true,
              "operator": "=="
            }
          ]
        }
      },
      "nodes": {
        "n0": {
          "ids": [
            "NCBIGene:23221"
          ],
          "categories": [
            "biolink:Gene"
          ]
        },
        "n1": {
          "categories": [
            "biolink:Gene"
          ]
        }
      }
    }
  }
}

Here is what a general "allowlist" should look like:

      "constraints": [
        {
          "id": "biolink:knowledge_source",
          "name": "knowledge source",
          "value": [
            "infores:rtx-kg2",
            "infores:biothings-explorer"
          ],
          "operator": "=="
        }
      ],

(when the value is a list, the "==" operator works like a SQL "IN" clause, as clearly documented in the TRAPI yaml)

Here is what a general "denylist" should look like:

      "constraints": [
        {
          "id": "biolink:knowledge_source",
          "name": "knowledge source",
          "value": [
            "infores:rtx-kg2",
            "infores:biothings-explorer"
          ],
          "not": true,
          "operator": "=="
        }
      ],

(when the value is a list, the "==" operator combined with ' "not": true ' works like a SQL "NOT IN" clause, as clearly documented in the TRAPI yaml)
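To make the IN / NOT IN reading concrete, here is a minimal Python sketch of how a service might evaluate one of these constraints against a single attribute value. The function name constraint_matches is hypothetical, the sketch only handles the "==" operator, and it assumes the attribute value being tested is a scalar; it is not taken from any existing ARA or KP implementation.

```python
def constraint_matches(constraint, attribute_value):
    """Return True if attribute_value satisfies the constraint.

    Hypothetical helper: only the "==" operator is handled, and
    attribute_value is assumed to be a scalar such as an infores CURIE.
    """
    if constraint.get("operator") != "==":
        raise ValueError("this sketch only handles the '==' operator")

    wanted = constraint["value"]
    # A list value means "equal to any of these", like SQL IN.
    candidates = wanted if isinstance(wanted, list) else [wanted]
    result = attribute_value in candidates

    # "not": true inverts the result, turning IN into NOT IN.
    return (not result) if constraint.get("not", False) else result


allow = {
    "id": "biolink:knowledge_source",
    "name": "knowledge source",
    "value": ["infores:rtx-kg2", "infores:biothings-explorer"],
    "operator": "==",
}
deny = {**allow, "not": True}
```

With these two constraints, "infores:rtx-kg2" satisfies allow but not deny, while "infores:semmeddb" satisfies deny but not allow.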

Did I get that right? Please correct any mistakes in the above. I agree that we have not documented specific examples like this clearly. If we all agree on the above, we should add it to some documentation.
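For anyone migrating from the provided_by style used earlier in this thread, a rough Python sketch of the translation into constraints could look like the following. The function name provided_by_to_constraints is hypothetical, and treating an absent or empty list as "no restriction" is an assumption of this sketch, not something the thread or TRAPI specifies.

```python
def provided_by_to_constraints(provided_by):
    """Translate {"allowlist": [...], "denylist": [...]} into constraints.

    Hypothetical helper. Assumption: an absent or empty list imposes no
    restriction, so no constraint is emitted for it.
    """
    constraints = []
    for key, negate in (("allowlist", False), ("denylist", True)):
        sources = provided_by.get(key)
        if not sources:
            continue
        constraint = {
            "id": "biolink:knowledge_source",
            "name": "knowledge source",
            "value": list(sources),
            "operator": "==",
        }
        if negate:
            # The denylist is the same constraint with "not": true.
            constraint["not"] = True
        constraints.append(constraint)
    return constraints
```

Calling provided_by_to_constraints({"denylist": ["infores:semmeddb"]}) yields a single constraint with "not" set to True and a one-element value list.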

Note that there is some inherent uncertainty and fuzziness in biolink:knowledge_source. One might plausibly encode the above as:

"attribute_type_id": "biolink:aggregator_knowledge_source",

I would probably implement this without drawing a distinction between biolink:knowledge_source and biolink:aggregator_knowledge_source, i.e. allowing or denying an entity by either attribute_type_id.
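As a rough illustration of that idea, the Python sketch below collects sources from edge attributes under either attribute_type_id and then applies an allowlist or denylist. The names SOURCE_TYPES, edge_sources, and edge_allowed are hypothetical. The matching rules (deny if any source is on the denylist, allow if at least one source is on the allowlist) are assumptions of this sketch, not agreed behavior.

```python
# Both attribute types are treated as equivalent for filtering purposes.
SOURCE_TYPES = {
    "biolink:knowledge_source",
    "biolink:aggregator_knowledge_source",
}


def edge_sources(edge):
    """Collect every source named on the edge under either attribute type."""
    sources = set()
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") in SOURCE_TYPES:
            value = attr.get("value")
            sources.update(value if isinstance(value, list) else [value])
    return sources


def edge_allowed(edge, allowlist=None, denylist=None):
    """Assumed semantics: any denied source rejects the edge; an allowlist
    requires at least one listed source."""
    sources = edge_sources(edge)
    if denylist and sources & set(denylist):
        return False
    if allowlist and not (sources & set(allowlist)):
        return False
    return True


edge = {
    "attributes": [
        {"attribute_type_id": "biolink:knowledge_source",
         "value": "infores:semmeddb"},
        {"attribute_type_id": "biolink:aggregator_knowledge_source",
         "value": "infores:rtx-kg2"},
    ]
}
```

Under these assumed rules, the example edge is rejected by a denylist containing "infores:semmeddb" and accepted by an allowlist containing "infores:rtx-kg2".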

edeutsch commented 2 years ago

This is now fully documented at the bottom of https://github.com/NCATSTranslator/ReasonerAPI/blob/master/ImplementationRules.md, so I am closing this issue. Reopen if there is a problem.

NCATSTranslator / ReasonerAPI

Formalize "allowlist" and "denylist" #318