biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0
10 stars 11 forks source link

Record class doesn't handle qualifier value arrays in associations #713

Closed tokebe closed 8 months ago

tokebe commented 1 year ago

In some cases, the association object for an edge will have qualifiers with multiple values in an array. When reversing a Record made from such an association, or when falling back to the association in cases where a returned edge is missing those qualifiers, the Record will TypeError as it tries to do string operations on the array.

Similarly, because an association can have qualifier value arrays, falling back to the association qualifiers when the returned edge is missing a qualifier can cause information problems, such as when the array has both increased and decreased.

We need specific expectations for how to handle the fallback, and the code should check for string or string[] before operating on qualifier values.

Reproducible with the following query:

{
  "message": {
    "query_graph": {
      "edges": {
        "e01": {
          "object": "Drug",
          "predicate": "biolink:related_to",
          "subject": "Protein"
        },
        "e02": {
          "object": "Protein",
          "predicate": "biolink:related_to",
          "subject": "BiologicalProcess"
        },
        "e03": {
          "object": "BiologicalProcess",
          "predicate": "biolink:related_to",
          "subject": "Disease"
        }
      },
      "nodes": {
        "BiologicalProcess": {
          "categories": [
            "biolink:BiologicalProcess"
          ]
        },
        "Disease": {
          "ids": [
            "MESH:D016649"
          ]
        },
        "Drug": {
          "ids": [
            "MESH:D005640"
          ]
        },
        "Protein": {
          "categories": [
            "biolink:Protein"
          ]
        }
      }
    }
  }
}
tokebe commented 1 year ago

@colleenXu, is there an expected behavior when an association has qualifier value arrays, but a returned edge querying that association has no qualifiers? This particular case isn't represented in the example query, so this is unfortunately a purely hypothetical question at the moment.

colleenXu commented 1 year ago

Err....it sounds like there's two issues happening?

1: somehow there are records/associations where a qualifier's value is an array rather than a string


2: It's not clear to me what "association has qualifier value arrays, but a returned edge querying that association has no qualifiers" means.

Is it one of these situations?

tokebe commented 1 year ago

I should note that by "association", I mean the association attribute coming from the smartapi-kg package, which are coming from our own internal representations of smartapi specs, not data that's coming from a KP.

tokebe commented 1 year ago

So, what I mean is, the smartapi-kg package gives us an operation, the association attribute of which specifies the input/output types, predicate, etc...and specifically to this case, gives us a qualifier_type_id that specifies an array of applicable values which the API might be able to respond with. In a (purely hypothetical, as-of-yet not observed case), a KP could respond with no qualifiers, despite this associate specifying the range of applicable qualifiers it should respond with. The KP would be in the wrong, but the code should probably be a little smarter about how it handles it

To give more insight, to my memory, there's no specific reasoning why the Record class should fall back to the association-defined qualifiers if the actual returned edge (from which we're building the Record) has no qualifiers attribute. So, we could just remove that piece of code and instead drop any would-be records which don't have a qualifier when the association specifies there should be one?

As for where we're seeing an actual error (the "When reversing a record made from such an association" part), that's just a detail of Record class reversing which is totally understood and just needs some fixing, no further discussion is needed.

colleenXu commented 1 year ago

So....it sounds like none of the scenarios I described above are really what's going on?

So...I'm going to try again at understanding the problem and asking questions:


It now sounds like 1 association (related to the MetaEdge concept?) can sometimes have 1 qualifier-type-id paired with an array of possible values - and that this is a core part of the problem. Is that right?

And can you say what KPs these associations are coming from or give examples of those associations? I'm imagining that these are maybe only coming from TRAPI KPs:

On "going back to the association because qualifiers aren't in the record": that's why I'm trying to figure out if this is a TRAPI KP issue or not.


And it sounds like the core problem IS NOT "KP response records having no qualifiers when we expected them" (purely hypothetical; I guess it's okay that I still don't quite get this) or "reversing for records"...

tokebe commented 1 year ago

Relates to this PR which affects internal metaKG parsing. It looks like this only affects TRAPI KPs, yes.

It now sounds like 1 association (related to the MetaEdge concept?) can sometimes have 1 qualifier-type-id paired with an array of possible values - and that this is a core part of the problem. Is that right?

Yes, seemingly only for TRAPI KPs based on the code/PRs I'm looking at.

And I think that representation would make sense in the entire metaKG...regardless of where the association came from. What do you think?

That was my understanding as well...one operation, one possible qualifier set.

And can you say what KPs these associations are coming from or give examples of those associations? I'm imagining that these are maybe only coming from TRAPI KPs:

An association is a purely BTE-internal abstraction generated for every single operation it parses. The problematic ones do seem to specifically be coming from TRAPI KPs, meaning either:

I need to isolate which KP is associated with our actual error to try and pinpoint what it looks like when this association is being made.

On "going back to the association because qualifiers aren't in the record": that's why I'm trying to figure out if this is a TRAPI KP issue or not.

I think this fallback code is specifically there for non-TRAPI KPs, where we insert those qualifiers because, as you said, they're not in what we're getting back from the KP. In which case, we probably won't ever see this code triggered by a TRAPI KP failing to provide qualifiers, so I suppose I don't really need to handle that case.


And it sounds like the core problem IS NOT "KP response records having no qualifiers when we expected them" (purely hypothetical; I guess it's okay that I still don't quite get this) or "reversing for records"...

What remains are two things that I have to do:

  1. Handle record reversing when qualifier values (from the association, not from the KP-returned data) are an array, because the KP is returning information just fine and we shouldn't be dropping queries over this.
  2. Further research to figure out why we were even allowing TRAPI qualifier value arrays in our internal association objects in the first place, so we can actually understand what's going on there and if anything actually needs to be done about it.

@rjawesome, though I understand you're busy, if you get a free moment and happen to recall anything that could give us insight into that second point, it'd be greatly appreciated.

rjawesome commented 1 year ago

Further research to figure out why we were even allowing TRAPI qualifier value arrays in our internal association objects in the first place, so we can actually understand what's going on there and if anything actually needs to be done about it.

Basically, in the meta_knowledge_graph response for any TRAPI API contains an array of qualifiers. Each qualifier has the name of the qualifier and applicable values (plural). This means that the API could accept multiple values for that qualifier name, so we would need to store all of the values in the association.

I suppose if we didn't want to allow arrays then we would have to modify the code from the PR so that it splits into multiple associations for each of the applicable values for a given qualifier.

example from bte meta_knowledge_graph (currently an entry like this would be processed as one association):

     "subject": "biolink:SmallMolecule",
            "predicate": "biolink:associated_with",
            "object": "biolink:PhenotypicFeature",
            "qualifiers": [
                {
                    "qualifier_type_id": "biolink:object_aspect_qualifier",
                    "applicable_values": [
                        "likelihood"
                    ]
                },
                {
                    "qualifier_type_id": "biolink:object_direction_qualifier",
                    "applicable_values": [
                        "increased",
                        "decreased"
                    ]
                }
            ],
            "knowledge_types": [
                "lookup"
            ]        
tokebe commented 1 year ago

Thank you for the clarity @rjawesome, this makes perfect sense! @colleenXu the current code works (aside from the record reversing issue) to handle this just fine, but I'm of two minds whether it should instead be changed to make one association = one qualifier value. What do you think?

colleenXu commented 1 year ago

Finding some specific examples from TRAPI KPs /meta_knowledge_graph responses:

Automat CTD - a LOT!

https://automat.test.transltr.io/ctd/1.4/meta_knowledge_graph Hmmm...so these edges aren't inverses of each other? ``` "subject": "biolink:SmallMolecule", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "localization", "activity", "splicing", "expression", "mutation_rate", "secretion", "molecular_modification", "transport", "synthesis", "molecular_interaction", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Gene", "predicate": "biolink:affects", "object": "biolink:SmallMolecule", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "localization", "activity", "secretion", "molecular_modification", "synthesis", "transport", "molecular_interaction", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Gene", "predicate": "biolink:affects", "object": "biolink:MolecularMixture", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity", "secretion", "transport", "synthesis", "molecular_interaction", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:MolecularMixture", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "localization", "activity", "expression", "mutation_rate", "secretion", "molecular_modification", "transport", "molecular_interaction", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Gene", "predicate": "biolink:affects", "object": "biolink:ChemicalEntity", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "localization", "activity", "secretion", "molecular_modification", "transport", "synthesis", "molecular_interaction", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:ChemicalEntity", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "localization", "activity", "splicing", "expression", "mutation_rate", "secretion", "molecular_modification", "transport", "synthesis", "molecular_interaction", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Gene", "predicate": "biolink:affects", "object": "biolink:ComplexMolecularMixture", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "metabolic_processing", "molecular_interaction", "uptake" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:ComplexMolecularMixture", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "localization", "activity", "expression", "mutation_rate", "secretion", "molecular_modification", "transport", "molecular_interaction", "metabolic_processing", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Gene", "predicate": "biolink:affects", "object": "biolink:Polypeptide", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "transport", "synthesis", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Polypeptide", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity", "molecular_interaction", "degradation", "secretion" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ```

Automat Pharos

https://automat.test.transltr.io/pharos/1.4/meta_knowledge_graph ``` "subject": "biolink:MolecularMixture", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:SmallMolecule", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:ChemicalEntity", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ```

Automat Hetio

However, many of these seem outside the chem-gene/gene-gene scope of qualifiers right now or don't quite seem to make sense https://automat.test.transltr.io/hetio/1.4/meta_knowledge_graph ``` "subject": "biolink:AnatomicalEntity", "predicate": "biolink:regulates", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "upregulated", "downregulated" ] } ], ``` ``` "subject": "biolink:GrossAnatomicalStructure", "predicate": "biolink:regulates", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "upregulated", "downregulated" ] } ], ``` ``` "subject": "biolink:SmallMolecule", "predicate": "biolink:regulates", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "upregulated", "downregulated" ] } ], ``` ``` "subject": "biolink:Disease", "predicate": "biolink:regulates", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "upregulated", "downregulated" ] } ], ``` ``` "subject": "biolink:ChemicalEntity", "predicate": "biolink:regulates", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "upregulated", "downregulated" ] } ], ``` ``` "subject": "biolink:MolecularMixture", "predicate": "biolink:regulates", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "upregulated", "downregulated" ] } ], ```

Automat drug-central

https://automat.test.transltr.io/drugcentral/1.4/meta_knowledge_graph ``` "subject": "biolink:SmallMolecule", "predicate": "biolink:affects", "object": "biolink:Protein", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:ChemicalEntity", "predicate": "biolink:affects", "object": "biolink:Protein", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Protein", "predicate": "biolink:affects", "object": "biolink:Protein", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:MolecularMixture", "predicate": "biolink:affects", "object": "biolink:Protein", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ```

Automat gtex

https://automat.test.transltr.io/gtex/1.4/meta_knowledge_graph ``` "subject": "biolink:SequenceVariant", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "splicing", "expression" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ```

Automat gtopdb

https://automat.test.transltr.io/gtopdb/1.4/meta_knowledge_graph ``` "subject": "biolink:SmallMolecule", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:ChemicalEntity", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:MolecularMixture", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ```

tokebe commented 1 year ago

From meetings: Moving forward with changing code to ensure only one value per internal association object.

colleenXu commented 1 year ago

From what I recall (@tokebe, please comment with agreement or clarification):


A complication to building these "qualifier-sets" is what to do about the potential absence of qualifiers or specific type-ids. For example, what if a TRAPI KP:

colleenXu commented 11 months ago

@tokebe Linked to this PR https://github.com/biothings/api-respone-transform.js/pull/60? Just in case, I put both this issue and the PR into the Project manager/deployment section...

tokebe commented 8 months ago

Relevant changes deployed to Prod.