Record class doesn't handle qualifier value arrays in associations

tokebe commented 1 year ago

In some cases, the association object for an edge will have qualifiers with multiple values in an array. When reversing a Record made from such an association, or when falling back to the association in cases where a returned edge is missing those qualifiers, the Record will TypeError as it tries to do string operations on the array.

Similarly, because an association can have qualifier value arrays, falling back to the association qualifiers when the returned edge is missing a qualifier can cause information problems, such as when the array has both increased and decreased.

We need specific expectations for how to handle the fallback, and the code should check for string or string[] before operating on qualifier values.

Reproducible with the following query:

{
  "message": {
    "query_graph": {
      "edges": {
        "e01": {
          "object": "Drug",
          "predicate": "biolink:related_to",
          "subject": "Protein"
        },
        "e02": {
          "object": "Protein",
          "predicate": "biolink:related_to",
          "subject": "BiologicalProcess"
        },
        "e03": {
          "object": "BiologicalProcess",
          "predicate": "biolink:related_to",
          "subject": "Disease"
        }
      },
      "nodes": {
        "BiologicalProcess": {
          "categories": [
            "biolink:BiologicalProcess"
          ]
        },
        "Disease": {
          "ids": [
            "MESH:D016649"
          ]
        },
        "Drug": {
          "ids": [
            "MESH:D005640"
          ]
        },
        "Protein": {
          "categories": [
            "biolink:Protein"
          ]
        }
      }
    }
  }
}

tokebe commented 1 year ago

@colleenXu, is there an expected behavior when an association has qualifier value arrays, but a returned edge querying that association has no qualifiers? This particular case isn't represented in the example query, so this is unfortunately a purely hypothetical question at the moment.

colleenXu commented 1 year ago

Err....it sounds like there's two issues happening?

1: somehow there are records/associations where a qualifier's value is an array rather than a string

I don't think these are valid statements. We could ask data-modeling / TRAPI teams for clarification...
are these coming from TRAPI KPs? I'm not sure how we could get this from x-bte...
I think it'd be helpful to see an example. My prelim thoughts are:
- if it's not clear what this record means, we could have BTE ignore/skip/drop these kinds of records. I think this is more likely...
- if these records seem to represent multiple edges/relationships, we could have BTE split each into multiple records (one for each relationship)
and it'd be helpful to trace what KP we're getting these records from, so we can ask them more about this. There may be a bug on their side

2: It's not clear to me what "association has qualifier value arrays, but a returned edge querying that association has no qualifiers" means.

Is it one of these situations?

QEdge has qualifier-constraints, but the records from sub-queries don't have qualifiers.
- if the records are coming from x-bte-based MetaEdges, then there's a bug in our code or problem with the x-bte annotation
- if the records are coming from a TRAPI KP, then there's an issue with their tool. We should let them know and maybe add code to ignore/drop these records since they don't match the qualifier-constraints
QEdge doesn't have qualifier-constraints, but the records from sub-queries have qualifiers
- this is what I thought was going on in this case?
- But then, I'm not clear on what the problem is. Is BTE having trouble with reversing qualifiers in that situation all the time? Or just the cases where qualifier values are arrays?

tokebe commented 1 year ago

I should note that by "association", I mean the association attribute coming from the smartapi-kg package, which are coming from our own internal representations of smartapi specs, not data that's coming from a KP.

tokebe commented 1 year ago

So, what I mean is, the smartapi-kg package gives us an operation, the association attribute of which specifies the input/output types, predicate, etc...and specifically to this case, gives us a qualifier_type_id that specifies an array of applicable values which the API might be able to respond with. In a (purely hypothetical, as-of-yet not observed case), a KP could respond with no qualifiers, despite this associate specifying the range of applicable qualifiers it should respond with. The KP would be in the wrong, but the code should probably be a little smarter about how it handles it

To give more insight, to my memory, there's no specific reasoning why the Record class should fall back to the association-defined qualifiers if the actual returned edge (from which we're building the Record) has no qualifiers attribute. So, we could just remove that piece of code and instead drop any would-be records which don't have a qualifier when the association specifies there should be one?

As for where we're seeing an actual error (the "When reversing a record made from such an association" part), that's just a detail of Record class reversing which is totally understood and just needs some fixing, no further discussion is needed.

colleenXu commented 1 year ago

So....it sounds like none of the scenarios I described above are really what's going on?

So...I'm going to try again at understanding the problem and asking questions:

It now sounds like 1 association (related to the MetaEdge concept?) can sometimes have 1 qualifier-type-id paired with an array of possible values - and that this is a core part of the problem. Is that right?

If that's right, that sounds kinda odd to me. I think we represent 1 qualifier-set per x-bte operation, so each qualifier-type-id is paired with 1 string value.
And I think that representation would make sense in the entire metaKG...regardless of where the association came from. What do you think?

And can you say what KPs these associations are coming from or give examples of those associations? I'm imagining that these are maybe only coming from TRAPI KPs:

related to how we ingest their /meta_knowledge_graph data (#597)?

On "going back to the association because qualifiers aren't in the record": that's why I'm trying to figure out if this is a TRAPI KP issue or not.

Yes, TRAPI KP responses could have non-empty qualifiers sections in their records (based on the Edges) - so going back to the association/MetaEdge for qualifier info maybe isn't necessary.
But for x-bte-based operations, it makes sense to me to go back to the association to assign the qualifier-set (since it's not going to be in the records).

And it sounds like the core problem IS NOT "KP response records having no qualifiers when we expected them" (purely hypothetical; I guess it's okay that I still don't quite get this) or "reversing for records"...

tokebe commented 1 year ago

Relates to this PR which affects internal metaKG parsing. It looks like this only affects TRAPI KPs, yes.

It now sounds like 1 association (related to the MetaEdge concept?) can sometimes have 1 qualifier-type-id paired with an array of possible values - and that this is a core part of the problem. Is that right?

Yes, seemingly only for TRAPI KPs based on the code/PRs I'm looking at.

And I think that representation would make sense in the entire metaKG...regardless of where the association came from. What do you think?

That was my understanding as well...one operation, one possible qualifier set.

And can you say what KPs these associations are coming from or give examples of those associations? I'm imagining that these are maybe only coming from TRAPI KPs:

An association is a purely BTE-internal abstraction generated for every single operation it parses. The problematic ones do seem to specifically be coming from TRAPI KPs, meaning either:

some TRAPI KPs are improperly representing information in their /meta_knowledge_graph endpoints, OR
BTE is improperly parsing this information when it creates association objects for operations from TRAPI endpoints, and certain qualities of specific operations are causing this improper parsing to occur.

I need to isolate which KP is associated with our actual error to try and pinpoint what it looks like when this association is being made.

On "going back to the association because qualifiers aren't in the record": that's why I'm trying to figure out if this is a TRAPI KP issue or not.

I think this fallback code is specifically there for non-TRAPI KPs, where we insert those qualifiers because, as you said, they're not in what we're getting back from the KP. In which case, we probably won't ever see this code triggered by a TRAPI KP failing to provide qualifiers, so I suppose I don't really need to handle that case.

And it sounds like the core problem IS NOT "KP response records having no qualifiers when we expected them" (purely hypothetical; I guess it's okay that I still don't quite get this) or "reversing for records"...

What remains are two things that I have to do:

Handle record reversing when qualifier values (from the association, not from the KP-returned data) are an array, because the KP is returning information just fine and we shouldn't be dropping queries over this.
Further research to figure out why we were even allowing TRAPI qualifier value arrays in our internal association objects in the first place, so we can actually understand what's going on there and if anything actually needs to be done about it.

@rjawesome, though I understand you're busy, if you get a free moment and happen to recall anything that could give us insight into that second point, it'd be greatly appreciated.

rjawesome commented 1 year ago

Further research to figure out why we were even allowing TRAPI qualifier value arrays in our internal association objects in the first place, so we can actually understand what's going on there and if anything actually needs to be done about it.

Basically, in the meta_knowledge_graph response for any TRAPI API contains an array of qualifiers. Each qualifier has the name of the qualifier and applicable values (plural). This means that the API could accept multiple values for that qualifier name, so we would need to store all of the values in the association.

I suppose if we didn't want to allow arrays then we would have to modify the code from the PR so that it splits into multiple associations for each of the applicable values for a given qualifier.

example from bte meta_knowledge_graph (currently an entry like this would be processed as one association):

     "subject": "biolink:SmallMolecule",
            "predicate": "biolink:associated_with",
            "object": "biolink:PhenotypicFeature",
            "qualifiers": [
                {
                    "qualifier_type_id": "biolink:object_aspect_qualifier",
                    "applicable_values": [
                        "likelihood"
                    ]
                },
                {
                    "qualifier_type_id": "biolink:object_direction_qualifier",
                    "applicable_values": [
                        "increased",
                        "decreased"
                    ]
                }
            ],
            "knowledge_types": [
                "lookup"
            ]

tokebe commented 1 year ago

Thank you for the clarity @rjawesome, this makes perfect sense! @colleenXu the current code works (aside from the record reversing issue) to handle this just fine, but I'm of two minds whether it should instead be changed to make one association = one qualifier value. What do you think?

colleenXu commented 1 year ago

Finding some specific examples from TRAPI KPs /meta_knowledge_graph responses:

Automat CTD - a LOT!

https://automat.test.transltr.io/ctd/1.4/meta_knowledge_graph Hmmm...so these edges aren't inverses of each other? ``` "subject": "biolink:SmallMolecule", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "localization", "activity", "splicing", "expression", "mutation_rate", "secretion", "molecular_modification", "transport", "synthesis", "molecular_interaction", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Gene", "predicate": "biolink:affects", "object": "biolink:SmallMolecule", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "localization", "activity", "secretion", "molecular_modification", "synthesis", "transport", "molecular_interaction", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Gene", "predicate": "biolink:affects", "object": "biolink:MolecularMixture", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity", "secretion", "transport", "synthesis", "molecular_interaction", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:MolecularMixture", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "localization", "activity", "expression", "mutation_rate", "secretion", "molecular_modification", "transport", "molecular_interaction", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Gene", "predicate": "biolink:affects", "object": "biolink:ChemicalEntity", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "localization", "activity", "secretion", "molecular_modification", "transport", "synthesis", "molecular_interaction", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:ChemicalEntity", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "localization", "activity", "splicing", "expression", "mutation_rate", "secretion", "molecular_modification", "transport", "synthesis", "molecular_interaction", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Gene", "predicate": "biolink:affects", "object": "biolink:ComplexMolecularMixture", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "metabolic_processing", "molecular_interaction", "uptake" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:ComplexMolecularMixture", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "localization", "activity", "expression", "mutation_rate", "secretion", "molecular_modification", "transport", "molecular_interaction", "metabolic_processing", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Gene", "predicate": "biolink:affects", "object": "biolink:Polypeptide", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "transport", "synthesis", "metabolic_processing", "uptake", "degradation" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Polypeptide", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity", "molecular_interaction", "degradation", "secretion" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ```

Automat Pharos

https://automat.test.transltr.io/pharos/1.4/meta_knowledge_graph ``` "subject": "biolink:MolecularMixture", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:SmallMolecule", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:ChemicalEntity", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ```

Automat Hetio

However, many of these seem outside the chem-gene/gene-gene scope of qualifiers right now or don't quite seem to make sense https://automat.test.transltr.io/hetio/1.4/meta_knowledge_graph ``` "subject": "biolink:AnatomicalEntity", "predicate": "biolink:regulates", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "upregulated", "downregulated" ] } ], ``` ``` "subject": "biolink:GrossAnatomicalStructure", "predicate": "biolink:regulates", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "upregulated", "downregulated" ] } ], ``` ``` "subject": "biolink:SmallMolecule", "predicate": "biolink:regulates", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "upregulated", "downregulated" ] } ], ``` ``` "subject": "biolink:Disease", "predicate": "biolink:regulates", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "upregulated", "downregulated" ] } ], ``` ``` "subject": "biolink:ChemicalEntity", "predicate": "biolink:regulates", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "upregulated", "downregulated" ] } ], ``` ``` "subject": "biolink:MolecularMixture", "predicate": "biolink:regulates", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "upregulated", "downregulated" ] } ], ```

Automat drug-central

https://automat.test.transltr.io/drugcentral/1.4/meta_knowledge_graph ``` "subject": "biolink:SmallMolecule", "predicate": "biolink:affects", "object": "biolink:Protein", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:ChemicalEntity", "predicate": "biolink:affects", "object": "biolink:Protein", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:Protein", "predicate": "biolink:affects", "object": "biolink:Protein", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:MolecularMixture", "predicate": "biolink:affects", "object": "biolink:Protein", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ```

Automat gtex

https://automat.test.transltr.io/gtex/1.4/meta_knowledge_graph ``` "subject": "biolink:SequenceVariant", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "splicing", "expression" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ```

Automat gtopdb

https://automat.test.transltr.io/gtopdb/1.4/meta_knowledge_graph ``` "subject": "biolink:SmallMolecule", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:ChemicalEntity", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ``` ``` "subject": "biolink:MolecularMixture", "predicate": "biolink:affects", "object": "biolink:Gene", "qualifiers": [ { "qualifier_type_id": "object_aspect_qualifier", "applicable_values": [ "activity" ] }, { "qualifier_type_id": "object_direction_qualifier", "applicable_values": [ "decreased", "increased" ] }, { "qualifier_type_id": "qualified_predicate", "applicable_values": [ "biolink:causes" ] } ], ```

tokebe commented 1 year ago

From meetings: Moving forward with changing code to ensure only one value per internal association object.

colleenXu commented 1 year ago

From what I recall (@tokebe, please comment with agreement or clarification):

@tokebe will implement after Typescript migration
we decided that when an "association" (internal representation) has qualifiers, this is a qualifier-set (each type has 1 value)
In TRAPI KP /meta_knowledge_graph responses, there are TRAPI MetaEdges where there's > 1 value per qualifier-type. We can parse this, build a set of possible "qualifier-sets", and make individual "associations" for each "qualifier-set".
- I think we decided to include the "absence of >=1 type-id" in our enumeration of qualifier-set combos...but I'm not sure. See the note below on why we'd want to do this
- However, this can potentially mean different TRAPI KP sub-queries retrieving the same records....can BTE handle this okay?

A complication to building these "qualifier-sets" is what to do about the potential absence of qualifiers or specific type-ids. For example, what if a TRAPI KP:

has some useful records under "Chem affects Gene" (no qualifiers)...but the TRAPI MetaEdge has qualifiers since other records have them
has some useful records under "Chem affects Gene, activity_or_abundance" (1 aspect-qualifier, no direction qualifier)...but the TRAPI MetaEdge has direction-qualifiers because other records have them

colleenXu commented 11 months ago

@tokebe Linked to this PR https://github.com/biothings/api-respone-transform.js/pull/60? Just in case, I put both this issue and the PR into the Project manager/deployment section...

tokebe commented 8 months ago

Relevant changes deployed to Prod.

biothings / biothings_explorer

Record class doesn't handle qualifier value arrays in associations #713