frostyfan109 / tranql

A Translator Query Language
https://researchsoftwareinstitute.github.io/data-translator/apps/tranql
MIT License
0 stars 1 forks source link

usage of `expand_nodes` in tranql_ast.py fails on lists containing variables #122

Closed frostyfan109 closed 4 years ago

frostyfan109 commented 4 years ago

Problem

expand_nodes is called within SelectStatement.generate_questions to convert set statement variables into literal values. When a list containing variables is passed into a where clause, generate_questions fails to properly expand variables.

Example (important for explanation)

set $citric_acid = "CHEBI:30769"
select chemical_substance->gene->disease
  from "/graph/gamma/quick"
 where chemical_substance=[$citric_acid, "CHEBI:26836", "CHEBI:26078"]

In this example, chemical_substance should evaluate to ["CHEBI:30769", "CHEBI:26836", "CHEBI:26078"]. Obviously, it doesn't due to this bug.

What does the example evaluate to?

For where chemical_substance=[$citric_acid, "CHEBI:26836", "CHEBI:26078"], it becomes: ["CHEBI:30769"]. When the first element in the list is a variable, all the following elements are deleted.

For where chemical_substance=["CHEBI:26836", $citric_acid, "CHEBI:26078"], it becomes: ["CHEBI:26836", "$citric_acid", "CHEBI:26078"]. When the list contains a variable that is not the first element, the variable is not resolved.

Explanation

When a list is passed into a where clause, it calls concept.set_nodes(list_value). For this example, let's clarify the following:

list_value = ["$citric_acid", "CHEBI:26836"]
concept.nodes == list_value # True

Then, in generate_questions, it calls self.expand_nodes(interpreter, concept). In expand_nodes, it does value = concept.nodes[0] if len(concept.nodes) > 0 else None. Remember, expand_nodes is going to take value, check if it's a set statement variable, and, if so, replace it with its actual value.

Now, the problem with this is that if concept.set_nodes(list_value) is called, then concept.nodes is a list. So, if only the first element is checked for being a variable, then there's some obvious issues.

When the first element is a variable, then expand_nodes does concept.set_nodes(value), which overrides the rest of the elements in the list. Now, concept.nodes == ["CHEBI:30769"] ($citric_acid). We've lost "CHEBI:26836" and "CHEBI:26078".

When the first element isn't a variable, but there are variables in the list, expand_nodes only checks the first element in the list, so the variables never get resolved.

frostyfan109 commented 4 years ago

Read above issue and realized it was hard to follow. This is a rewording.

For the query:

set $citric_acid = "CHEBI:30769"
select chemical_substance->gene->disease
  from "/graph/gamma/quick"
 where chemical_substance=[$citric_acid, "CHEBI:26836", "CHEBI:26078"]

it does concept.set_nodes(["$citric_acid", "CHEBI:26836", "CHEBI:26078"]). Then, in expand_nodes, it only looks at concept.nodes[0] even though concept.nodes has 3 elements.

Greater detail can be found in original issue.

frostyfan109 commented 4 years ago

Tentatively closing this issue. Commit 752489f fixes this, but I'm unsure why I hadn't already done this beforehand. I remember creating a fix with a similar approach to how I did it in the aforementioned commit, but I had to undo it for some reason.