aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io
Other
435 stars 188 forks source link

`QueryBuilder`: using `edge_filters` when joining with `with_ancestors` raises #4305

Open sphuber opened 4 years ago

sphuber commented 4 years ago

Code snippet that will trigger it:

from aiida import orm
from aiida.common.links import LinkType

data = orm.Data().store()
calc = orm.CalculationNode().store()

data.add_incoming(calc, link_type=LinkType.CREATE, link_label='label')

builder = orm.QueryBuilder()
builder.append(orm.CalculationNode, tag='calculation')
builder.append(orm.Data, with_ancestors='calculation', edge_filters={'type': LinkType.CREATE.value})

builder.count()

the exception that is raised:

KeyError                                  Traceback (most recent call last)
~/.virtualenvs/aiida_dev/lib/python3.7/site-packages/sqlalchemy/util/_collections.py in __getattr__(self, key)
    209         try:
--> 210             return self._data[key]
    211         except KeyError:

KeyError: 'type'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
~/code/aiida/env/dev/aiida-core/aiida/orm/implementation/querybuilder.py in get_column(self, colname, alias)
    400         try:
--> 401             return getattr(alias, colname)
    402         except AttributeError:

~/.virtualenvs/aiida_dev/lib/python3.7/site-packages/sqlalchemy/util/_collections.py in __getattr__(self, key)
    211         except KeyError:
--> 212             raise AttributeError(key)
    213 

AttributeError: type

During handling of the above exception, another exception occurred:

InputValidationError                      Traceback (most recent call last)
<ipython-input-1-cb9beeb4dffe> in <module>
     11 builder.append(orm.Data, with_ancestors='calculation', edge_filters={'type': LinkType.CREATE.value})
     12 
---> 13 builder.count()

~/code/aiida/env/dev/aiida-core/aiida/orm/querybuilder.py in count(self)
   2204         :returns: the number of rows as an integer
   2205         """
-> 2206         query = self.get_query()
   2207         return self._impl.count(query)
   2208 

~/code/aiida/env/dev/aiida-core/aiida/orm/querybuilder.py in get_query(self)
   2100 
   2101         if need_to_build:
-> 2102             query = self._build()
   2103             self._hash = queryhelp_hash
   2104         else:

~/code/aiida/env/dev/aiida-core/aiida/orm/querybuilder.py in _build(self)
   1950                     'The tags I know are:\n{}'.format(tag, self.tag_to_alias_map.keys())
   1951                 )
-> 1952             self._query = self._query.filter(self._build_filters(alias, filter_specs))
   1953 
   1954         ######################### PROJECTIONS ##########################

~/code/aiida/env/dev/aiida-core/aiida/orm/querybuilder.py in _build_filters(self, alias, filter_spec)
   1373                 is_attribute = (attr_key or column_name in ('attributes', 'extras'))
   1374                 try:
-> 1375                     column = self._impl.get_column(column_name, alias)
   1376                 except InputValidationError:
   1377                     if is_attribute:

~/code/aiida/env/dev/aiida-core/aiida/orm/implementation/querybuilder.py in get_column(self, colname, alias)
    405                 'Valid columns are:\n'.format(
    406                     colname,
--> 407                     alias  # pylint: disable=protected-access
    408                 )
    409             )

InputValidationError: type is not a column of ['%(140371970688720 anon)s.ancestor_id', '%(140371970688720 anon)s.descendant_id', 'anon_1.depth']

Note that I had to adapt the exception message of get_column itself, because it would also try to address an attribute of the alias variable, but the whole problem is that alias is not an instance of Alias but of sqlalchemy.sql.base.ImmutableColumnCollection.

@lekah I tried looking into this, but could really use your help here.

lekah commented 4 years ago

Oh come on, the error message could not be clearer ;) type is not a valid attribute to query by (=column) for this relationship. You are filtering for the type of an ancestor-descendant relationship to have a certain value. However, the type is not defined for this relationship (it is defined for links). Therefore, filtering by that column has to raise an error. The problem is not using edge_filters as a keyword, but the argument you passed. IMO (and if I understood correctly) this is not a bug, but the expected behavior (of course the error message could be improved).

sphuber commented 4 years ago

Oh come on, the error message could not be clearer ;) type is not a valid attribute to query by (=column) for this relationship. You are filtering for the type of an ancestor-descendant relationship to have a certain value. However, the type is not defined for this relationship (it is defined for links). Therefore, filtering by that column has to raise an error. The problem is not using edge_filters as a keyword, but the argument you passed. IMO (and if I understood correctly) this is not a bug, but the expected behavior (of course the error message could be improved).

I understand that that is the problem, where the exception is raised and what it is saying. However, the question is whether it should raise for the query that I defined. The query:

builder = orm.QueryBuilder()
builder.append(orm.CalculationNode, tag='calculation')
builder.append(orm.Data, with_ancestors='calculation', edge_filters={'type': LinkType.CREATE.value})

to me makes perfect sense. It says:

get all Data nodes that have a CalculationNode as an ancestor that is reachable by following CREATE links exclusively

for this trivial example, using with_ancestors does not make a lot of sense as one could just use with_incoming for the same effect, since the only link between CalculationNode and Data is a CREATE anyway. However, one could also think about a query like:

builder = orm.QueryBuilder()
builder.append(orm.CalculationNode, tag='calculation')
builder.append(orm.Data, with_ancestors='calculation', edge_filters={'type': {'in': [LinkType.CREATE.value, LinkType.INPUT_CALC.value]})

These queries make sense right? Or am I missing something?

The reason I ran into this is because in the scope of PR #4306 we are thinking of adding the with_ancestors_extended keyword that would also follow INPUT_WORK and the CALL_CALC/CALL_WORK links. For advanced usage, it might be useful to be able to restrict the set of links that are followed, for example do follow CALL_CALC but do not follow CALL_WORK. I felt the most natural way in terms of API was to use the edge_filters for this. But I found out that this is not supported even for the existing with_ancestors.

lekah commented 4 years ago

In principle, it would make sense to augment the links that are traversed, but this will be tough. To explain: the ancestor/descendant relationship is queried via a common table expression (CTE), which creates a new table (in memory) on demand, containing ancestor_id and descendant_id (you can also ask for the traversed nodes to be put in a list, that would be another column, not done by default because the query-time goes up x10). This table does not contain info over which types of links were traversed. And that CTE is hardcoded with the types of links that are allowed, so if you want the user to define it you'd have to implement this. Caution however: if you query something with a cycle you end up in an infinite loop.