Rule number 1 of any dataframe library is "don't do operations by iterating over rows." However, this is exactly what we do in `subgraph_sum` and `subtree_sum`. We need to refactor both to use a better mechanism (e.g., `DataFrame.apply`).
To give a sense of the performance impact: anecdotally, `subgraph_sum` is 3-4x slower than the query language, even though the query language is solving a version of subgraph isomorphism, an NP-hard problem.
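To make the difference concrete, here is a minimal sketch of the two patterns on a toy pandas dataframe. It is not the actual `subgraph_sum` code: the dataframe, the node list, and the helper names `sum_by_row_iteration` and `sum_by_selection` are all hypothetical, and it assumes the metrics live in a dataframe indexed by node.

```python
import pandas as pd

# Hypothetical toy data: one row per graph node, indexed by node name.
df = pd.DataFrame(
    {"time": [1.0, 2.0, 3.0, 4.0], "calls": [1, 1, 2, 5]},
    index=["main", "foo", "bar", "baz"],
)

# Hypothetical subgraph: the nodes whose metrics should be summed.
subgraph_nodes = ["foo", "bar", "baz"]


def sum_by_row_iteration(frame, nodes, columns):
    """Slow pattern: visit every row in Python and accumulate by hand."""
    totals = {col: 0 for col in columns}
    wanted = set(nodes)
    for node, row in frame.iterrows():
        if node in wanted:
            for col in columns:
                totals[col] += row[col]
    return pd.Series(totals)


def sum_by_selection(frame, nodes, columns):
    """Faster pattern: select the rows once, then reduce column-wise."""
    return frame.loc[nodes, columns].sum()


slow = sum_by_row_iteration(df, subgraph_nodes, ["time", "calls"])
fast = sum_by_selection(df, subgraph_nodes, ["time", "calls"])
```

Both calls produce the same totals, but the second leaves the loop to pandas instead of running it in Python. Note that `DataFrame.apply` with `axis=1` still calls a Python function once per row, so a selection plus a built-in reduction may be the better target wherever it fits.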