replacing lark visitors with networkx graphs

ArnonHevlin commented 4 years ago

In our last meeting we raised the idea of replacing lark visitors with networkx graphs (for semantic check passes).

As I see it, lark passes provide a very comfortable way for parsing the tree, so if I want to use netx instead, I basically need to copy lark's functionality and make it work with netx graphs. To me it seems that it won't really save time for a new developer, as he will have to learn lark's functionality (ported to netx) regardless.

Also, the structure of lark trees is less generic than a networkx graph, making it somewhat more comfortable to parse with even after I'll implement lark's functionality with networkx.

Seeing this, do you think moving to networkx for semantic passes is really worth it?

DeanLight commented 4 years ago

Can you explain which functionality you need from lark and why?

ArnonHevlin commented 4 years ago

lark visitor (and transformer and interpreter) allows us to define a function for each rule in our grammar. When called, the visitor traverses the tree and for each node in the tree, the visitor identifies the rule that defined this node, and calls the appropriate method that we defined (should it exist) to deal with this node. This minimizes the code that you need to write per pass and it results in a very organized code. It also hides the tree traversal and other 'messier' code in the parent class, allowing us to focus solely on the pass's purpose.

For example, here is the pass that I wrote for checking referenced variables

class CheckReferencedVariablesInterpreter(Interpreter):

    def __init__(self):
        super().__init__()
        self.vars = set()

    def __add_var_name_to_vars(self, var_name_node):
        assert_correct_node(var_name_node, "var_name", 1)
        var_name = var_name_node.children[0]
        self.vars.add(var_name)

    def __check_if_var_not_defined(self, var_name_node):
        assert_correct_node(var_name_node, "var_name", 1)
        var_name = var_name_node.children[0]
        if var_name not in self.vars:
            raise NameError("variable " + var_name + " is not defined")

    def __check_if_vars_in_list_not_defined(self, tree):
        assert tree.data in NODES_OF_LIST_WITH_VAR_NAMES
        for child in tree.children:
            if child.data == "var_name":
                self.__check_if_var_not_defined(child)

    def assign_literal_string(self, tree):
        assert_correct_node(tree, "assign_literal_string", 2, "var_name", "string")
        self.__add_var_name_to_vars(tree.children[0])

    def assign_string_from_file_string_param(self, tree):
        assert_correct_node(tree, "assign_string_from_file_string_param", 2, "var_name", "string")
        self.__add_var_name_to_vars(tree.children[0])

    def assign_string_from_file_var_param(self, tree):
        assert_correct_node(tree, "assign_string_from_file_var_param", 2, "var_name", "var_name")
        self.__check_if_var_not_defined(tree.children[1])
        self.__add_var_name_to_vars(tree.children[0])

    def assign_span(self, tree):
        assert_correct_node(tree, "assign_span", 2, "var_name", "span")
        self.__add_var_name_to_vars(tree.children[0])

    def assign_int(self, tree):
        assert_correct_node(tree, "assign_int", 2, "var_name", "integer")
        self.__add_var_name_to_vars(tree.children[0])

    def assign_var(self, tree):
        assert_correct_node(tree, "assign_var", 2, "var_name", "var_name")
        self.__check_if_var_not_defined(tree.children[1])
        self.__add_var_name_to_vars(tree.children[0])

    def relation(self, tree):
        assert_correct_node(tree, "relation", 2, "relation_name", "term_list")
        self.__check_if_vars_in_list_not_defined(tree.children[1])

    def fact(self, tree):
        assert_correct_node(tree, "fact", 2, "relation_name", "fact_term_list")
        self.__check_if_vars_in_list_not_defined(tree.children[1])

    def rgx_ie_relation(self, tree):
        assert_correct_node(tree, "rgx_ie_relation", 3, "term_list", "term_list", "var_name")
        self.__check_if_vars_in_list_not_defined(tree.children[0])
        self.__check_if_vars_in_list_not_defined(tree.children[1])
        self.__check_if_var_not_defined(tree.children[2])

    def func_ie_relation(self, tree):
        assert_correct_node(tree, "func_ie_relation", 3, "function_name", "term_list", "term_list")
        self.__check_if_vars_in_list_not_defined(tree.children[1])
        self.__check_if_vars_in_list_not_defined(tree.children[2])

As you can see, this code strictly defines the pass, and does nothing else, which in my opinion is the ideal way to define a pass.

DeanLight commented 4 years ago

Ok i have ta few comments/question.

If i understand correctly, lark does not give you easy ways to do arbitrary walks on the graphs (it gives you the root and then you can do what you want there yourself). Is this right?
What are the numbers in the assert_code_node functions. What is this function in general?
For your case where you used an Interpreter (so walking preorder on the graph rather than arbitrary traversal like with visitor). isn't this just like the following:
```
import networkx as nx
```

def Semantic_pass(Tree,Engine,...): for node_id in nx.dfs_preorder_nodes(Tree): node = Tree[node_id] if node["type"] is "assign_literal_string": ...

  if node["type"] is .....

  ....



I agree that the fact that the automatic matching might be nice. And the fact that it formats the rules is also nice. But im not sure if this play with inheritance will still be useful when we talk about passes in which are significantly further down from the parsing and might have different types of nodes. In fact, this implicit function calling based on node names based on inheritance might be confusing in the more difficult cases.

4. Why do you add variables to a symbol table in a pass that is supposed to check if they are referenced? shouldnt we add varaibles to a symbol table in the execution once we have data to associate to them?

ArnonHevlin commented 4 years ago

I'm not sure if this is correct, as lark interpreters supposedly offer full control when traversing the trees (I did not explore this feature), and you could also recursively call the .visit() method on nodes in the tree while using visitors.
assert_correct_node asserts that the referenced node structure is the same as mentioned in the function parameters. the number is the number of the children of said node (sometimes said children have no labels, for example a child of var_name, that's why the length is necessary). I use assert_correct_node to identify parts of the codes that I need to change once the grammar is changed, we could remove this function at the end of the development should you require it, as it does nothing but assertions.
Yes this is a similar functionality.
This symbol table is a local symbol table separated from the engine, its purpose is to save variables that were defined in the current tree in order to be able to check if a variable was not defined before being referenced. As there could also be defined variables saved in the engine's symbol table, my plan was to also check if a variable was defined there (once the engine is ready to be used).

So if I understand correctly, you still want to use networkx instead of lark visitors.

Daniel offered another alternative: We could build two tree converters:

a converter from a lark tree to a netx graph
a converter from a netx graph to a lark tree

This will allow us to build our passes (which are simple) in lark, and should someone want to do something more complex in the future, he will be able to use the convertors in order to use his own networkx passes. What do you think of this proposal?

DeanLight commented 4 years ago

I like Daniel's idea. This also makes sense since the passes that are easier to write with lark are more likely earlier up the stack and vise versa for passes that are easier with nx.

It would also enable us a lot of flexibility in development.

Lets go with that.

DeanLight / spannerlib

replacing lark visitors with networkx graphs #16