Zellic / solidity-parser

Solp is a Python library used for reading, parsing and analysis of Solidity source projects and contracts without a dependency on the solc compiler.
https://solp.zellic.io
GNU Affero General Public License v3.0

Zellic Solidity Parser (solp)

Tests

Project Links

Description

Solp is a Python library for reading, parsing and analysing Solidity source projects and contracts without having to use the solc compiler. It does this by using a separate grammar for each supported version of Solidity, transforming the resulting parse trees into a common AST, and then refining that AST into more specialised forms of IR for analysis. The resulting ASTs and IRs can be used by consumer applications without any additional dependencies.
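
The idea of version-specific grammars feeding one common AST can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not solp's actual code: the SourceUnit class, the two front-end functions, the pragma regex and the 0.8.0 cut-off are all hypothetical names and values chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import Tuple

Version = Tuple[int, int, int]


@dataclass
class SourceUnit:
    """Hypothetical common AST root that every front end produces."""
    grammar: str
    pragma: Version


# Picks the first version number that follows "pragma solidity".
PRAGMA_RE = re.compile(r"pragma\s+solidity\s+[\^~>=<\s]*(\d+)\.(\d+)(?:\.(\d+))?")


def read_pragma(source: str) -> Version:
    """Extract the first version mentioned in the pragma directive."""
    match = PRAGMA_RE.search(source)
    if match is None:
        raise ValueError("no 'pragma solidity' directive found")
    major, minor, patch = match.group(1), match.group(2), match.group(3) or "0"
    return int(major), int(minor), int(patch)


# Stand-ins for version-specific parsers. A real front end would walk a parse
# tree here; these only record which grammar was selected.
def parse_legacy(source: str, version: Version) -> SourceUnit:
    return SourceUnit(grammar="legacy", pragma=version)


def parse_modern(source: str, version: Version) -> SourceUnit:
    return SourceUnit(grammar="modern", pragma=version)


# (minimum version, front end), newest first. The cut-off is made up.
FRONT_ENDS = [((0, 8, 0), parse_modern), ((0, 0, 0), parse_legacy)]


def parse(source: str) -> SourceUnit:
    """Dispatch to the newest front end whose minimum version is satisfied."""
    version = read_pragma(source)
    for minimum, front_end in FRONT_ENDS:
        if version >= minimum:
            return front_end(source, version)
    raise ValueError(f"unsupported version: {version}")


unit = parse("pragma solidity ^0.8.22;\ncontract C {}")
print(unit.grammar, unit.pragma)
```

Because every front end returns the same node type, code that consumes the tree does not need to know which grammar produced it.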

Goals

The goals of this project are to:

Status

Currently, solp is used only in our internal tools.

Project Structure

Setup

Running pip install . invokes setup.py, which generates the ANTLR Python grammar stubs, then builds and installs solp as the solidity-parser package.

For a development setup, install in editable mode with pip install -e .

Usage

Solp is not a standalone application, so it has no entry point to speak of. Example snippets and the run configurations currently used for development are in example/, but these aren't application use cases themselves.

The example code in example/quickstart.py shows how to load a Solidity project and generate AST2 parse trees, which can then be used for analysis.

Check out the Get Started user guides for more information.

How it works

The idea is to get AST2 parse trees. ANTLR and AST1 parse trees don't contain enough information in their nodes to be useful on their own (e.g. imports, using statements, function calls, and more don't get resolved). To build AST2 parse trees, you take AST1 parse trees, generate symbol information and then pass both into the AST2 builder. This gives you "linked" AST2 nodes, i.e. information about relationships within the program is embedded in the nodes themselves.
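
As a rough illustration of what "linking" means, the toy sketch below turns an unbound identifier into a node that records what the name refers to. It is not solp's AST2 builder: the Scope class and link_ident function are hypothetical, and the node classes are simplified stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Ident:
    """Unlinked, AST1-style name: just text, bound to nothing."""
    text: str


@dataclass
class StateVarLoad:
    """Linked node: a read of a contract state variable."""
    name: str
    ttype: str


@dataclass
class LocalVarLoad:
    """Linked node: a read of a local variable or parameter."""
    name: str
    ttype: str


@dataclass
class Scope:
    """Hypothetical symbol information: name -> type, per kind of variable."""
    state_vars: Dict[str, str]
    local_vars: Dict[str, str]


def link_ident(ident: Ident, scope: Scope) -> Union[StateVarLoad, LocalVarLoad]:
    """Replace a bare identifier with a node that says what it refers to."""
    # Locals are checked first because they shadow state variables.
    if ident.text in scope.local_vars:
        return LocalVarLoad(ident.text, scope.local_vars[ident.text])
    if ident.text in scope.state_vars:
        return StateVarLoad(ident.text, scope.state_vars[ident.text])
    raise KeyError(f"unresolved identifier: {ident.text}")


scope = Scope(state_vars={"myVariable": "uint256"}, local_vars={"value": "uint256"})
print(link_ident(Ident("myVariable"), scope))
print(link_ident(Ident("value"), scope))
```

The same two inputs appear here as in the description above: an unlinked tree node and symbol information, combined into a node that carries the resolved relationship.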

For example, with the ANTLR grammar for 0.8.22 a library call such as myVariable = adder.add(myVariable, value); (line 11 in the example/librarycall/TestContract.sol file) would have the following parse tree

In AST1 this would be parsed as:

ExprStmt(
    expr=BinaryOp(
        left=Ident(text='myVariable'),
        right=CallFunction(
            callee=GetMember(
                obj_base=Ident(text='adder'),
                name=Ident(text='add')
            ),
            modifiers=[],
            args=[Ident(text='myVariable'), Ident(text='value')]
        ),
        op=<BinaryOpCode.ASSIGN: '='>
    )
)

This leaves a lot to be desired. Some of the most obvious problems are:

  1. The store operation is a BinaryOp instead of a state variable store.
  2. The callee for the library call is a GetMember consisting of Idents only. Without the import information in the current scope, we can't resolve this call.
  3. Similarly, the arguments are Idents and represent a state variable lookup and a local variable lookup respectively. We can't pull any information from this parse tree on its own because these Idents aren't bound to anything.

Here is the AST2 output of the same code:

StateVarStore(
    base=SelfObject(),
    name=Ident(text='myVariable'),
    value=FunctionCall(
        named_args=[],
        args=[
            StateVarLoad(base=SelfObject(), name=Ident(text='myVariable')),
            LocalVarLoad(
                var=Var(
                    name=Ident(text='value'),
                    ttype=IntType(is_signed=False, size=256),
                    location=None
                )
            )
        ],
        base=StateVarLoad(base=SelfObject(), name=Ident(text='adder')),
        name=Ident(text='add')
    )
)

The tree is much clearer and more explicit about the operations performed. The linking also makes additional functionality available. For example, calling FunctionCall.resolve_call() gives us the corresponding FunctionDefinition in the AdderLib.Adder library, and base.type_of() gives us a ResolvedUserType(MyContract), which can then be explored as in the quickstart.py example.
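
To show why linked nodes make this kind of query possible, here is a self-contained toy model. It borrows the method names resolve_call and type_of from the text above, but the classes, fields and lookup logic are assumptions made for illustration and do not reflect solp's real implementation or signatures.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FunctionDefinition:
    """Toy definition node; 'owner' is a hypothetical field for the example."""
    name: str
    owner: str


@dataclass
class UserType:
    """Toy stand-in for a resolved user type, holding its functions by name."""
    name: str
    functions: Dict[str, FunctionDefinition] = field(default_factory=dict)


@dataclass
class StateVarLoad:
    """Linked load node: it already knows the type of the variable it reads."""
    name: str
    ttype: UserType

    def type_of(self) -> UserType:
        return self.ttype


@dataclass
class FunctionCall:
    """Linked call node: the base expression carries enough type information
    to find the callee without re-reading any source."""
    base: StateVarLoad
    name: str

    def resolve_call(self) -> FunctionDefinition:
        # Look the callee up on the type of the base expression.
        return self.base.type_of().functions[self.name]


adder_type = UserType("Adder", {"add": FunctionDefinition("add", owner="Adder")})
call = FunctionCall(base=StateVarLoad("adder", adder_type), name="add")

definition = call.resolve_call()
print(definition.owner, definition.name)
```

With only Ident nodes, as in the AST1 output, neither lookup would be possible because nothing ties the name adder to a type.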