Safeguard eval - Githubissues

aroberge commented 3 years ago

Currently, in some cases (ValueError and TypeError), I use eval() to obtain an object corresponding to parts of an expression. On normal code, this should be safe since the code has already been evaluated by Python. Still, it might make sense to safeguard this further, perhaps by parsing the expresion with the ast module and not evaluating if it contains any function call. This should still make it possible to identify the object that we want to get using eval in most, if not all, cases.

alexmojaki commented 3 years ago

What made you choose to use eval() in some cases instead of pure_eval?

aroberge commented 3 years ago

If I recall correctly, eval was allowing me to give information in more cases than pure_eval. I seem to remember that, if I had a case where I forgot a comma between two lists, [1, 2][3, 4], with eval I was able to identify these two lists as separate objects, whereas pure_eval was not.

However, I might remember this incorrectly.

I initially wanted to include the possibility of identifying the objects resulting from calling functions , as I wrote in a comment:

# As a TypeError exception has been raised, Python has already evaluated
# all the relevant code parts. Thus, using eval() should be completely safe.

which is something you warned me against doing a while ago, and I am now reevaluating this (i.e. I'm thinking that you were right.)

alexmojaki commented 3 years ago

I imagine pure_eval could have managed that example since they were just literals, but it actually couldn't evaluate list displays so it would fail with [a, b][c, d]. Coincidentally, this ability was recently added, so that should work now.

In any case, there will always be many things that pure_eval can't handle. It's a question of how you weigh suboptimal messages caused by a lack of information against rare weird behaviour when eval() causes a side effect. And it may well be that eval() is worth the risk.

It might be good to list some common kinds of expression that you think pure_eval should be able to handle, and maybe I'll find the time to implement them.

aroberge commented 3 years ago

I'll try to replace my use of eval by something that I will call safe_eval, which I am guessing, as a replacement of

obj = eval(expression, frame.f_globals, frame.f_locals)

should perhaps be something like this:


def literal_expressions_grouped(self, root):
    """Find all literal in the given tree parsed by ASTTokens.
    It returns a list of pairs (tuples) containing the text of a node
    that is a literal and its actual value.
    """
    # Except for the inner function is_literal
    # this is modeled after interesting_expressions_grouped
    def is_literal(node, _ignore):
        try:
            return ast.literal_eval(node)
        except ValueError:
            pass

    return group_expressions(
        pair
        for pair in self.find_expressions(root)
        if (
            is_literal(*pair) or is_literal(*pair) in [list(), tuple(), dict(), set([])]
        )
    )

pure_eval.Evaluator.literal_expressions_grouped = literal_expressions_grouped

def safe_eval(expression, statement, frame):
    evaluator = pure_eval.Evaluator.from_frame(frame)
    atok = asttokens.ASTTokens(statement, parse=True)
    for nodes, obj in evaluator.interesting_expressions_grouped(atok.tree):
        if atok.get_text(nodes[0]) == expression:
            return obj

alexmojaki commented 3 years ago

No, this should do:

def save_eval(expr, frame):
    node = ast.parse(expr).body[0].value
    assert isinstance(node, ast.expr)
    evaluator = pure_eval.Evaluator.from_frame(frame)
    try:
        return evaluator[node]
    except pure_eval.CannotEval:
        pass  # not sure what you'd put here

aroberge commented 3 years ago

Ok, much simpler. I'll try it out later today in my code.

aroberge commented 3 years ago

Ok, I just tried to replace eval by safe_eval at one place, and have some tests failing. Using eval, I can correctly identify the object (d) and its type:

>>> d = (1,)
>>> a, b = d

Traceback (most recent call last):
  File "<friendly-console:2>", line 1, in <module>
    a, b = d
ValueError: not enough values to unpack (expected 2, got 1)
>>> why()

Unpacking is a convenient way to assign a name, to each item of an
iterable. In this instance, there are more names (2) than the length of the
iterable, a tuple of length 1.

Notice how it ends by mentioning that the iterable is a tuple.

Trying with safe_eval, here is what I get:

>>> d = (1,)
>>> a, b = d

Traceback (most recent call last):
  File "<friendly-console:2>", line 1, in <module>
    a, b = d
ValueError: not enough values to unpack (expected 2, got 1)
>>> why()

Unpacking is a convenient way to assign a name, to each item of an
iterable. In this instance, there are more names (2) than 1, the length of
the iterable.

Here, I could not identify the object, so I just used the generic message.

alexmojaki commented 3 years ago

For just a variable name there should be no problems.

    d = (1,)
    frame = inspect.currentframe()
    assert save_eval("d", frame) == d

aroberge commented 3 years ago

However, I also want to be able to identify literals, as in:

a, b = (1,)

which leads to the same error message from friendly, correctly identifying that the iterable causing the problem is a tuple.

aroberge commented 3 years ago

I might simply use a combination of:

expr in frame.f_globals() or expr in frame.f_locals()

together with using ast.literal_eval(expr)

alexmojaki commented 3 years ago

Yes, literals are fine too. It just defers to literal_eval.

assert save_eval("d", frame) == save_eval("(1,)", frame) == d

My point was that something else must be wrong in the way you're using safe_eval or the context.

aroberge commented 3 years ago

Ok, I have found some examples where (so far), I believe I must use eval() or something similar; they belong to two different "classes". I'll first consider the simplest one, with explicit integer values

>>> a = 2(3+4)

Traceback (most recent call last):
  File "<friendly-console:1>", line 1, in <module>
    a = 2(3+4)
TypeError: 'int' object is not callable

Did you mean 2 * (3+4)?

I must be able to evaluate (3+4) and find that it can be multiplied by 2 and give the right diagnostic. I would describe this class of cases as "mathematical expression"; so a math expression calculator would likely suffice.

I also use eval when trying to figure out the type of an object from an error message; something like:

obj_type = eval('str')

Instead of 'str', it could be a user-defined class.

I still think that parsing the expression with the ast module, visit the nodes and deem as unsafe any expression that has unwanted nodes (function calls, assignments, import statement, etc.) might be sufficient for my purpose.

alexmojaki commented 3 years ago

OK, I'm going to look into handling some binary operators and function calls in pure_eval.

I don't think it makes sense to use eval or similar when parsing error messages. If it's just the name of a class you're just doing a variable lookup and maybe fetching attributes. It'll fail if the class is defined somewhere else, it might even give the wrong value if multiple classes have the same name. If you can't get the type directly from executing and eval/pure_eval, I suggest recursing through object.__subclasses__().

aroberge commented 3 years ago

You had already convinced me that function calls should not be included; I'd leave them out of pure_eval.

alexmojaki commented 3 years ago

Like everything else in pure_eval it'll have to be very careful and conservative, only allowing specifically handled cases like len, int, str, str.upper, etc.

alexmojaki commented 3 years ago

pure_eval 0.2.0 is now released, with support for all binary operators and calls to most builtins, implemented in https://github.com/alexmojaki/pure_eval/pull/7

aroberge commented 3 years ago

Thanks, I'll try to have a look at it either later today or this weekend, and make sure that this becomes the minimum version required when installing friendly.

aroberge commented 3 years ago

Now using pure_eval as a safe replacement for eval().

alexmojaki commented 3 years ago

Awesome! Does it Just Work?

aroberge commented 3 years ago

It made all my existing unit tests pass. However, I am currently playing with a REPL and it does not seem to evaluate some expression that I was expecting it to, involving the len builtin which I thought was included. I'll try to play a bit more with it to find one or more good examples.

aroberge commented 3 years ago

Ok, what is "not working" is a result of my expectation not met likely as a result of my own code where I was not using eval(). In https://github.com/aroberge/friendly/blob/942218643c4e115367a5655d92cef77521b80b8c/friendly/info_variables.py#L56, I defined my own method to extract literals, which I later used on https://github.com/aroberge/friendly/blob/942218643c4e115367a5655d92cef77521b80b8c/friendly/info_variables.py#L121

I show the value of all these literals in the "where()" part of the traceback. Using pure_eval, I can see much more potentially useful information. For example, if I have:

a, b = 3, 4
# some error generated with a+b appearing on the line

I now see

a + b: 7
a: 3
b: 4

which is much better than before. However, if I have len(c) as part of the line where an error occur, with c=(0,1), I will get:

c: (0, 1)
len: <builtin function len>

and I will not see len(c): 2 which I expected. I know that pure_eval can get this value for me; it is just my old monkeypatch kludge which is the problem in this instance.

alexmojaki commented 3 years ago

Change is_literal to this:

def is_literal(node, _ignore):
    try:
        ast.literal_eval(node)
        return True
    except Exception:
        return False

I don't know if it's actually relevant but it sticks out to me, for example your code sees 0 as not a literal because it's falsy.

Also ast.literal_eval can raise many exceptions, e.g. ast.literal_eval('{{}}') raises a TypeError.

aroberge commented 3 years ago

That did not work. I will look closer at your own code and try to figure out how I might make it work.

aroberge commented 3 years ago

Ok, I now have "everything" I could have wanted. With your improved version of pure_eval, trying to find "literals" as I was doing was not the right way. Here's a partial sample output which I am happy with:

       1: a = 2, 3, 4
    -->2: b = [1, 2](9 + len(a))
              ^^^^^^^^^^^^^^^^^^

            a: (2, 3, 4)
            len: <builtin function len>
            9 + len(a): 12
            len(a): 3

Before, it would only have shown as variable informations the first two (a and len).

aroberge / friendly

Safeguard eval #188