expectedparrot / edsl

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
https://docs.expectedparrot.com
MIT License

Decide if safe way to serialize QuestionFunctional #429

Closed: johnjosephhorton closed this issue 1 month ago

johnjosephhorton commented 3 months ago

QuestionFunctional takes a func parameter that is a Python function. Obviously, we cannot just eval this back to life from its serialized form. But perhaps, with enough safety checks, we could serialize a function if we were sure it was purely functional, with no side effects.
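
A minimal sketch of the idea, assuming a function is stored as its name plus its dedented source text and is only rebuilt after a caller-supplied safety check approves that source. `serialize_func`, `deserialize_func` and the `is_safe` callback are hypothetical names, not part of EDSL's `QuestionFunctional` API. The point is that the `exec` at the end is exactly the step the safety check has to justify.

```python
import inspect
import textwrap
from typing import Callable


def serialize_func(func: Callable) -> dict:
    """Hypothetical helper: store a function as its name plus dedented source."""
    return {
        "name": func.__name__,
        "source": textwrap.dedent(inspect.getsource(func)),
    }


def deserialize_func(data: dict, is_safe: Callable[[str], bool]) -> Callable:
    """Hypothetical helper: rebuild the function only if is_safe(source) approves it."""
    source = data["source"]
    if not is_safe(source):
        raise ValueError(f"refusing to load {data['name']!r}: failed safety check")
    namespace: dict = {}
    # exec runs arbitrary code; an empty __builtins__ dict is NOT a sandbox,
    # so everything depends on is_safe being strict.
    exec(source, {"__builtins__": {}}, namespace)
    return namespace[data["name"]]
```

Passing an empty `__builtins__` dict only removes the most obvious names; all of the real safety has to come from `is_safe`.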

@zer0dss this is more your lane - any ideas? Is there some safe way for us to do this?

E.g., here's an idea from ChatGPT:

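import ast
import inspect
import textwrap


def is_safe_to_eval(func):
    # Caveat: ast.Name, ast.Call and ast.Attribute are allowed, so code such as
    # __import__("os").system("...") still passes. This is a heuristic filter,
    # not a sandbox.

    # Allow only the following node types. ast.Constant replaces the deprecated
    # ast.Num/ast.Str, and ast.Index is no longer produced since Python 3.9.
    # ast.iter_child_nodes also yields operator nodes such as ast.Add and ast.Lt,
    # so their base classes must be allowed or every BinOp/Compare is rejected.
    safe_nodes = (
        ast.FunctionDef,
        ast.arguments,
        ast.arg,
        ast.Load,
        ast.Expr,
        ast.BinOp,
        ast.UnaryOp,
        ast.operator,
        ast.unaryop,
        ast.cmpop,
        ast.Constant,
        ast.Name,
        ast.Call,
        ast.keyword,
        ast.List,
        ast.Tuple,
        ast.Set,
        ast.Dict,
        ast.Compare,
        ast.Attribute,
        ast.Slice,
        ast.Subscript,
        ast.Lambda,
        ast.Return,
        ast.Pass,
    )

    def visit_node(node):
        for child in ast.iter_child_nodes(node):
            # Reject assignments explicitly (they are also absent from safe_nodes).
            if isinstance(child, (ast.Assign, ast.AugAssign)):
                return False
            if not isinstance(child, safe_nodes):
                return False
            if not visit_node(child):
                return False
        return True

    # Dedent so that methods and nested functions parse as top-level code.
    source_code = textwrap.dedent(inspect.getsource(func))
    parsed_ast = ast.parse(source_code)
    return visit_node(parsed_ast)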
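
One caveat worth checking before relying on an allowlist like this: because `ast.Name`, `ast.Call` and `ast.Attribute` are allowed, a function that reaches the `os` module through the `__import__` builtin is built entirely from allowed node types. The snippet below (not from the thread) lists the node types in such a function and compares them with the allowlist above, written as type names, with `ast.Num`/`ast.Str` written as `Constant` since that is what `ast.parse` produces on Python 3.8+.

```python
import ast

# A function that reaches the os module without an import statement.
SOURCE = '''
def sneaky():
    return __import__("os").getcwd()
'''

# Node type names from the ChatGPT allowlist (Num/Str written as Constant).
ALLOWED = {
    "FunctionDef", "arguments", "arg", "Load", "Expr", "BinOp", "UnaryOp",
    "Constant", "Name", "Call", "keyword", "List", "Tuple", "Set", "Dict",
    "Compare", "Attribute", "Slice", "Subscript", "Lambda", "Return", "Pass",
}


def node_types(source: str) -> set:
    """Names of every AST node type in source, excluding the Module root."""
    return {type(n).__name__ for n in ast.walk(ast.parse(source))} - {"Module"}


if __name__ == "__main__":
    used = node_types(SOURCE)
    # ['Attribute', 'Call', 'Constant', 'FunctionDef', 'Load', 'Name', 'Return', 'arguments']
    print(sorted(used))
    print("passes allowlist:", used <= ALLOWED)  # passes allowlist: True
```

Every node in `sneaky` is on the allowlist, which is why a node-type filter alone cannot tell a pure function from one with side effects.
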
zer0dss commented 3 months ago

@johnjosephhorton One solution I've seen in CTF challenges is RestrictedPython. It compiles the code under a restricted policy that blocks the dangerous operations that could lead to a server compromise.

Observation: by default, RestrictedPython blocks for loops too, because they can create infinite loops that consume the CPU. If we want to let users write for loops, we can protect against this by setting an execution time limit for the function.
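
To illustrate the "block loops by default" policy with only the standard library, the sketch below rejects any source that contains a loop before it is ever run. `has_loops` is a hypothetical helper, and this is not how RestrictedPython itself enforces the restriction. It also flags comprehensions and generator expressions, since iterating over an unbounded iterator such as `itertools.count()` can spin just like a `for` loop.

```python
import ast

# Node types that let code iterate; ast.comprehension covers list/set/dict
# comprehensions and generator expressions.
LOOP_NODES = (ast.For, ast.AsyncFor, ast.While, ast.comprehension)


def has_loops(source: str) -> bool:
    """Return True if source contains any loop or comprehension."""
    return any(isinstance(node, LOOP_NODES) for node in ast.walk(ast.parse(source)))


if __name__ == "__main__":
    print(has_loops("def f(xs):\n    return [x * 2 for x in xs]\n"))  # True
    print(has_loops("def f(x):\n    return x * 2\n"))  # False
```

The trade-off is the one described above: reject loops outright with a static check like this, or allow them and bound how long the function may run.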

johnjosephhorton commented 3 months ago

Interesting. Is the loop execution restriction part of RestrictedPython, or would we have to add that ourselves?

zer0dss commented 3 months ago

We have to run the code in a thread with an execution time limit, and that part is something we have to add ourselves. RestrictedPython doesn't provide it by default and just says that you have to choose your own strategy when allowing for loops.
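
A minimal sketch of the thread-based time limit, using only the standard library; `run_with_timeout` is a hypothetical helper. One caveat: CPython cannot forcibly stop a thread, so when the limit is hit the caller gets a `TimeoutError`, but the runaway code keeps consuming CPU in a background daemon thread. A hard stop would need a separate process (e.g. `multiprocessing.Process.terminate()`), which is a different technique from the one proposed here.

```python
import threading
from typing import Any, Callable


def run_with_timeout(func: Callable, *args: Any, timeout: float = 1.0) -> Any:
    """Run func(*args) in a daemon thread and wait at most `timeout` seconds."""
    result: dict = {}

    def target() -> None:
        try:
            result["value"] = func(*args)
        except Exception as exc:  # hand the error back to the caller
            result["error"] = exc

    # daemon=True so a runaway thread does not block interpreter exit.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # The thread is still running; we can only stop waiting for it.
        raise TimeoutError(f"function did not finish within {timeout} s")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

For example, `run_with_timeout(lambda x: x * 2, 21)` returns 42, while a function containing `while True: pass` raises `TimeoutError` after the limit and leaves its thread spinning until the interpreter exits.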