RhodiumGroup / rhg_compute_tools

Tools for using compute.rhg.com and compute.impactlab.org
MIT License

add globals blocker #62

Closed delgadom closed 4 years ago

delgadom commented 4 years ago

Imagine there's no globals
No longer hard to dooooooo
Every object is local
Whitelisting's up to youuuuuuuu

import dis
import toolz
import inspect
import functools
import types
import collections.abc

_functions_and_modules = (
    types.FunctionType,
    types.ModuleType,
    types.MethodType,
    types.BuiltinMethodType,
    types.BuiltinFunctionType
)

@toolz.functoolz.curry
def block_globals(obj, allowed_types=None, include_defaults=True, whitelist=None):
    """
    Decorator to prevent the use of undefined closures and globals in functions and classes

    Parameters
    ----------
    obj : function or class
        Function or class to decorate. Any global not matching one of the
        allowed types will raise a TypeError
    allowed_types : type or tuple of types, optional
        Types which are allowed as globals. By default, functions and
        modules are allowed. The full set of allowed types is drawn from
        the ``types`` module, and includes :py:class:`~types.FunctionType`, 
        :py:class:`~types.ModuleType`, :py:class:`~types.MethodType`, 
        :py:class:`~types.BuiltinMethodType`, and
        :py:class:`~types.BuiltinFunctionType`.
    include_defaults : bool, optional
        If ``allowed_types`` is provided, setting ``include_defaults`` to True
        will append the default list of functions, modules, and methods to the
        user-passed list of allowed types. Default is True, in which case
        both the defaults and the user-passed types are allowed. Setting to
        False will allow only the types passed in ``allowed_types``.
    whitelist : list of str, optional
        Optional list of variable names to whitelist. If a list is provided,
        global variables will be compared to elements of this list based on
        their string names. Default (None) is no whitelist.

    Examples
    --------

    Wrap a function to block globals:

    .. code-block:: python

        >>> my_data = 10

        >>> @block_globals
        ... def add_5(data):
        ...     ''' can you spot the global? '''
        ...     a_number = 5
        ...     result = a_number + my_data
        ...     return result  # doctest: +ELLIPSIS
        ...
        Traceback (most recent call last):
        ...
        TypeError: Illegal <class 'int'> global found in add_5: my_data

    Wrapping a class will prevent globals from being used in all methods:

    .. code-block:: python

        >>> @block_globals
        ... class MyClass:
        ...
        ...     @staticmethod
        ...     def add_5(data):
        ...         ''' can you spot the global? '''
        ...         a_number = 5
        ...         result = a_number + my_data
        ...         return result  # doctest: +ELLIPSIS
        ...
        Traceback (most recent call last):
        ...
        TypeError: Illegal <class 'int'> global found in add_5: my_data

    By default, functions and modules are allowed in the list of globals. You
    can modify this list with the ``allowed_types`` argument:

    .. code-block:: python

        >>> result_formatter = 'my number is {}'
        >>> @block_globals(allowed_types=str)
        ... def add_5(data):
        ...     ''' only allowed globals here! '''
        ...     a_number = 5
        ...     result = a_number + data
        ...     return result_formatter.format(result)
        ...
        >>> add_5(3)
        'my number is 8'

    block_globals will also catch undefined references:

    .. code-block:: python

        >>> @block_globals
        ... def get_mean(df):
        ...     return da.mean()  # doctest: +ELLIPSIS
        Traceback (most recent call last):
        ...
        TypeError: Undefined global in get_mean: da
    """

    if allowed_types is None:
        allowed_types = _functions_and_modules
    else:
        # accept either a single type or a sequence of types
        if not isinstance(allowed_types, collections.abc.Sequence):
            allowed_types = [allowed_types]

        # isinstance requires a type or a tuple of types, never a list
        allowed_types = tuple(allowed_types)

        if include_defaults:
            allowed_types += _functions_and_modules

    if whitelist is None:
        whitelist = []

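    if isinstance(obj, type):
        for attr in obj.__dict__:
            if callable(getattr(obj, attr)):
                # pass the caller's options through so that methods are
                # checked against the same rules as the class
                setattr(obj, attr, block_globals(
                    getattr(obj, attr),
                    allowed_types=allowed_types,
                    include_defaults=include_defaults,
                    whitelist=whitelist))
        return obj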

    closurevars = inspect.getclosurevars(obj)
    for instr in dis.get_instructions(obj):
        if instr.opname == 'LOAD_GLOBAL':
            if instr.argval in closurevars.globals:
                if instr.argval in whitelist:
                    continue
                g = closurevars.globals[instr.argval]
                if not isinstance(g, allowed_types):
                    raise TypeError('Illegal {} global found in {}: {}'.format(type(g), obj.__name__, instr.argval))
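            elif instr.argval not in closurevars.builtins:
                # builtins such as len are also loaded with LOAD_GLOBAL but
                # are reported separately by inspect.getclosurevars
                raise TypeError('Undefined global in {}: {}'.format(obj.__name__, instr.argval))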

    @functools.wraps(obj)
    def inner(*args, **kwargs):
        return obj(*args, **kwargs)

    return inner
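
The allowed-types check at the heart of the decorator can be tried in isolation. The sketch below is illustrative only: `DEFAULT_TYPES` and `is_allowed` are hypothetical names that are not part of the decorator above, and the tuple simply mirrors `_functions_and_modules`.

```python
import types

# Mirrors _functions_and_modules from the decorator (this name is hypothetical)
DEFAULT_TYPES = (
    types.FunctionType,
    types.ModuleType,
    types.MethodType,
    types.BuiltinMethodType,
    types.BuiltinFunctionType,
)


def is_allowed(value, extra_types=(), include_defaults=True):
    """Hypothetical helper: would ``value`` pass the allowed-types check?"""
    allowed = tuple(extra_types)
    if include_defaults:
        allowed += DEFAULT_TYPES
    # isinstance accepts a tuple of types; an empty tuple never matches
    return isinstance(value, allowed)


checks = {
    'module': is_allowed(types),
    'builtin': is_allowed(len),
    'int': is_allowed(10),
    'str_with_extra': is_allowed('my number is {}', extra_types=(str,)),
    'func_no_defaults': is_allowed(
        is_allowed, extra_types=(str,), include_defaults=False),
}
print(checks)
```

Modules and builtin functions pass by default, a plain `int` does not, and a `str` passes only once `str` is added to the allowed types.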

delgadom commented 4 years ago

Note that this does not catch undeclared references, so @block_globals would not raise an exception in this case:

In [2]: @block_globals
   ...: def myfunc(num):
   ...:     return number + 5
   ...: 
In [3]: number = 5
In [4]: myfunc(1)
10

Note that this makes use of the global number, which slipped into the function because it was still undeclared when myfunc was defined and was only assigned afterwards.
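
A minimal sketch of what the interpreter sees in this situation, assuming the code runs at module level: `inspect.getclosurevars` reports the name as unbound when the function is defined, and as a global once it has been assigned. The variable names `before` and `after` are only for illustration.

```python
import inspect


def myfunc(num):
    # `number` is not defined anywhere yet when this function is created
    return number + 5


# At definition time the name resolves to nothing, so it is reported as unbound
before = inspect.getclosurevars(myfunc)

number = 5

# After the assignment the same name resolves to a module-level global
after = inspect.getclosurevars(myfunc)

print(before.unbound, after.globals)
```

A check that only inspects the values of existing globals at decoration time has nothing to inspect for `number`, which is why a late assignment like this can get past it.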

delgadom commented 4 years ago

Whoa. Was trying to debug some weird behavior and found out that inspect.getclosurevars does not handle attributes correctly:

In [1]: import inspect

In [2]: def get_persons_name(person):
   ...:     return person.name
   ...:

In [3]: inspect.getclosurevars(get_persons_name)
Out[3]: ClosureVars(nonlocals={}, globals={}, builtins={}, unbound={'name'})

In [4]: age = 4
   ...: 
   ...: def get_persons_age(person):
   ...:     return person.age
   ...:

In [5]: inspect.getclosurevars(get_persons_age)
Out[5]: ClosureVars(nonlocals={}, globals={'age': 4}, builtins={}, unbound=set())

This is a deal breaker for this function, which should prevent unbound variables and should not prevent attributes!

May have to use dis.get_instructions to differentiate globals from locals. Might get gnarly.
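
A rough sketch of that idea, assuming the code runs at module level: the bytecode uses `LOAD_GLOBAL` for names read from the global scope and a different instruction for attribute access, so filtering on the opname separates the two. `loaded_global_names` is a hypothetical helper, not part of the decorator.

```python
import dis

age = 4


def get_persons_age(person):
    # `age` here is an attribute of `person`, not the global above
    return person.age


def get_global_age():
    # here `age` really is read from the global scope
    return age


def loaded_global_names(func):
    """Hypothetical helper: names that ``func`` loads from the global scope."""
    return {
        instr.argval
        for instr in dis.get_instructions(func)
        if instr.opname == 'LOAD_GLOBAL'
    }


attr_only = loaded_global_names(get_persons_age)
real_global = loaded_global_names(get_global_age)
print(attr_only, real_global)
```

Only `get_global_age` reports `age`, so the attribute access no longer looks like a use of the global.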

delgadom commented 4 years ago

The above decorator has been updated to fix this behavior