UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://inspect.ai-safety-institute.org.uk/
MIT License
565 stars 96 forks source link

Support for evals within a module #48

Closed adrianlyjak closed 3 weeks ago

adrianlyjak commented 3 months ago

I like to write my applications as modules, and I like to place my tests inside my modules so that I can easily import code (and not have to have the module on the path). Lots of test frameworks support this, however if I run run inspect on the files, I'll get errors about the module imports not being find (whether using absolute or relative imports)

For example

app/
    prompts/foo.py
    evals/eval_foo.py
    __init__.py

eval_foo.py

from app.prompts.foo import prompt # ModuleNotFoundError: No module named 'app'
from ..prompts.foo import prompt # ImportError: attempted relative import beyond top-level package

I can somewhat workaround this by adding to the python path:

PYTHONPATH=./ inspect eval app/evals/eval_foo.py

In this case absolute imports now work.

Another workaround for absolute imports is to add to the python path in the eval file

import os
import sys

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from app.prompts.foo import prompt

However it would be nice if inspect could support modules natively. I've come to expect this behavior from other python testing frameworks

aisi-inspect commented 3 months ago

Thanks for this! We are attempting to do something along the lines of what you are suggesting but apparently all the dots aren't connected. We have a chdir_python() context manager that we use when loading and running tasks, e.g.:

https://github.com/UKGovernmentBEIS/inspect_ai/blob/main/src/inspect_ai/_eval/task/run.py#L98

This is turn adds to the sys.path:

https://github.com/UKGovernmentBEIS/inspect_ai/blob/main/src/inspect_ai/_util/path.py#L40

Does anything about our scheme immediately strike you as deficient or otherwise not aligning with your scenario?

adrianlyjak commented 3 months ago

I think that should work?--I'm not an expert on this. I'd be curious to verify whether relative imports work.

As far as the logic to change directory, it seems like its happening too late, after the "task" object has been created from loading up the source code. Full stack trace from this reproduction https://github.com/adrianlyjak/inspect-ai-48-repro below. The failure happens within resolve_tasks here when calling exec_module

https://github.com/UKGovernmentBEIS/inspect_ai/blob/bca3ac87f47882bab71b970541af3f0a71112aa9/src/inspect_ai/_eval/loader.py#L188

Traceback (most recent call last):
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/bin/inspect", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_cli/main.py", line 47, in main
    inspect(auto_envvar_prefix="INSPECT")
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_cli/common.py", line 30, in wrapper
    return cast(click.Context, func(*args, **kwargs))
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_cli/common.py", line 55, in wrapper
    return cast(click.Context, func(*args, **kwargs))
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_cli/eval.py", line 238, in eval_command
    eval(
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_eval/eval.py", line 102, in eval
    return asyncio.run(
           ^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 188, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 120, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 650, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_eval/eval.py", line 217, in eval_async
    eval_tasks = resolve_tasks(tasks, model, task_args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_eval/loader.py", line 57, in resolve_tasks
    return load_tasks(cast(list[str] | None, tasks), model, task_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_eval/loader.py", line 67, in load_tasks
    return [
           ^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_eval/loader.py", line 70, in <listcomp>
    for spec in load_task_spec(task_spec, model_name, task_args)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_eval/loader.py", line 83, in load_task_spec
    return create_tasks([task_spec], model, task_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_eval/loader.py", line 115, in create_tasks
    tasks.extend(create_file_tasks(file, model, None, task_args))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_eval/loader.py", line 134, in create_file_tasks
    task_specs = _load_task_specs(file)
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_eval/loader.py", line 158, in _load_task_specs
    module = load_task_module(task_path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/.venv/lib/python3.11/site-packages/inspect_ai/_eval/loader.py", line 188, in load_task_module
    loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/adrianlyjak/dev/inspect-ai-48-repro/app/evals/eval_foo.py", line 4, in <module>
    from app.prompts.foo import GREET
ModuleNotFoundError: No module named 'app'
aisi-inspect commented 3 months ago

Thanks for the repro!

We actually also change the directory during task loading (if you chase up the call stack you will see this).

I think the problem comes from the fact that you app is in app/evals and is attempting to import from the top of app (but we've added evals to the sys.path rather than app/evals. I'm not sure if there is a general way to determine what the "right" directory to add here is.

So it looks like we are very much assuming that the evals are at the "top" of the module structure (i.e. helpers would be in sub-directories off of where the eval is). I'm again not sure what a general resolution of this would look like (very open to suggestions!)

adrianlyjak commented 3 months ago

ah, I see. I have no immediate ideas (I haven't looked into importlib much before), but I can poke around pytest source code and see if there's something to learn from that.

If there's something useful there, its probably around here:

https://github.com/pytest-dev/pytest/blob/5037f8d1145b87b0008902487ace44653692374b/src/_pytest/assertion/rewrite.py#L98

adrianlyjak commented 3 months ago

actually, maybe more like around here:

https://github.com/pytest-dev/pytest/blob/5037f8d1145b87b0008902487ace44653692374b/src/_pytest/pathlib.py#L619

and pytest documentation on behavior https://github.com/pytest-dev/pytest/blob/5037f8d1145b87b0008902487ace44653692374b/doc/en/explanation/pythonpath.rst#id10 😵‍💫

aisi-inspect commented 3 months ago

We aren't directly implementing import (and we don't have a notion of what the "root" directory scope is) so it might be hard to emulate pytest exactly. One thing we could do is search up the directory hierarchy looking for an __init__.py and add that to the sys.path. Do you think that would work?

adrianlyjak commented 3 months ago

@aisi-inspect I think that seems like the most straightforward method to get the full path (non relative) module imports working with in-module . Were you thinking something like, find the outermost folder that contains an __init__.py, and add it's parent directory to the sys.path? There's an unfortunate caveat where if you have your folder structured with your evals are in a sibling directory to your sources that the module imports wouldn't work, which also seems like its another test folder structure in python.

This kind of structure:

root/
    evals/
        eval_baz.py # has something like `from app.foo.baz import something`
    app/
        __init__.py
        foo/baz.py

In that case, it seems like there'd need to be some sort of a concept of a "root" directory to handle more flexible test structures. I don't know if that kind of flexibility is a goal of this project. I think it seems perfectly reasonable to be opinionated and just document that evals must be within the module to be automatically handled.

rusheb commented 3 weeks ago

Hello hello, I've done a quick investigation of this. Please correct me if I'm wrong, but I'm under the impression that this is not an issue with inspect but a symptom of the fact that you haven't set up app as an installable package.

The first thing I tried was to remove all of the inspect-related code. You can see that on my branch rs-01-remove-inspect. I keep your folder structure but simply define a python function greet() which attempts to import the GREET variable from app. It fails with the same error you provide in the README.

Second, in branch rs-02-install-package I add to your original repo a pyproject.toml file which sets up app as an installable package. The only change to the setup is to run pip install -e . rather than pip install -r requirements.txt. With those changes it seems like inspect can import from app.

Finally in branch rs-03-sibling I tested this with the sibling package structure which you mention in your previous comment. I tried this with a fresh venv and it seems to work just fine as well.

@adrianlyjak do you think this resolves your issue, or is there something I've got wrong here or a reason you wouldn't want to set up app as a package?

adrianlyjak commented 3 weeks ago

Thanks @rusheb , I seem to have overlooked that! That's definitely a reasonable solution.