ErikBjare / gptme

LLMs in your terminal equipped with local tools: executes python and bash, edits local files, browses the web, vision. Extensible. Tiny.
http://erik.bjareholt.com/gptme/docs/
MIT License
313 stars, 29 forks

Isolated environments for test/evals #90

Closed: ErikBjare closed this 1 week ago

ErikBjare commented 1 month ago

I just had an eval of Claude 3 Haiku create git repositories in ~ and ~/Programming. Not great.

Should figure out a way to constrain its impact, e.g. by running evals in a Docker container, a chroot, or CI.
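As a rough illustration of the lightest-weight option, here is a minimal sketch that runs a command with both the working directory and `HOME` pointed at a throwaway directory, so that anything written to `~` or to a relative path lands in the sandbox. The helper name `run_isolated` is hypothetical and not part of gptme. This is not real isolation: a process can still write to absolute paths, which is why a container or chroot is the safer route.

```python
import os
import subprocess
import sys
import tempfile


def run_isolated(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd with cwd and HOME set to a temporary directory.

    Hypothetical helper for illustration only. It redirects `~` and
    relative paths, but does NOT block writes to absolute paths.
    """
    with tempfile.TemporaryDirectory(prefix="eval-") as sandbox:
        # Copy the current environment and override HOME only.
        env = dict(os.environ, HOME=sandbox)
        return subprocess.run(
            cmd,
            cwd=sandbox,
            env=env,
            capture_output=True,
            text=True,
            check=False,
        )


if __name__ == "__main__":
    # Show where `~` resolves inside the child process.
    probe = "import os; print(os.path.expanduser('~'))"
    result = run_isolated([sys.executable, "-c", probe])
    print(result.stdout.strip())
```

The temporary directory is deleted when the call returns, so any repositories or files the eval created there are cleaned up automatically.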

ErikBjare commented 3 weeks ago

I added a basic Dockerfile for running evals in https://github.com/ErikBjare/gptme/commit/75deabcaa6e27222d3d5713ac83c2aa70dd3aa87

ErikBjare commented 1 week ago

Fixed now that Dockerfile.eval exists.

Ideally, it should probably be changed so that each eval runs in its own container (spawned by gpt-eval), but that's for another time.
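A minimal sketch of what per-eval containers could look like, assuming an image has already been built from Dockerfile.eval. The image tag `gptme-eval`, the helper names, and the idea that the image accepts an eval name as its argument are all assumptions made for illustration, not how the runner actually works.

```python
import subprocess


def docker_cmd(image: str, eval_name: str, workdir: str = "/workspace") -> list[str]:
    """Build a `docker run` command for a single eval.

    Assumes (hypothetically) that the image's entrypoint takes the
    eval name as its only argument.
    """
    return [
        "docker", "run",
        "--rm",                # remove the container when the eval exits
        "--workdir", workdir,  # files the eval creates stay inside the container
        image,
        eval_name,
    ]


def run_evals(image: str, eval_names: list[str], dry_run: bool = True) -> list[list[str]]:
    """Spawn one fresh container per eval; with dry_run, only build the commands."""
    commands = [docker_cmd(image, name) for name in eval_names]
    if not dry_run:
        for cmd in commands:
            # check=False so one failing eval does not abort the rest.
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    for cmd in run_evals("gptme-eval", ["hello", "prime100"]):
        print(" ".join(cmd))
```

Because every eval gets a fresh container that is removed on exit, one eval cannot see or clobber files left behind by another, and nothing touches the host's home directory.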