UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://inspect.ai-safety-institute.org.uk/
MIT License

[Feature Request] Support desktop agents #290

Closed jmsdao closed 1 week ago

jmsdao commented 2 months ago

Executive summary

Inspect currently does not support desktop agents.

1) Many threat-model-relevant tasks require a GUI OS to complete
2) Benchmarks for desktop agents already exist
3) Desktop agents are a realistic threat model
4) Desktop agents require a virtual machine (e.g. VirtualBox)

Quick example of desktop agents: see the video + graphical abstract of OSWorld.

Main arguments

1) Many relevant tasks require a GUI OS

2) Benchmarks for desktop agents already exist

3) Desktop agents are a realistic threat model

4) Desktop agents require a virtual machine

What's missing from Inspect

Next steps

My team and I could potentially implement this feature if there's interest! We're excited to see evals move toward desktop environments, as we think this is highly consequential for the field of AI safety.

jjallaire commented 1 week ago

@jmsdao As we've discussed, we aren't going to make this a full-on Inspect feature at this point. That said, I know you are working on an extension, and it might be nice to post a link to that work here for those reading this issue.

jmsdao commented 1 week ago

I'll be sure to do so once it's ready!