Closed: jmsdao closed this issue 1 month ago
@jmsdao As we've discussed, we aren't going to make this a full-on Inspect feature at this point. That said, I know you are working on an extension, and it would be nice to post a link to that work here for those reading this issue.
I'll be sure to do so once it's ready!
Executive summary
Inspect currently does not support desktop agents.
1) Many threat-model-relevant tasks require a GUI OS to complete
2) Benchmarks for desktop agents already exist
3) Desktop agents are a realistic threat model
4) Desktop agents require a virtual machine (e.g. VirtualBox)
Quick example of desktop agents: see the video + graphical abstract of OSWorld.
Main arguments
1) Many relevant tasks require a GUI OS
2) Benchmarks for desktop agents already exist
3) Desktop agents are a realistic threat model
4) Desktop agents require a virtual machine
What's missing from Inspect
Next steps
My team and I could potentially implement this feature if there's interest in it! We’re excited to see evals go in the direction of desktop environments, as we think this is highly consequential for the field of AI safety.