carlini / yet-another-applied-llm-benchmark

A benchmark to evaluate language models on questions I've previously asked them to solve.
GNU General Public License v3.0

DockerJob TTY Error #8

Open fostiropoulos opened 4 months ago

fostiropoulos commented 4 months ago

Needless to say, this is amazing work.

I tried to extend DockerJob but failed, because it does not currently seem to work with a pty.

Am I missing something or is this part of the code not tested yet?

Error:

the input device is not a TTY

I believe the input pipe should be set to a pty?
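For context, Docker prints "the input device is not a TTY" when a command such as `docker exec -it` is asked to allocate a terminal (`-t`) while its stdin is a plain pipe. The sketch below is not DockerJob's actual code; it only illustrates the general idea with a harmless child Python process in place of Docker. The helper name `stdin_is_tty_in_child` is hypothetical. It shows that a child given a pipe for stdin does not see a TTY, while a child given the slave end of a pty from `pty.openpty()` does. It assumes a Unix-like system where the `pty` module is available.

```python
import os
import pty
import subprocess
import sys


def stdin_is_tty_in_child(use_pty: bool) -> bool:
    """Spawn a child process and report whether it sees its stdin as a TTY.

    Hypothetical helper for illustration only; it is not part of the benchmark.
    """
    # The child simply prints whether its own stdin is a terminal.
    probe = [sys.executable, "-c", "import sys; print(sys.stdin.isatty())"]

    if not use_pty:
        # A plain pipe for stdin: this is the situation that makes
        # `docker exec -it ...` fail with "the input device is not a TTY".
        result = subprocess.run(
            probe, stdin=subprocess.PIPE, capture_output=True, text=True, check=True
        )
        return result.stdout.strip() == "True"

    # Allocate a pseudo-terminal and hand its slave end to the child as stdin.
    # Assumption: a Docker-based job could pass slave_fd the same way to its
    # own `docker exec -it ...` subprocess and write commands to master_fd.
    master_fd, slave_fd = pty.openpty()
    try:
        result = subprocess.run(
            probe, stdin=slave_fd, capture_output=True, text=True, check=True
        )
    finally:
        os.close(slave_fd)
        os.close(master_fd)
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("pipe stdin is a TTY:", stdin_is_tty_in_child(use_pty=False))
    print("pty stdin is a TTY:", stdin_is_tty_in_child(use_pty=True))
```

An alternative that avoids the pty entirely is to drop the `-t` flag and keep only `-i`, since `-i` alone keeps stdin open without requiring a terminal. Whether that fits DockerJob depends on how it reads the container's output, which this sketch does not cover.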

carlini commented 4 months ago

Yeah, this is definitely the flakiest part of the benchmark. It works on my Linux machine, but I don't know if it will work well on other machines.

It would be great to re-work this entire piece of the code to be more ... good.