UKGovernmentBEIS inspect_ai issues

UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations

https://inspect.ai-safety-institute.org.uk/

MIT License

627 stars 118 forks

issues

Bump ruff from 0.7.4 to 0.8.0 in the python-packages group

#884 dependabot[bot] opened 6 hours ago
0
Use view in message tool rendering

#883 dragonstyle closed 5 hours ago
0
don't use thread pool executor for read_eval_log_headers

#882 jjallaire closed 15 hours ago
0
Display sample ids rather than sample numbers

#881 dragonstyle closed 16 hours ago
0
fix for eval-set in fullscreen mode

#880 jjallaire closed 18 hours ago
0
Add support for rendering images in metadata

#879 dragonstyle closed 18 hours ago
0
`ContentImage.detail` is not passed down to the `OpenAIAPI`.

#878 tobiasraabe opened 21 hours ago
0
Add max sample error

#877 dragonstyle closed 16 hours ago
2
resolve tmux issues in fullscreen display

#876 jjallaire closed 23 hours ago
0
Revert "fullscreen/realtime interface for samples (#865)"

#875 jjallaire closed 1 day ago
0
Make bash a login shell

#874 art-dsit closed 5 hours ago
1
Minimal tests for bash and python tools

#873 art-dsit closed 18 hours ago
0
location property for eval logs

#872 jjallaire closed 1 day ago
0
Fix single sample display padding

#871 dragonstyle closed 1 day ago
0
fix issue w/ viewer limits

#870 jjallaire closed 1 day ago
0
Feature/timestamps

#869 dragonstyle closed 1 day ago
0
Add more documentation on CLI

#868 mrahtz opened 2 days ago
1
Typo on 'welcome' / 'getting started' page.

#867 williamgki opened 2 days ago
0
Sample Limit Improvements

#866 dragonstyle closed 1 day ago
0
fullscreen/realtime interface for samples

#865 jjallaire closed 1 day ago
1
[Feature Request] Add user_message solver

#864 kaifronsdal opened 2 days ago
1
consistent behavior for `max_samples` across sandbox and non-sandbox evals

#863 jjallaire closed 3 days ago
0
Treat dictionary keys as globs

#862 dragonstyle closed 3 days ago
0
Use ‘View’ when rendering tool events

#861 dragonstyle closed 3 days ago
1
fix: remove upper limit for max_connections for Groq

#860 AarushSah closed 3 days ago
0
update to cross-spawn 7.0.5

#859 jjallaire closed 4 days ago
0
Add a test for overwriting a file with write_file

#858 art-dsit closed 4 days ago
0
add optional title field to tool view content

#857 jjallaire closed 4 days ago
0
gemini: combine consecutive tool messages into single content part; ensure no empty text content parts

#856 jjallaire closed 4 days ago
0
Bugfix/rendering

#855 dragonstyle closed 4 days ago
1
Small typo

#854 lukaspetersson closed 4 days ago
0
Convert Bedrock to Converse API

#853 dragonstyle closed 4 days ago
0
Properly Support Multi Tool Call Output

#852 dragonstyle closed 6 days ago
0
Bump ruff from 0.7.3 to 0.7.4 in the python-packages group

#851 dependabot[bot] closed 6 days ago
0
basic_agent: have incorrect_message function take list of scores

#850 jjallaire-aisi closed 1 week ago
0
Inspect sometimes hangs on extremely large runs

#849 MSchmatzAISI opened 1 week ago
0
`normalize_number` fails on exponent / fractional input

#848 evanmiller-anthropic opened 1 week ago
0
Is there a way to have max-connections go beyond 100?

#847 AarushSah closed 1 week ago
1
Support custom incorrect messages in basic_agent

#846 samiranne closed 1 week ago
2
Fix to allow repeat evals on azureai

#845 ole-jorgensen closed 1 week ago
0
Fix bug with view scroll

#844 dragonstyle closed 1 week ago
0
Move INSPECT_DISABLE_MODEL_API check

#843 rusheb closed 1 week ago
1
Dataclass Objects in Score Metadata Not Preserved in Binary Log Format

#842 rusheb closed 1 week ago
3
Fix scoring behaviour in basic agent

#841 sdtblckgov closed 1 week ago
1
Multi-processing w/ s3 logs, unless you use spawn to launch your subprocesses

#840 max-kaufmann opened 1 week ago
0
Log View: Restore Scroll Position in VS Code

#839 dragonstyle closed 1 week ago
0
`match` scorer doesn't handle answers with percent sign

#838 fastfedora opened 1 week ago
1
Disabling exponential backoff

#837 AarushSah closed 1 week ago
2
Inspect process uses large amounts of memory w/ DockerSandbox, likely because large exec() outputs are loaded into memory

#836 max-kaufmann opened 1 week ago
0
[Question] Easy way to use finetuned OpenAI models?

#835 baceolus closed 1 week ago
1