UKGovernmentBEIS/inspect_ai
Inspect: A framework for large language model evaluations
https://inspect.ai-safety-institute.org.uk/
MIT License
627 stars
118 forks
issues
Bump ruff from 0.7.4 to 0.8.0 in the python-packages group
#884
dependabot[bot]
opened
6 hours ago
0
Use view in message tool rendering
#883
dragonstyle
closed
5 hours ago
0
don't use thread pool executor for read_eval_log_headers
#882
jjallaire
closed
15 hours ago
0
Display sample ids rather than sample numbers
#881
dragonstyle
closed
16 hours ago
0
fix for eval-set in fullscreen mode
#880
jjallaire
closed
18 hours ago
0
Add support for rendering images in metadata
#879
dragonstyle
closed
18 hours ago
0
`ContentImage.detail` is not passed down to the `OpenAIAPI`.
#878
tobiasraabe
opened
21 hours ago
0
Add max sample error
#877
dragonstyle
closed
16 hours ago
2
resolve tmux issues in fullscreen display
#876
jjallaire
closed
23 hours ago
0
Revert "fullscreen/realtime interface for samples (#865)"
#875
jjallaire
closed
1 day ago
0
Make bash a login shell
#874
art-dsit
closed
5 hours ago
1
Minimal tests for bash and python tools
#873
art-dsit
closed
18 hours ago
0
location property for eval logs
#872
jjallaire
closed
1 day ago
0
Fix single sample display padding
#871
dragonstyle
closed
1 day ago
0
fix issue w/ viewer limits
#870
jjallaire
closed
1 day ago
0
Feature/timestamps
#869
dragonstyle
closed
1 day ago
0
Add more documentation on CLI
#868
mrahtz
opened
2 days ago
1
Typo on 'welcome' / 'getting started' page.
#867
williamgki
opened
2 days ago
0
Sample Limit Improvements
#866
dragonstyle
closed
1 day ago
0
fullscreen/realtime interface for samples
#865
jjallaire
closed
1 day ago
1
[Feature Request] Add user_message solver
#864
kaifronsdal
opened
2 days ago
1
consistent behavior for `max_samples` across sandbox and non-sandbox evals
#863
jjallaire
closed
3 days ago
0
Treat dictionary keys as globs
#862
dragonstyle
closed
3 days ago
0
Use ‘View’ when rendering tool events
#861
dragonstyle
closed
3 days ago
1
fix: remove upper limit for max_connections for Groq
#860
AarushSah
closed
3 days ago
0
update to cross-spawn 7.0.5
#859
jjallaire
closed
4 days ago
0
Add a test for overwriting a file with write_file
#858
art-dsit
closed
4 days ago
0
add optional title field to tool view content
#857
jjallaire
closed
4 days ago
0
gemini: combine consecutive tool messages into single content part; ensure no empty text content parts
#856
jjallaire
closed
4 days ago
0
Bugfix/rendering
#855
dragonstyle
closed
4 days ago
1
Small typo
#854
lukaspetersson
closed
4 days ago
0
Convert Bedrock to Converse API
#853
dragonstyle
closed
4 days ago
0
Properly Support Multi Tool Call Output
#852
dragonstyle
closed
6 days ago
0
Bump ruff from 0.7.3 to 0.7.4 in the python-packages group
#851
dependabot[bot]
closed
6 days ago
0
basic_agent: have incorrect_message function take list of scores
#850
jjallaire-aisi
closed
1 week ago
0
Inspect sometimes hangs on extremely large runs
#849
MSchmatzAISI
opened
1 week ago
0
`normalize_number` fails on exponent / fractional input
#848
evanmiller-anthropic
opened
1 week ago
0
Is there a way to have max-connections go beyond 100?
#847
AarushSah
closed
1 week ago
1
Support custom incorrect messages in basic_agent
#846
samiranne
closed
1 week ago
2
Fix to allow repeat evals on azureai
#845
ole-jorgensen
closed
1 week ago
0
Fix bug with view scroll
#844
dragonstyle
closed
1 week ago
0
Move INSPECT_DISABLE_MODEL_API check
#843
rusheb
closed
1 week ago
1
Dataclass Objects in Score Metadata Not Preserved in Binary Log Format
#842
rusheb
closed
1 week ago
3
Fix scoring behaviour in basic agent
#841
sdtblckgov
closed
1 week ago
1
Multi-processing w/ s3 logs, unless you use spawn to launch your subprocesses
#840
max-kaufmann
opened
1 week ago
0
Log View: Restore Scroll Position in VS Code
#839
dragonstyle
closed
1 week ago
0
`match` scorer doesn't handle answers with percent sign
#838
fastfedora
opened
1 week ago
1
Disabling exponential backoff
#837
AarushSah
closed
1 week ago
2
Inspect process uses large amounts of memory w/ DockerSandbox, likely because large exec() outputs are loaded into memory
#836
max-kaufmann
opened
1 week ago
0
[Question] Easy way to use finetuned OpenAI models?
#835
baceolus
closed
1 week ago
1