enyst / playground

Integration tests runs #9

Open enyst opened 4 days ago

enyst commented 4 days ago

Summary

This is a placeholder issue for nightly integration test results.

github-actions[bot] commented 3 days ago
Trigger by: Manual Trigger: test without PR
Commit: 6ddae01acb9cee32527f2579eb24d05742d76254

Integration Tests Report (Haiku)

Haiku LLM Test Results: Success rate: 100.00% (6/6)

| instance_id | success | reason |
| --- | --- | --- |
| t01_fix_simple_typo | True | |
| t02_add_bash_hello | True | |
| t03_jupyter_write_file | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t06_github_pr_browsing | True | |

Integration Tests Report (DeepSeek)

DeepSeek LLM Test Results: Success rate: 66.67% (4/6)

| instance_id | success | reason |
| --- | --- | --- |
| t05_simple_browsing | True | |
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 4. |
| t04_git_staging | False | Failed to check for "nothing to commit, working tree clean": On branch initial-commit Untracked files: (use "git add ..." to include in what will be committed) push.log nothing added to commit but untracked files present (use "git add" to track). |
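
The t04_git_staging failure reason is consistent with a check that looks for the literal phrase "nothing to commit, working tree clean" in `git status` output: an untracked file such as push.log changes the final status line, so the phrase never appears. A minimal sketch of that behavior, assuming a plain substring check (the helper name `is_clean_status` and the `clean_output` sample are hypothetical; the actual test implementation is not shown in this report):

```python
# Phrase the failure reason says the test was looking for.
CLEAN_MARKER = "nothing to commit, working tree clean"


def is_clean_status(status_output: str) -> bool:
    """Return True if the given `git status` text reports a clean tree.

    Assumption: the check is a simple substring match on the output.
    """
    return CLEAN_MARKER in status_output


# Status text as quoted in the DeepSeek failure reason.
failed_output = (
    "On branch initial-commit\n"
    "Untracked files:\n"
    '  (use "git add ..." to include in what will be committed)\n'
    "\tpush.log\n"
    "nothing added to commit but untracked files present "
    '(use "git add" to track)\n'
)

# Hypothetical output for a run with no untracked files left behind.
clean_output = (
    "On branch initial-commit\n"
    "nothing to commit, working tree clean\n"
)

print(is_clean_status(failed_output))
print(is_clean_status(clean_output))
```

Under this assumption, the staging itself may have succeeded; the run would be marked failed only because the stray push.log file was left untracked in the working tree.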

Download evaluation outputs (includes both Haiku and DeepSeek results): Download

github-actions[bot] commented 3 days ago
Trigger by: Nightly Scheduled Run
Commit: 6ddae01acb9cee32527f2579eb24d05742d76254

Integration Tests Report (Haiku)

Haiku LLM Test Results: Success rate: 100.00% (6/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | True | |
| t06_github_pr_browsing | True | |
| t04_git_staging | True | |

Integration Tests Report (DeepSeek)

DeepSeek LLM Test Results: Success rate: 83.33% (5/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 4. |

Download evaluation outputs (includes both Haiku and DeepSeek results): Download

github-actions[bot] commented 2 days ago
Trigger by: Nightly Scheduled Run
Commit: 71fc115c344abbd38d7fe21a2a5c8119cfb3a1c6

Integration Tests Report (Haiku)

Haiku LLM Test Results: Success rate: 100.00% (6/6)

| instance_id | success | reason |
| --- | --- | --- |
| t02_add_bash_hello | True | |
| t03_jupyter_write_file | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | True | |
| t06_github_pr_browsing | True | |
| t04_git_staging | True | |

Integration Tests Report (DeepSeek)

DeepSeek LLM Test Results: Success rate: 83.33% (5/6)

| instance_id | success | reason |
| --- | --- | --- |
| t05_simple_browsing | True | |
| t03_jupyter_write_file | True | |
| t04_git_staging | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 4. |

Download evaluation outputs (includes both Haiku and DeepSeek results): Download

github-actions[bot] commented 1 day ago
Trigger by: Nightly Scheduled Run
Commit: 91135d7e76168506545861c2b0b7faae41e25e3f

Integration Tests Report (Haiku)

Haiku LLM Test Results: Success rate: 100.00% (6/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t06_github_pr_browsing | True | |

Integration Tests Report (DeepSeek)

DeepSeek LLM Test Results: Success rate: 83.33% (5/6)

| instance_id | success | reason |
| --- | --- | --- |
| t03_jupyter_write_file | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 4. |

Download evaluation outputs (includes both Haiku and DeepSeek results): Download