issues
JHU-CLSP/turking-bench
Web-grounded natural language instructions
https://turkingbench.github.io
Apache License 2.0
13 stars
6 forks
Update README.md
#134
adiasija2011
closed
1 month ago
0
Update README.md
#133
adiasija2011
closed
1 month ago
0
Adding the missing tasks
#132
danyaljj
closed
3 months ago
0
requirements.txt does not include packages needed to run ./1_run_website.sh
#131
devinat1
closed
7 months ago
12
improve the readme and change the task names for better readability
#130
danyaljj
closed
8 months ago
0
Getting the setup working on IA1 with a remote server and bringing in Text models
#129
klxu03
closed
8 months ago
1
Few shot examples + OLlama VLM Integration
#128
klxu03
closed
9 months ago
0
Collecting field statistics
#127
klxu03
closed
9 months ago
0
GPT4TextVision Baseline Works and Beginning of General Mouse/Keyboard controls on Vision Models Skeleton
#126
klxu03
closed
10 months ago
1
Minor changes to the test tasks.
#125
danyaljj
closed
10 months ago
0
Fixed two CSV tasks
#124
klxu03
closed
10 months ago
1
Lots of data cleaning following the addition of asserting every row instance sees an answer
#123
klxu03
closed
10 months ago
0
Data cleaning
#122
klxu03
closed
10 months ago
0
Fix run single
#121
klxu03
closed
11 months ago
1
Fixed dumped data format
#120
klxu03
closed
11 months ago
0
GPT4 text solver.
#119
danyaljj
closed
10 months ago
0
Evaluate model given dumped data
#118
klxu03
closed
1 year ago
2
add examples
#117
yeganehkordi
closed
1 year ago
0
Converted actions to all be based on input_name instead of input, and execute model_outputs from the folder in src/
#116
klxu03
closed
1 year ago
1
Make actions input_name string based instead of input based
#115
klxu03
closed
1 year ago
0
Update actions.py
#114
danyaljj
closed
1 year ago
0
Parallel dumping
#113
danyaljj
closed
10 months ago
0
Laundry List of Updates | hotfixing Slurm script and our code to make it Slurm compatible
#112
klxu03
closed
1 year ago
0
dump features partitions rockfish prep and more
#111
klxu03
closed
1 year ago
0
Dump relevant HTML
#110
klxu03
closed
1 year ago
1
Deleting easy evaluation tasks that oracle cannot solve
#109
klxu03
closed
1 year ago
0
updating reference eval files
#108
klxu03
closed
1 year ago
0
References to the eval files in the code
#107
danyaljj
closed
1 year ago
0
Updated the evaluation split to also include hard to do tasks
#106
klxu03
closed
1 year ago
0
Fix the runtime error for dumping data
#105
yeganehkordi
closed
1 year ago
0
Architectural refactors to pave the way for ModelBaseline to evaluate a model's generated outputs
#104
klxu03
closed
1 year ago
1
Added get_relevant_html helper function in eval for dump_features
#103
klxu03
closed
1 year ago
0
Added comprehensive random TAP tests
#102
klxu03
closed
1 year ago
0
Finished fixing rest of flaky tasks
#101
Gosheni
closed
1 year ago
0
Fixed JiminyCricket-HumanVal-b10 rows
#100
Gosheni
closed
1 year ago
0
Fix dumping html files
#99
yeganehkordi
closed
1 year ago
0
wiki103_quality 7 Fixed remaining rows - corrected invalid answers
#98
Gosheni
closed
1 year ago
0
wiki103_quality 7 Corrected invalid answers
#97
Gosheni
closed
1 year ago
0
enumerate_tasks use filter_tap_tasks so it actually runs all my fixed tasks
#96
klxu03
closed
1 year ago
0
Fixed Congressional Bills 5, Annotation subj_obj, HTER - 27 Sep 1859, Reddit In-group Analysis Comment annotation 3, mars human eval (a-b testing) 3, wikiHow Goal Membership
#95
klxu03
closed
1 year ago
0
Fixed: ATOMIC - Required Objects 5, ROT Details [m=50] rocstories - 0 - 99,
#94
klxu03
closed
1 year ago
0
hotfix allow users to run Turkle server with 1_run script after initial clone
#93
klxu03
closed
1 year ago
0
Adding the missing details from the readme.
#92
danyaljj
closed
1 year ago
0
Fixed: wiki103, Word Formality, wikiHow step-goal, Style adaptation
#91
klxu03
closed
1 year ago
1
Fixed Arch, Dialogue Safety 5, wiki103, Ethnologue, Step 2, Word Formality, and more
#90
klxu03
closed
1 year ago
0
Fixed Arch
#89
klxu03
closed
1 year ago
0
Fixed tasks: Step 5 human performance, Step 2 Verifying Multi-sentence-ness, wikiHow Step Membership, Commonsense Misinformation Tracking Pilot [cancer data setup] 10, and more and deleting TAP tests duplicates
#88
klxu03
closed
1 year ago
0
Smarter GitHub Actions Instance Even Distribution | Skipped Rationale Generation 5 and Fixed Abductive Reasoning 11 Removing Input Sorting, Fixed DI Rationale Gen. evaluation - single 2
#87
klxu03
closed
1 year ago
2
Fixed Opinion Mining of Spanish Customer Comments HIT2 tasks, run_single QOL, Fixed Style Adaptation - Subjective-Objective, Fixed intuitive physics 01
#86
klxu03
closed
1 year ago
1
Made headless a parameter and allowed run_single to specify a row index in task
#85
klxu03
closed
1 year ago
0