JHU-CLSP turking-bench issues

JHU-CLSP / turking-bench

Web-grounded natural language instructions

https://turkingbench.github.io

Apache License 2.0

13 stars, 6 forks

issues

Update README.md

#134 adiasija2011 closed 1 month ago
0
Update README.md

#133 adiasija2011 closed 1 month ago
0
Adding the missing tasks

#132 danyaljj closed 3 months ago
0
requirements.txt does not include packages needed to run ./1_run_website.sh

#131 devinat1 closed 7 months ago
12
improve the readme and change the task names for better readability

#130 danyaljj closed 8 months ago
0
Getting the setup working on IA1 with a remote server and bringing in Text models

#129 klxu03 closed 8 months ago
1
Few-shot examples + Ollama VLM Integration

#128 klxu03 closed 9 months ago
0
Collecting field statistics

#127 klxu03 closed 9 months ago
0
GPT4TextVision Baseline Works and Beginning of General Mouse/Keyboard controls on Vision Models Skeleton

#126 klxu03 closed 10 months ago
1
Minor changes to the test tasks.

#125 danyaljj closed 10 months ago
0
Fixed two CSV tasks

#124 klxu03 closed 10 months ago
1
Lots of data cleaning following the addition of asserting every row instance sees an answer

#123 klxu03 closed 10 months ago
0
Data cleaning

#122 klxu03 closed 10 months ago
0
Fix run single

#121 klxu03 closed 11 months ago
1
Fixed dumped data format

#120 klxu03 closed 11 months ago
0
GPT4 text solver.

#119 danyaljj closed 10 months ago
0
Evaluate model given dumped data

#118 klxu03 closed 1 year ago
2
add examples

#117 yeganehkordi closed 1 year ago
0
Converted actions to all be based on input_name instead of input, and execute model_outputs from the folder in src/

#116 klxu03 closed 1 year ago
1
Make actions input_name string based instead of input based

#115 klxu03 closed 1 year ago
0
Update actions.py

#114 danyaljj closed 1 year ago
0
Parallel dumping

#113 danyaljj closed 10 months ago
0
Laundry List of Updates | hotfixing Slurm script and our code to make it Slurm compatible

#112 klxu03 closed 1 year ago
0
dump features partitions rockfish prep and more

#111 klxu03 closed 1 year ago
0
Dump relevant HTML

#110 klxu03 closed 1 year ago
1
Deleting easy evaluation tasks that oracle cannot solve

#109 klxu03 closed 1 year ago
0
updating reference eval files

#108 klxu03 closed 1 year ago
0
References to the eval files in the code

#107 danyaljj closed 1 year ago
0
Updated the evaluation split to also include hard to do tasks

#106 klxu03 closed 1 year ago
0
Fix the runtime error for dumping data

#105 yeganehkordi closed 1 year ago
0
Architectural refactors to pave the way for ModelBaseline to evaluate a model's generated outputs

#104 klxu03 closed 1 year ago
1
Added get_relevant_html helper function in eval for dump_features

#103 klxu03 closed 1 year ago
0
Added comprehensive random TAP tests

#102 klxu03 closed 1 year ago
0
Finished fixing rest of flaky tasks

#101 Gosheni closed 1 year ago
0
Fixed JiminyCricket-HumanVal-b10 rows

#100 Gosheni closed 1 year ago
0
Fix dumping html files

#99 yeganehkordi closed 1 year ago
0
wiki103_quality 7 Fixed remaining rows - corrected invalid answers

#98 Gosheni closed 1 year ago
0
wiki103_quality 7 Corrected invalid answers

#97 Gosheni closed 1 year ago
0
enumerate_tasks use filter_tap_tasks so it actually runs all my fixed tasks

#96 klxu03 closed 1 year ago
0
Fixed Congressional Bills 5, Annotation subj_obj, HTER - 27 Sep 1859, Reddit In-group Analysis Comment annotation 3, mars human eval (a-b testing) 3, wikiHow Goal Membership

#95 klxu03 closed 1 year ago
0
Fixed: ATOMIC - Required Objects 5, ROT Details [m=50] rocstories - 0 - 99,

#94 klxu03 closed 1 year ago
0
hotfix allow users to run Turkle server with 1_run script after initial clone

#93 klxu03 closed 1 year ago
0
Adding the missing details from the readme.

#92 danyaljj closed 1 year ago
0
Fixed: wiki103, Word Formality, wikiHow step-goal, Style adaptation

#91 klxu03 closed 1 year ago
1
Fixed Arch, Dialogue Safety 5, wiki103, Ethnologue, Step 2, Word Formality, and more

#90 klxu03 closed 1 year ago
0
Fixed Arch

#89 klxu03 closed 1 year ago
0
Fixed tasks: Step 5 human performance, Step 2 Verifying Multi-sentence-ness, wikiHow Step Membership, Commonsense Misinformation Tracking Pilot [cancer data setup] 10, and more and deleting TAP tests duplicates

#88 klxu03 closed 1 year ago
0
Smarter GitHub Actions Instance Even Distribution | Skipped Rationale Generation 5 and Fixed Abductive Reasoning 11 Removing Input Sorting, Fixed DI Rationale Gen. evaluation - single 2

#87 klxu03 closed 1 year ago
2
Fixed Opinion Mining of Spanish Customer Comments HIT2 tasks, run_single QOL, Fixed Style Adaptation - Subjective-Objective, Fixed intuitive physics 01

#86 klxu03 closed 1 year ago
1
Made headless a parameter and allowed run_single to specify a row index in task

#85 klxu03 closed 1 year ago
0