JHU-CLSP / turking-bench

Web-grounded natural language instructions
https://turkingbench.github.io
Apache License 2.0

Architectural refactors to pave the way for ModelBaseline to evaluate a model's generated outputs #104

klxu03 closed 11 months ago

klxu03 commented 1 year ago

Outputs generated by a model can now be tested in src/evaluate_model.py

Changed the bounds for relevant HTML context to 15 lines above and 30 lines below. The upper and lower bounds are stored in their respective variables: "upper" means lines above, "lower" means lines below
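
As a rough illustration of the context bounds described above, here is a minimal sketch of how a window of 15 lines above and 30 lines below could be cut out of an HTML file. The function name `html_context`, its parameters, and the assumption that the target line itself is included are hypothetical and are not taken from the repository's code.

```python
from typing import List

# Hypothetical defaults mirroring the bounds in this PR:
# "upper" = lines above the target, "lower" = lines below it.
UPPER_BOUND = 15
LOWER_BOUND = 30


def html_context(lines: List[str], target_idx: int,
                 upper: int = UPPER_BOUND,
                 lower: int = LOWER_BOUND) -> List[str]:
    """Return the target line plus up to `upper` lines above and `lower` lines below."""
    if not 0 <= target_idx < len(lines):
        raise IndexError("target_idx is outside the document")
    # Clamp both ends so the window never runs past the start or end of the file.
    start = max(0, target_idx - upper)
    end = min(len(lines), target_idx + lower + 1)
    return lines[start:end]


# Usage: a 100-line dummy document with the element of interest on line 50.
doc = [f"line {i}" for i in range(100)]
window = html_context(doc, 50)
```

Clamping at the edges means an element near the top or bottom of the page simply gets a shorter window rather than raising an error.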

Fixed the TAP tasks' 19th partition not properly testing the random baseline

Added Poetry support

klxu03 commented 1 year ago

@danyaljj