JHU-CLSP / turking-bench

Web-grounded natural language instructions
https://turkingbench.github.io
Apache License 2.0

Smarter GitHub Actions Instance Even Distribution | Skipped Rationale Generation 5 and Fixed Abductive Reasoning 11 Removing Input Sorting, Fixed DI Rationale Gen. evaluation - single 2 #87

Closed klxu03 closed 1 year ago

klxu03 commented 1 year ago

Applied a greedy algorithm in split_tasks to distribute the TAP tests more evenly across the 20 GitHub Actions instances. I noticed some instances completing in 30 minutes while others took 5 hours. This should make the run times more even.
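For reference, a minimal sketch of one common greedy split (longest task first, each assigned to the currently least-loaded instance). It is not necessarily what split_tasks does in this PR: the function name `split_tasks_greedy`, the `task_costs` mapping, and the use of a per-task cost estimate (for example, number of task instances or a measured runtime) are all assumptions for illustration.

```python
import heapq


def split_tasks_greedy(task_costs, num_instances=20):
    """Partition tasks across instances so total cost per instance is roughly even.

    task_costs: dict mapping task name -> estimated cost (hypothetical input,
    e.g. number of instances in the task or a previously measured runtime).
    Returns a list of num_instances lists of task names.
    """
    # Min-heap of (accumulated_cost, instance_index): the root is always
    # the least-loaded instance so far.
    heap = [(0.0, i) for i in range(num_instances)]
    heapq.heapify(heap)
    partitions = [[] for _ in range(num_instances)]

    # Place the most expensive tasks first; ties are broken by name so the
    # split is deterministic between runs.
    ordered = sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, cost in ordered:
        load, idx = heapq.heappop(heap)
        partitions[idx].append(name)
        heapq.heappush(heap, (load + cost, idx))
    return partitions
```

With two instances and costs `{"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}`, this yields `[["a", "d"], ["b", "c", "e"]]`, so both instances carry a total cost of 8.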

Fixed Abductive Reasoning 11 by removing the sorting of input_fields by X coordinates and then Y coordinates in extract_input_values_from_url. I think we should establish a new convention: the fill-in order is given by the order of the Answer columns, leftmost first, and we manually reorder the columns where a task needs finer-grained control. (This may cause many other tasks to fail, in which case we should manually reorder their columns right now.)
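A rough sketch of how the proposed convention could look, assuming the answer columns carry an `Answer.` prefix in the task's CSV header. The function name `order_fields_by_answer_columns` and the `overrides` parameter are hypothetical and do not come from the repository.

```python
def order_fields_by_answer_columns(column_names, overrides=None):
    """Return input field names in the order they should be filled in.

    column_names: CSV header, left to right (assumed to contain "Answer.<field>" columns).
    overrides: optional explicit field order for a task that needs manual control.
    """
    if overrides is not None:
        # A manually specified order takes precedence over the column order.
        return list(overrides)
    prefix = "Answer."
    # Keep the left-to-right header order; no sorting by on-page coordinates.
    return [name[len(prefix):] for name in column_names if name.startswith(prefix)]
```

Under this sketch, reordering the Answer columns in the CSV is enough to change the fill-in order for a task, and `overrides` covers cases where the column order alone is not sufficient.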

Likewise fixed DI Rationale Gen. evaluation - single 2 the same way, by rearranging the column order (there may be other tasks like this that my previous fix broke). This task is still flaky, so I'm going to wait for the TAP tests to give me the index of the error.

Skipping the Rationale Generation 5 task: you need to click "show questions" before the questions show up for you to solve. We can manually code this action in for this task if we want, or keep it skipped.

@danyaljj @yeganehkordi

danyaljj commented 1 year ago

One minor comment. Otherwise, good to go.

klxu03 commented 1 year ago

> One minor comment. Otherwise, good to go.

@danyaljj was this the update you wanted?