Adamliu1 / SNLP_GCW


Collect a list of all datasets used #90

Open Willmish opened 1 month ago

Willmish commented 1 month ago

Updated 2024-07-01.

Datasets:

TheRootOf3 commented 4 weeks ago

Sent to Eduardo on 04-06-2024!

TheRootOf3 commented 2 days ago

Updates on 2024-07-01:

Logical reasoning: Replaced sail/symbolic-instruction-tuning with mnli because results on the former were no better than random guessing.

GSM8K: Evaluation takes a very long time because the task is generative and the dataset is large, so it may be dropped.

Retain set: Changed to squad; evaluation is performed on squadv2 (essentially squad plus some "tricky" questions that have no answer).
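To make the squad vs. squadv2 difference concrete, here is a minimal sketch of separating answerable from unanswerable questions. It assumes the record layout used by the Hugging Face `squad_v2` dataset, where an unanswerable question has an empty `answers["text"]` list. The toy records and the helper names `is_answerable` and `split_by_answerability` are hypothetical and are not taken from this repository.

```python
from typing import Any, Dict, List, Tuple

Example = Dict[str, Any]

# Toy records that mimic the assumed squad_v2 field layout (not real data).
records: List[Example] = [
    {"id": "q1", "question": "Where is the tower?",
     "answers": {"text": ["Paris"], "answer_start": [10]}},
    {"id": "q2", "question": "Who built the bridge?",
     "answers": {"text": [], "answer_start": []}},  # unanswerable
]


def is_answerable(example: Example) -> bool:
    """Return True if the example has at least one gold answer span."""
    return len(example["answers"]["text"]) > 0


def split_by_answerability(
    examples: List[Example],
) -> Tuple[List[Example], List[Example]]:
    """Split examples into (answerable, unanswerable) lists."""
    answerable = [e for e in examples if is_answerable(e)]
    unanswerable = [e for e in examples if not is_answerable(e)]
    return answerable, unanswerable


answerable, unanswerable = split_by_answerability(records)
print([e["id"] for e in answerable], [e["id"] for e in unanswerable])
```

Reporting scores separately for the two groups may help show whether a change on squadv2 comes from the squad-style questions or from the unanswerable ones.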

@TheRootOf3 Reconsider adding toxigen as well as real toxicity prompts.