AGI-Edgerunners / LLM-Adapters

Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
https://arxiv.org/abs/2304.01933
Apache License 2.0

Question on the source of commonsense_15k #69

Open clarenceluo78 opened 6 months ago

clarenceluo78 commented 6 months ago

Hi there, thanks for your work! I want to ask about the source of the commonsense_15k dataset, as I couldn't find it described in the paper or in this repo.

HZQ950419 commented 6 months ago

Hi,

The commonsense_15k is sampled from the commonsense_170k for debugging. The results reported in the paper are based on commonsense_170k.
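For anyone who wants a similar debugging subset, here is a minimal sketch of how one could be drawn from commonsense_170k.json. The file path, seed, and uniform sampling below are assumptions for illustration, not the script actually used to produce commonsense_15k:

```python
# Sketch: draw a 15k debugging subset from the full commonsense dataset.
# NOTE: the path, seed, and uniform sampling are assumptions for illustration;
# this is not the authors' script for building commonsense_15k.
import json
import random

random.seed(0)  # fix the seed so the subset is reproducible

with open("ft-training_set/commonsense_170k.json") as f:  # path assumed from the repo layout
    full_data = json.load(f)

subset = random.sample(full_data, 15_000)  # uniform sample without replacement

with open("commonsense_15k_debug.json", "w") as f:
    json.dump(subset, f, indent=2)
```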

AaronZLT commented 5 months ago

Hi @HZQ950419, just curious whether math_50k.json contains all slices of the other math_*k.json files? And which math dataset (math_*k.json) should I use for fine-tuning? :)

HZQ950419 commented 5 months ago

Hi @AaronZLT, I recommend using math_10k, math_7k, or math_14k for fine-tuning. To reproduce the results in the README, you need to use math_10k. math_50k is an experimental dataset for our own use, which we don't recommend for your experiments. Also, math_50k doesn't contain the samples of the other math_*k datasets.
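For reference, fine-tuning on math_10k would look roughly like the LoRA example command in the README. The flag names and values below are written from memory of the README/finetune.py and should be checked against the current repo (e.g. `python finetune.py --help`) before running:

```sh
# Sketch of a LoRA fine-tuning run on math_10k (flags and values assumed from the README;
# verify against the current finetune.py before use).
CUDA_VISIBLE_DEVICES=0 python finetune.py \
  --base_model 'yahma/llama-7b-hf' \
  --data_path 'ft-training_set/math_10k.json' \
  --output_dir './trained_models/llama-7b-lora-math_10k' \
  --adapter_name lora \
  --batch_size 16 \
  --micro_batch_size 4 \
  --num_epochs 3 \
  --learning_rate 3e-4 \
  --cutoff_len 256
```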