Open cp-jose opened 10 months ago
For this task (2.3.1 Awareness of tool usage), whose results are shown in Table 3 (Section 3.2), the positive samples - where the LLM needs to use a tool to address the user query - are drawn from the generated single-tool user queries (file dataset/data/all_clean_data.csv, with 20,615 queries), and the negative samples - where the LLM does not need a tool - are said to be drawn from three recent instruction datasets. From the writing of the experimental section, it seems the test set (used to produce Table 3) has a 50%/50% split of positive/negative samples. I have a few questions about this:
- Is this proportion correct?
- Exactly how many queries are used in this experiment?
- Are the negative samples made available on the repository? If so where? (couldn't find them)
Sorry for the confusion. The proportion is correct. We use 515 x 2 samples in total (515 positive and 515 negative, 1,030 overall).
"Are the negative samples made available on the repository?": I'm sorry about that. I just realized I forgot to upload them. I will upload them soon.
OK, thank you. How soon do you expect to have this fixed? I've pointed out a number of mistakes already, but nothing changed on the repository. It's still 3 months old.
Hi,
I have uploaded the datasets to dataset/tmp_dataset. Please take a look.
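For anyone trying to reproduce the 515 x 2 split discussed in this thread, here is a minimal sketch of how a balanced test set could be assembled. It is not the authors' script: the thread does not say how the 515 queries per class were selected, so uniform random sampling with a fixed seed is an assumption, and `build_balanced_set`, the label encoding (1 = tool needed, 0 = no tool needed), and the placeholder query pools are all hypothetical.

```python
import random


def build_balanced_set(positives, negatives, n_per_class=515, seed=0):
    """Return a shuffled list of (query, label) pairs, n_per_class per label.

    Label 1 = the query needs a tool, label 0 = it does not.
    Uniform random sampling is an assumption, not the authors' documented method.
    """
    if len(positives) < n_per_class or len(negatives) < n_per_class:
        raise ValueError("not enough queries to sample n_per_class from each pool")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    pos = rng.sample(positives, n_per_class)
    neg = rng.sample(negatives, n_per_class)
    samples = [(q, 1) for q in pos] + [(q, 0) for q in neg]
    rng.shuffle(samples)
    return samples


# Placeholder pools standing in for the real data (hypothetical contents).
positives = [f"tool query {i}" for i in range(20615)]
negatives = [f"plain query {i}" for i in range(2000)]
test_set = build_balanced_set(positives, negatives)
```

In practice the positive pool would be read from dataset/data/all_clean_data.csv and the negative pool from the files in dataset/tmp_dataset; their column names and formats are not given in this thread, so loading is left out of the sketch.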