Thanks for sharing the TART code publicly! I would like to split the tart_full training set (st_train_ranker_input.json) into its source datasets by matching each example's instruction to a dataset name, using this file (https://github.com/facebookresearch/tart/blob/main/BERRI/berri_instructions.tsv).
However, some instructions in the training set do not appear in that file, so I cannot tell which dataset they belong to. For example: "Retrieve passages from Wikipedia to answer the following question" and "Retrieve passages from Wikipedia to answer".
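For reference, here is a minimal Python sketch of the matching I am attempting. The TSV column names (`dataset`, `instruction`), the `question` key, and the `instruction [SEP] query` layout of each training example are my assumptions about the file formats, not confirmed, and all helper names are hypothetical. Instructions not found in the TSV end up in `unmatched`, which is where the examples above land.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences still match."""
    return " ".join(text.lower().split())


def load_instruction_index(tsv_path: str, dataset_col: str = "dataset",
                           instruction_col: str = "instruction") -> Dict[str, str]:
    """Map each normalized instruction to its dataset name.

    The column names are assumptions; adjust them to the real TSV header.
    If the same instruction appears under several datasets, the last row wins.
    """
    index: Dict[str, str] = {}
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quote characters inside instructions as literal text.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            index[normalize(row[instruction_col])] = row[dataset_col]
    return index


def extract_instruction(query: str, sep: str = "[SEP]") -> str:
    """Return the text before the separator (assumed 'instruction [SEP] query' layout)."""
    return query.split(sep, 1)[0].strip()


def split_by_dataset(examples: Iterable[dict], index: Dict[str, str],
                     query_key: str = "question") -> Tuple[Dict[str, List[dict]], List[dict]]:
    """Group examples by dataset; examples whose instruction is not in the index go to 'unmatched'."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    unmatched: List[dict] = []
    for ex in examples:
        key = normalize(extract_instruction(ex[query_key]))
        dataset = index.get(key)
        if dataset is None:
            unmatched.append(ex)
        else:
            groups[dataset].append(ex)
    return dict(groups), unmatched
```

Exact matching after normalization is deliberately strict: a fuzzy or prefix match would assign "Retrieve passages from Wikipedia to answer" to some dataset, but it could just as easily pick the wrong one, which is exactly the ambiguity I am asking about.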
I also found a similar report here: https://github.com/facebookresearch/tart/issues/8#issuecomment-1591062115.
Could you help me resolve this? I would be very grateful!