Hello, I ran into some file format issues while training the model. I have a batch of my own clues and answers data that I want to use for training, but I don't know what format it needs to be in.
What format should the dataset passed to the following command be in?
bash train_scripts/biencoder/tfidf.sh path/to/dataset
What are the specific formats of answers.jsonl and docs.jsonl?
What data was used for train.json and validation.json? Are they the files posted on Hugging Face? The data on Hugging Face is in CSV format, which differs from the JSON required here.
In summary, can you provide examples of the training files required for each step of the training process, so that we can convert our own training data to the same format?
Thank you very much indeed.