cdqa-suite / cdQA

⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.
https://cdqa-suite.github.io/cdQA-website/
Apache License 2.0
614 stars, 191 forks

Adding annotated training dataset #368

Open mathieudumayet opened 3 years ago

mathieudumayet commented 3 years ago

Hi,

In a blog post, you wrote:

You can also improve the performance of the pre-trained Reader, which was pre-trained on SQuAD 1.1 dataset. If you have an annotated dataset (that can be generated by the help of the cdQA-annotator) in the same format as SQuAD dataset you can fine-tune the reader on it:

# Put the path to your json file in SQuAD format here
path_to_data = './data/SQuAD_1.1/train-v1.1.json'
cdqa_pipeline.fit_reader(path_to_data)

Does this mean we should edit the "train-v1.1.json" file and add our own training dataset, built with cdQA-annotator, to it?
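
To make the question concrete, here is a minimal sketch of the kind of merge I have in mind. It writes a new file instead of editing "train-v1.1.json" in place. It assumes both files follow the SQuAD 1.1 layout, with a top-level "data" list of articles. The helper name merge_squad_files and the example paths in the comments are hypothetical, not part of cdQA, and I have not confirmed that this is the intended workflow.

```python
import json


def merge_squad_files(base_path, extra_path, out_path):
    """Combine two SQuAD-style JSON files into a new file.

    Assumption: each file has a top-level "data" list of articles,
    as in SQuAD 1.1. Neither input file is modified.
    """
    with open(base_path, encoding="utf-8") as f:
        base = json.load(f)
    with open(extra_path, encoding="utf-8") as f:
        extra = json.load(f)

    merged = {
        # Keep the base file's version tag if it has one.
        "version": base.get("version", "1.1"),
        # Articles from the base file first, then the annotated ones.
        "data": base["data"] + extra["data"],
    }

    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False)
    return merged


# Hypothetical usage (these paths are placeholders):
# merge_squad_files('./data/SQuAD_1.1/train-v1.1.json',
#                   './data/my_annotations.json',
#                   './data/train-merged.json')
# cdqa_pipeline.fit_reader('./data/train-merged.json')
```

Or is it enough to pass only the annotator's output file to fit_reader, without merging anything?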

Thanks!