Open kurtespinosa opened 7 years ago
Hi,
The indexing for some variables like question and question_id seems interchanged because jsonify.py
requires some extra preprocessing beforehand (and I am sorry that it is not provided in this repo).
The preprocessing basically removes the questions that have no correct answer among their candidate sentences, as described in the original paper.
So please fix the code if you think it is necessary.
After preprocessing, the file should look like:
{"label": "0", "sentence_id": "D11-0", "question": ["how", "big", "is", "bmc", "software", "in", "houston", ",", "tx"], "title": "BMC Software", "answer": ["bmc", "software", ",", "inc.", "is", "an", "american", "company", "specializing", "in", "business", "service", "management", "(", "bsm", ")", "software", "."], "document_id": "D11", "question_id": "Q11"}
{"label": "0", "sentence_id": "D11-1", "question": ["how", "big", "is", "bmc", "software", "in", "houston", ",", "tx"], "title": "BMC Software", "answer": ["headquartered", "in", "houston", ",", "texas", ",", "bmc", "develops", ",", "markets", "and", "sells", "software", "used", "for", "multiple", "functions", ",", "including", "it", "service", "management", ",", "data", "center", "automation", ",", "performance", "management", ",", "virtualization", "lifecycle", "management", "and", "cloud", "computing", "management", "."], "document_id": "D11", "question_id": "Q11"}
Each line contains one QA pair in JSON format.
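For anyone who needs to reproduce this step, here is a minimal sketch of what such a preprocessing script might look like. It is not the script used for this repo. It assumes the WikiQA TSV files have a header row with the columns QuestionID, Question, DocumentID, DocumentTitle, SentenceID, Sentence and Label (please check your copy), and it uses lowercasing plus whitespace splitting as a stand-in for whatever tokenizer produced the tokens above. The function name tsv_to_json_lines is hypothetical.

```python
import csv
import json
from collections import defaultdict


def tsv_to_json_lines(tsv_file):
    """Convert WikiQA-style TSV rows into JSON strings, one per QA pair.

    Questions without any candidate sentence labeled "1" are dropped.
    Assumption: the header names below match the dataset's TSV header.
    """
    # Group rows by question ID; reading by column name avoids
    # mixing up positional indexes such as question vs. question_id.
    rows_by_question = defaultdict(list)
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        rows_by_question[row["QuestionID"]].append(row)

    json_lines = []
    for question_id, rows in rows_by_question.items():
        # Skip questions that have no correct answer among their candidates.
        if not any(row["Label"] == "1" for row in rows):
            continue
        for row in rows:
            record = {
                "label": row["Label"],
                "sentence_id": row["SentenceID"],
                # Placeholder tokenization (assumption): lowercase + split.
                "question": row["Question"].lower().split(),
                "title": row["DocumentTitle"],
                "answer": row["Sentence"].lower().split(),
                "document_id": row["DocumentID"],
                "question_id": question_id,
            }
            json_lines.append(json.dumps(record))
    return json_lines
```

To use it, open the TSV file with encoding="utf-8" and newline="", pass the file object to tsv_to_json_lines, and write the returned strings to the output file one per line.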
Thank you for taking the time to answer my question. This clarifies it.
Dear Butsugiri,
Thank you for sharing your code. I have a question about the input dataset, which I need to jsonify. I downloaded the dataset and used the respective data partitions, for example, WikiQA-test.tsv for the test set, which has a sample file entry below.
Now, I'm confused because in the jsonify code, the question would point to D0-0, which is the sentenceID. It seems that the question_id and the question were interchanged. Am I right, or did I miss anything? Should it have been the following?
Cheers, Kurt