rsanjaykamath opened this issue 5 years ago
Hey,
This problem was only for TriviaQA-Unfiltered. I'm working now to fix it.
Also, I have uploaded most of the datasets converted to SQuAD 2.0 format; please see the main README.
multi75k can be created by adding multiple datasets to the list of datasets to convert, but I will also upload it eventually.
Thanks, Alon
Hi,
Thanks for your reply. Perfect, this makes it all easy for me. :)
However, it seems that some of the links to the SQuAD datasets in SQuAD 2.0 format are broken.
Thanks in advance.
File "/people/sanjay/MultiQA/models/multiqa_reader.py", line 242, in combine_context
    for ac in qa['answers']['open-ended']["answer_candidates"]:
KeyError: 'answer_candidates'
I get the same error when I run this:
python multiqa.py evaluate --model HotpotQA --datasets NewsQA,SearchQA --cuda_device 0
or this:
python multiqa.py evaluate --model SQuAD1-1 --datasets SQuAD1-1,NewsQA,SearchQA --cuda_device 0
Thanks for noticing! These are minor bugs left over from the format change; I will take care of them shortly and provide tests.
(Sent by email in reply to Sanjay Kamath's comment of Fri, 20 Sep 2019, 12:34, on https://github.com/alontalmor/MultiQA/issues/12.)
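Until the format fix is in place, one way to keep a loop like the one in the traceback from crashing is to look the key up defensively. This is a hypothetical sketch: `get_answer_candidates` is an illustrative helper, not part of MultiQA, and silently treating a missing key as "no candidates" may hide questions that should have had them.

```python
from typing import Any, Dict, List


def get_answer_candidates(qa: Dict[str, Any]) -> List[Any]:
    """Return qa['answers']['open-ended']['answer_candidates'], or [] if any level is missing.

    The key path is copied from the traceback; the empty-list fallback is a
    hypothetical workaround, not the fix applied in the repository.
    """
    open_ended = qa.get("answers", {}).get("open-ended", {})
    return open_ended.get("answer_candidates", [])


# A record without candidates no longer raises KeyError; the loop body is simply skipped.
qa_without_candidates = {"answers": {"open-ended": {}}}
for ac in get_answer_candidates(qa_without_candidates):
    print(ac)
```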
Hey, I've made the fixes and uploaded the datasets. In addition, I will soon upload PytorchTransformers results and models for several datasets.
Perfect! The problem is solved, so I'm closing the thread. Thanks.
Hi, sorry to reopen this.
I was wondering how you combine datasets when a list of datasets is passed during training. Does it concatenate all questions from the datasets in the list, or does it take only 15k from each to create multi75k, as mentioned earlier in this thread?
Thanks in advance.
Hey,
It combines the datasets example by example, so each batch contains a mix of datasets. You can control how many examples are taken from each dataset with "sample_size".
Hope that helps.
Hi,
OK, so each batch has a mixture of data, but are all samples used with the default settings?
For instance, if there are 5 datasets of 100k examples each, concatenating them with the default training settings shown in the README gives 500k samples, right? Not 5 × 15k = 75k?
I just want to know how many samples I use when I train on 5 datasets.
Thanks
To use all the data, just make sure sample_size=-1. It will then consume all the datasets until they are exhausted (even if they are not equally sized).
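The mixing behaviour described in this exchange can be sketched as a round-robin over the datasets. This is a hypothetical illustration: `mix_datasets` is not MultiQA's reader, and taking the first `sample_size` examples of each dataset (rather than a random subset) is an assumption.

```python
from typing import Dict, Iterator, List, Sequence


def mix_datasets(datasets: Dict[str, Sequence[dict]], sample_size: int = -1) -> Iterator[dict]:
    """Yield examples round-robin across datasets, one example from each in turn.

    sample_size > 0 caps the number of examples taken from each dataset (here: the
    first sample_size, an assumption); sample_size == -1 uses every example, and
    shorter datasets drop out while the rest continue until all are exhausted.
    """
    iterators: List[Iterator[dict]] = []
    for examples in datasets.values():
        chosen = list(examples) if sample_size < 0 else list(examples)[:sample_size]
        iterators.append(iter(chosen))
    while iterators:
        still_active: List[Iterator[dict]] = []
        for it in iterators:
            example = next(it, None)
            if example is not None:  # None signals this dataset is exhausted
                yield example
                still_active.append(it)
        iterators = still_active


# Toy datasets standing in for the 100k-example datasets in the question.
toy = {
    "A": [{"src": "A", "i": i} for i in range(5)],
    "B": [{"src": "B", "i": i} for i in range(3)],
}
print([e["src"] for e in mix_datasets(toy)])  # ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'A']
print(len(list(mix_datasets(toy, sample_size=2))))  # 4
```

Under this sketch, five datasets of 100k examples each yield 500k examples with sample_size=-1 and 75k with sample_size=15000, matching the two cases in the question.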
OK, got it now. Thanks.
Hi,
I'm trying to run the pytorch-transformers code highlighted in your repository. It works well for SQuAD 1 & 2 and NewsQA, but when I run the exact same code on DROP and TriviaQA_Wiki, it fails with "ValueError: num_samples should be a positive integer value, but got num_samples=0".
I see that DROP_dev.json contains no answers at all: the "answers" field is an empty list, and "is_impossible" is false.
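That ValueError message is raised by PyTorch's `torch.utils.data.RandomSampler` when the dataset it wraps is empty, which suggests that no usable features were built from these files; that would be consistent with questions that have no answers but are not marked impossible. A minimal sketch for checking a SQuAD 2.0-style file for that pattern follows; `count_answerless_questions` and `check_file` are illustrative helpers, and how such questions are actually handled depends on the training script.

```python
import json
from typing import Any, Dict, Tuple


def count_answerless_questions(squad: Dict[str, Any]) -> Tuple[int, int]:
    """Return (total_questions, answerless) for a SQuAD 2.0-style dict.

    A question counts as answerless when its "answers" list is empty but
    "is_impossible" is false, i.e. the pattern observed in DROP_dev.json.
    """
    total = answerless = 0
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                total += 1
                if not qa.get("answers") and not qa.get("is_impossible", False):
                    answerless += 1
    return total, answerless


def check_file(path: str) -> None:
    """Load a SQuAD-format JSON file and report the answerless-question count."""
    with open(path, encoding="utf-8") as f:
        total, answerless = count_answerless_questions(json.load(f))
    print(f"{path}: {answerless} of {total} questions have no answers and are not marked impossible")


# Inline example mirroring the reported DROP pattern (made-up content).
example = {
    "data": [{"title": "t", "paragraphs": [{
        "context": "Some passage.",
        "qas": [{"id": "q1", "question": "How many?", "answers": [], "is_impossible": False}],
    }]}]
}
print(count_answerless_questions(example))  # (1, 1)
```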
@rsanjaykamath This is a few years on, but I'm actually trying to do the same thing. Did you ever get a good answer for evaluating DROP using the SQuAD setup?
Hi,
This error seems to occur when I convert any dataset from jsonl.gz to SQuAD-style JSON:
File "convert_multiqa_to_squad_format.py", line 43, in multi_example_to_squad
    for answer_cand in qa['answers']["open-ended"]['answer_candidates']:
KeyError: 'answer_candidates'
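To see how many questions in an input file would trip the converter, the gzip-compressed JSON-lines file can be streamed with the standard library. This is a diagnostic sketch under stated assumptions: one JSON object per line (implied by the .jsonl.gz extension) and a per-record "qas" list holding the question dicts are assumptions about the layout, and `iter_jsonl_gz`, `has_answer_candidates` and `count_missing_candidates` are hypothetical helpers, not MultiQA functions.

```python
import gzip
import json
import os
import tempfile
from typing import Any, Dict, Iterator


def iter_jsonl_gz(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a gzip-compressed JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def has_answer_candidates(qa: Dict[str, Any]) -> bool:
    """True if qa['answers']['open-ended'] has the key the converter reads (path from the traceback)."""
    return "answer_candidates" in qa.get("answers", {}).get("open-ended", {})


def count_missing_candidates(path: str) -> int:
    """Count questions lacking 'answer_candidates'; the per-record "qas" list is an assumed layout."""
    missing = 0
    for record in iter_jsonl_gz(path):
        for qa in record.get("qas", []):  # records without "qas" (e.g. a header line) are skipped
            if not has_answer_candidates(qa):
                missing += 1
    return missing


# Demo on a tiny synthetic file (made-up records, not real MultiQA data).
with tempfile.NamedTemporaryFile(suffix=".jsonl.gz", delete=False) as tmp:
    demo_path = tmp.name
with gzip.open(demo_path, "wt", encoding="utf-8") as f:
    f.write(json.dumps({"qas": [{"answers": {"open-ended": {"answer_candidates": []}}}]}) + "\n")
    f.write(json.dumps({"qas": [{"answers": {"open-ended": {}}}]}) + "\n")
missing_demo = count_missing_candidates(demo_path)
os.remove(demo_path)
print(missing_demo)  # 1
```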
Also, how can I get the multi75k dataset? Are you planning on releasing it?
Thanks for your time.