facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

DrQA agent #437

Closed jenniferzhu closed 6 years ago

jenniferzhu commented 6 years ago

Hi, I was redirected to ParlAI from the DrQA GitHub repository and noticed that there is a drqa agent in ParlAI. However, I cannot find any documentation on that. What would be a starting point to use the DrQA system with the ParlAI framework?

jaseweston commented 6 years ago

Yes, there are some examples of usage here:

https://github.com/facebookresearch/ParlAI/blob/master/examples/README.md

(see bottom of README)

On Thu, Dec 7, 2017 at 1:13 PM, Jennifer Zhu notifications@github.com wrote:

Hi, I was redirected to ParlAI from the DrQA GitHub repository and noticed that there is a drqa agent in ParlAI. However, I cannot find any documentation on that. What would be a starting point to use the DrQA system with the ParlAI framework?


vishnumenon commented 6 years ago

I tried using the line in the README to evaluate the pre-trained DrQA model, and I got terrible accuracy. The final line of my output was {'total': 10570, 'accuracy': 0.01239, 'f1': 0.03498, 'hits@k': {1: 0.0124, 5: 0.0124, 10: 0.0124, 100: 0.0124}, 'train_loss': 0}. Any idea what might be causing this? I'm working on a Google VM running Ubuntu with Python 3.6 (through Anaconda).

vishnumenon commented 6 years ago

Using the Interactive Mode gives me the answer "." for any questions/contexts I try
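
For readers unfamiliar with the 'accuracy' and 'f1' numbers reported above, here is a minimal sketch of the usual SQuAD-style exact-match and token-level F1 scores. It assumes the standard SQuAD normalization (lowercasing, stripping punctuation and English articles); ParlAI's own metric code may differ in detail, and the function names below are illustrative rather than taken from ParlAI. It also shows why a model that always answers "." scores close to zero.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, answer: str) -> bool:
    """True when prediction and answer are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(answer)


def token_f1(prediction: str, answer: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(answer).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# A bare "." normalizes to the empty string, so it matches nothing.
print(exact_match(".", "blue"), token_f1(".", "blue"))
# A partially correct answer gets partial F1 credit but no exact match.
print(exact_match("the blue car", "blue"), token_f1("the blue car", "blue"))
```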

alexholdenmiller commented 6 years ago

Hi @vishnumenon, can you confirm which steps you ran?

vishnumenon commented 6 years ago

Hi! I just ran

git clone https://github.com/facebookresearch/ParlAI.git ~/ParlAI
cd ~/ParlAI; python setup.py develop

installed spaCy because it showed up as an unsatisfied requirement (along with the 'en' language pack), and then ran

wget https://s3.amazonaws.com/fair-data/parlai/_models/drqa/squad.mdl
python eval_model.py -m drqa -t squad -mf squad.mdl -dt valid

Am I missing any steps?

alexholdenmiller commented 6 years ago

Ah, great--just wanted to make sure. I'll debug this--it should have gotten a good accuracy (I believe I ran this successfully just two weeks ago).

vishnumenon commented 6 years ago

Awesome, thanks!

vishnumenon commented 6 years ago

Oh! Also another change that I made -- initially, when I tried to run eval_model, I got this error:

Traceback (most recent call last):
  File "eval_model.py", line 46, in <module>
    main()
  File "eval_model.py", line 30, in main
    agent = create_agent(opt)
  File "/home/me/ParlAI/parlai/core/agents.py", line 319, in create_agent
    return model_class(opt)
  File "/home/me/ParlAI/parlai/agents/drqa/drqa.py", line 105, in __init__
    word_dict = DrqaAgent.dictionary_class()(opt)
  File "/home/me/ParlAI/parlai/agents/drqa/drqa.py", line 53, in __init__
    super().__init__(*args, **kwargs)
  File "/home/me/ParlAI/parlai/core/dict.py", line 187, in __init__
    import spacy
  File "/home/me/anaconda3/lib/python3.6/site-packages/spacy/__init__.py", line 4, in <module>
    from .cli.info import info as cli_info
  File "/home/me/anaconda3/lib/python3.6/site-packages/spacy/cli/__init__.py", line 5, in <module>
    from .profile import profile
  File "/home/me/anaconda3/lib/python3.6/site-packages/spacy/cli/profile.py", line 7, in <module>
    import cProfile
  File "/home/me/anaconda3/lib/python3.6/cProfile.py", line 22, in <module>
    run.__doc__ = _pyprofile.run.__doc__
AttributeError: module 'profile' has no attribute 'run'

Some googling indicated that it was because of the file named 'profile' in examples, so I changed the file's name (couldn't find any references to it that needed updating, not sure if that's what I missed) and then it seemed to run without issue.
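
The traceback above is consistent with module shadowing: when a script is run directly, Python puts the script's directory first on sys.path, so an `import profile` issued from inside the standard library can pick up a local profile.py instead. The sketch below reproduces that effect in a temporary directory. The stub script name eval_stub.py is hypothetical, and the empty profile.py merely stands in for the file in examples; it is not the ParlAI file itself.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def imports_local_profile(script_dir: Path) -> bool:
    """Run a stub script from script_dir; report whether 'import profile'
    resolved to a file inside that directory rather than the stdlib."""
    stub = script_dir / "eval_stub.py"  # hypothetical stand-in for eval_model.py
    stub.write_text("import profile\nprint(profile.__file__)\n")
    # Drop PYTHONSAFEPATH so the script directory is added to sys.path as usual.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    result = subprocess.run(
        [sys.executable, str(stub)],
        capture_output=True, text=True, check=True, env=env,
    )
    imported_from = Path(result.stdout.strip()).resolve().parent
    return imported_from == script_dir.resolve()


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # An empty local profile.py shadows the standard-library module.
    (tmp_dir / "profile.py").write_text("")
    shadowed = imports_local_profile(tmp_dir)
    # Renaming the file removes the name clash.
    (tmp_dir / "profile.py").rename(tmp_dir / "profile_train.py")
    restored = imports_local_profile(tmp_dir)

print("local profile.py imported before rename:", shadowed)
print("local profile.py imported after rename: ", restored)
```

Renaming the local file is the least invasive fix, since nothing else has to change about how the scripts are launched.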

vishnumenon commented 6 years ago

Update: I tried deleting everything and starting from scratch, and I now get this (seemingly unrelated) error:

[ Loading model squad.mdl ]
[ Using CUDA (GPU -1) ]
[creating task(s): squad]
loading: /home/me/ParlAI/data/SQuAD/dev-v1.1.json
Traceback (most recent call last):
  File "examples/eval_model.py", line 46, in <module>
    main()
  File "examples/eval_model.py", line 35, in main
    world.parley()
  File "/home/me/ParlAI/parlai/core/worlds.py", line 278, in parley
    acts[1] = agents[1].act()
  File "/home/me/ParlAI/parlai/agents/drqa/drqa.py", line 175, in act
    ex = self._build_ex(self.observation)
  File "/home/me/ParlAI/parlai/agents/drqa/drqa.py", line 257, in _build_ex
    inputs['document'], doc_spans = self.word_dict.span_tokenize(document)
  File "/home/me/ParlAI/parlai/core/dict.py", line 288, in span_tokenize
    if self.tokenizer == 'spacy':
AttributeError: 'SimpleDictionaryAgent' object has no attribute 'tokenizer'

alexholdenmiller commented 6 years ago

@jaseweston I'm renaming profile.py => profile_train.py

@vishnumenon I found that bug myself, fixing it now and then you should get full training accuracy again.

vishnumenon commented 6 years ago

Sounds good, thanks!

alexholdenmiller commented 6 years ago

pushed fix for profile, PR out for dictionary loading fix: #443

alexholdenmiller commented 6 years ago

fix in! running the eval script after pulling should give you:

{'total': 10570, 'accuracy': 0.6634, 'f1': 0.7685, 'hits@k': {1: 0.663, 5: 0.663, 10: 0.663, 100: 0.663}, 'train_loss': 0}

vishnumenon commented 6 years ago

Yup, it's working great now, thanks!

alexholdenmiller commented 6 years ago

@jenniferzhu I'm going to close this for now but feel free to reopen if you have any more questions!

jenniferzhu commented 6 years ago

@alexholdenmiller Hi, I ran most of those examples and they worked well. My question is: how can I add my own text data so that a pre-trained model can answer questions about it? That is, if we don't have a good QA training dataset, how can we use the model on our own content?

jaseweston commented 6 years ago

@alexholdenmiller or @klshuster ?

jenniferzhu commented 6 years ago

@alexholdenmiller I just re-pulled and then ran the same commands as @vishnumenon, but I still have the low accuracy issue.

wget https://s3.amazonaws.com/fair-data/parlai/_models/drqa/squad.mdl
python eval_model.py -m drqa -t squad -mf squad.mdl -dt valid

Here is the error after interactive.py:

Bob is Blue.\nWhat is Bob?
/Users/xuanzhu/ParlAI/parlai/agents/drqa/layers.py:177: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  alpha_flat = F.softmax(scores.view(-1, y.size(1)))
/Users/xuanzhu/ParlAI/parlai/agents/drqa/layers.py:232: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  alpha = F.softmax(scores)
/Users/xuanzhu/ParlAI/parlai/agents/drqa/layers.py:212: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  alpha = F.softmax(xWy)
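
The messages above are deprecation warnings about softmax being called without an explicit dimension. As a dependency-free illustration of why the dimension matters, the sketch below normalizes a small 2-D list of scores along rows and along columns. The helper names are made up for this example, and which dimension is correct for each call in layers.py is not stated in this thread.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_2d(matrix, dim):
    """Softmax over a list of rows: dim=1 normalizes each row,
    dim=0 normalizes each column."""
    if dim == 1:
        return [softmax(row) for row in matrix]
    if dim == 0:
        columns = [softmax(list(col)) for col in zip(*matrix)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("dim must be 0 or 1 for a 2-D input")


scores = [[1.0, 2.0], [3.0, 5.0]]
by_row = softmax_2d(scores, dim=1)  # each row sums to 1
by_col = softmax_2d(scores, dim=0)  # each column sums to 1
print(by_row)
print(by_col)
```

The two results differ, which is why an explicit dimension is safer than relying on an implicit default.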

jenniferzhu commented 6 years ago

@alexholdenmiller Updates: I re-installed ParlAI on a new laptop, and still get the same problem...

alexholdenmiller commented 6 years ago

I'll check it out

alexholdenmiller commented 6 years ago

Sorry I missed the previous question: you can load the pretrained model and continue training it on a different task (e.g. a custom one for your own data).

For example...

python examples/train_model.py -t babi:task10k:3 -m drqa -mf squad.mdl -bs 32 -vtim 5

...the pretrained model gets ~46% accuracy on the first validation, but it steadily goes up from there (I got up to ~68% accuracy in only two minutes before I stopped it).
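
If the custom data consists of (context, question, answer) triples, one possible first step is to write it out in the SQuAD v1.1 JSON layout, the same layout as the dev-v1.1.json file loaded earlier in this thread. The sketch below is only an assumption about how one might prepare such data: the function name to_squad_v1 is hypothetical, and wiring the resulting file into a ParlAI task is not covered here.

```python
import json


def to_squad_v1(examples, title="custom"):
    """Convert (context, question, answer) triples to SQuAD v1.1-style JSON.

    Each answer must appear verbatim in its context, because the format
    stores the answer's character offset (answer_start).
    """
    paragraphs = []
    for i, (context, question, answer) in enumerate(examples):
        start = context.find(answer)
        if start < 0:
            raise ValueError(f"answer {answer!r} not found in context")
        paragraphs.append({
            "context": context,
            "qas": [{
                "id": f"{title}-{i}",  # illustrative id scheme
                "question": question,
                "answers": [{"text": answer, "answer_start": start}],
            }],
        })
    return {"version": "1.1", "data": [{"title": title, "paragraphs": paragraphs}]}


dataset = to_squad_v1([("Bob is Blue.", "What is Bob?", "Blue")])
print(json.dumps(dataset, indent=2))
```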

alexholdenmiller commented 6 years ago

feel free to reopen if you have any questions! #591 fixes the low accuracy, it was an issue loading the dictionary