facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

About the code of Wizard of Wikipedia: Knowledge-Powered Conversational agents #2292

Closed ChuanMeng closed 4 years ago

ChuanMeng commented 4 years ago

We found a possible problem in the code for Wizard of Wikipedia: Knowledge-Powered Conversational Agents.

In the data preprocessing, in the script ParlAI\parlai\tasks\wizard_of_wikipedia\agents.py, "def len_episode" (line 247), we found that "(len(d['dialog']) - 1) // 2" drops the last example in a dialogue when the wizard speaks first. Removing the "- 1", which gives "(len(d['dialog'])) // 2", would include all examples in a wizard-first dialogue.

I would like to confirm with you whether this is really a problem. If not, what is the purpose of the "- 1" in "(len(d['dialog']) - 1) // 2" when the wizard speaks first?

stephenroller commented 4 years ago

Thanks for asking on GitHub. Here's the block OP is referencing:

https://github.com/facebookresearch/ParlAI/blob/078d09d8d3da0b1df969d73256d32c35b9132ece/parlai/tasks/wizard_of_wikipedia/agents.py#L246-L251

I put a breakpoint and checked what was happening. If the wizard goes first, then the dialogue always ends on the apprentice. This makes that very last turn NOT an example, because there is no response from the wizard to use as gold data.

ipdb> pprint.pprint([(t['text'], t['speaker']) for t in d['dialog']])
[('I think science fiction is an amazing genre for anything. Future science, '
  "technology, time travel, FTL travel, they're all such interesting concepts.",
  '0_Wizard'),
 ("I'm a huge fan of science fiction myself! ", '1_Apprentice'),
 ('Awesome! I really love how sci-fi storytellers focus on '
  'political/social/philosophical issues that would still be around even in '
  'the future. Makes them relatable.',
  '0_Wizard'),
 ('I agree. One of my favorite forms of science fiction is anything related to '
  'time travel! I find it fascinating.',
  '1_Apprentice'),
 ("It's not quite sci-fi, but my favorite version of time travel is in Harry "
  'Potter and the Prisoner of Azkaban. Breaks zero logical rules.',
  '0_Wizard'),
 ("And that's difficult to do when dealing with time travel. I actually "
  "haven't seen the latest Harry Potter movies. Guess it's time to check them "
  'out!',
  '1_Apprentice'),
 ('If you really want a look at the potential negative consequences of '
  'scientific innovation, what you should check out is the TV show Fringe. '
  'Incredibly well written.',
  '0_Wizard'),
 ('Thank you for the suggestion, I will definitely check it out!',
  '1_Apprentice'),
 ('It blends science fiction and paranormal/psychological/MK Ultra type stuff '
  "together, but it's science fiction at its core.",
  '0_Wizard'),
 ('Always looking for more science fiction to digest!', '1_Apprentice')]

And another:

ipdb> pprint.pprint([(t['text'], t['speaker']) for t in d['dialog']])
[("I don't know how to be romantic. I have trouble expressing emotional "
  'attraction.',
  '0_Wizard'),
 ('I feel the same. I find it hard to make many friends so finding a romantic '
  'partner seems impossible.',
  '1_Apprentice'),
 ('The term "romance" originated in the medieval ideal of chivalry.',
  '0_Wizard'),
 ('Interesting. I feel that today the meaning has been changed to fit with '
  "Hollywood's idea of romance.",
  '1_Apprentice'),
 ('For sure. Romantic love is relative but usually accepted as moments of '
  'intimacy.',
  '0_Wizard'),
 ('Romance can be small acts, like making breakfast for your significant '
  "other. But it's portrayed as grand gestures which are unattainable. ",
  '1_Apprentice'),
 ('I agree it has been portrayed as impossible actions. Love consists of a '
  'variety of emotion and mental states.',
  '0_Wizard'),
 ('It seems that in todays world it can be even harder to find a partner too.',
  '1_Apprentice'),
 ('Good point. Romance is associated with perfect partners, which is often '
  'unattainable. Sexual attraction often is stronger.',
  '0_Wizard'),
 ('I suppose. But I think that sexual attraction is just the initial feeling '
  'which then turns into romance.',
  '1_Apprentice')]

This is also definitely the code for the teacher we ran in the paper, so it's the source-of-truth.

ChuanMeng commented 4 years ago

Doing so loses an example when the wizard speaks first. Take the first dialogue you presented: it has ten utterances, five of them from the wizard, so there should be five examples. (In this script the wizard's first utterance also counts as an example; the chosen topic is treated as the message and placed in front of it. Please see lines 285~287 in ParlAI\parlai\tasks\wizard_of_wikipedia\agents.py.) However, "(len(d['dialog']) - 1) // 2" returns only 4 examples. Please double check : ) Thanks!
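The arithmetic can be checked with a minimal sketch. It assumes a synthetic ten-turn, wizard-first dialogue whose turns use the same 'speaker' and 'text' keys seen in the ipdb output above; make_dialog is a hypothetical helper and not part of ParlAI.

```python
def make_dialog(n_turns):
    """Build a synthetic wizard-first dialogue (hypothetical helper, not ParlAI code)."""
    speakers = ['0_Wizard', '1_Apprentice']
    return [
        {'speaker': speakers[i % 2], 'text': f'utterance {i}'}
        for i in range(n_turns)
    ]


dialog = make_dialog(10)

# Each wizard utterance is a potential gold response.
wizard_turns = sum('Wizard' in t['speaker'] for t in dialog)

# The two expressions discussed in this thread.
with_minus_one = (len(dialog) - 1) // 2
without_minus_one = len(dialog) // 2

print(wizard_turns, with_minus_one, without_minus_one)  # prints: 5 4 5
```

For an even-length, wizard-first dialogue, the expression with "- 1" is one short of the number of wizard utterances, while the expression without it matches.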

stephenroller commented 4 years ago

Okay, I see. Yeah, one does seem to be cut off. Regardless, this is the code we ran, so it is the source of truth for the results in the paper.

stephenroller commented 4 years ago

Hmm, I think the way forward is to make another teacher that has the full data. We need to distinguish between what was used in the paper and what is available.
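One way to picture the two variants is the rough sketch below. It is not the ParlAI teacher: build_examples and the paper_compatible flag are hypothetical names, it only handles wizard-first dialogues, and it reduces each example to a (message, wizard reply) pair, ignoring everything else the real teacher puts in a message.

```python
def build_examples(chosen_topic, dialog, paper_compatible=True):
    """Pair each wizard utterance with the message preceding it.

    Hypothetical sketch for wizard-first dialogues only.
    paper_compatible=True uses the "- 1" count discussed in this thread;
    paper_compatible=False keeps every wizard utterance.
    """
    n = len(dialog)
    num_examples = (n - 1) // 2 if paper_compatible else n // 2
    examples = []
    for entry_idx in range(num_examples):
        # Wizard turns sit at even positions when the wizard speaks first.
        idx = entry_idx * 2
        # The first wizard turn only has the chosen topic as its message.
        message = chosen_topic if idx == 0 else dialog[idx - 1]['text']
        examples.append((message, dialog[idx]['text']))
    return examples


# Synthetic ten-turn dialogue; the texts are placeholders.
dialog = [
    {'speaker': '0_Wizard' if i % 2 == 0 else '1_Apprentice', 'text': f'turn {i}'}
    for i in range(10)
]

paper_examples = build_examples('Science fiction', dialog, paper_compatible=True)
full_examples = build_examples('Science fiction', dialog, paper_compatible=False)

print(len(paper_examples), len(full_examples))  # prints: 4 5
```

Keeping the paper-compatible behaviour as the default would leave the published numbers reproducible, while the second mode exposes the extra wizard turn.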

github-actions[bot] commented 4 years ago

This issue has not had activity in 30 days. Marking as stale.