Closed hsgodhia closed 6 years ago
One possible explanation is that the Neural Conversational Model paper used another version of the OpenSubtitles corpus (2013 in particular), which is 60-70 times larger than the one used in ParlAI right now. This PR adds a newer version: https://github.com/facebookresearch/ParlAI/pull/562
Also, the learning rate looks pretty small. As far as I remember, they used 1.0.
Actually, I downloaded the 2018 data, which has a vocabulary of about 100k and about 100M dialogs, and trained on that. About the learning rate: I'm not sure lr = 1 makes sense with the Adam optimizer?
We are getting decent results on the Twitter dataset with https://github.com/facebookresearch/ParlAI/tree/master/parlai/agents/language_model so you could also try that. Although @emilydinan is about to push a small change to it that seems to help (adding PERSON1, PERSON2 tags to indicate a change of speaker).
@hsgodhia @urikz the small change that @jaseweston was referring to can be found in the PR here: https://github.com/facebookresearch/ParlAI/pull/573/files
@hsgodhia @urikz an additional change that may help is limiting the number of tokens used in your dictionary. A new flag `--dict-maxtokens` allows you to take the top N tokens from the dictionary after sorting by frequency (see https://github.com/facebookresearch/ParlAI/pull/565 for the PR that added this).
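For intuition, keeping the top-N tokens by frequency works roughly like this (a minimal sketch of the idea behind the flag; `truncate_dict` and `__unk__` are illustrative names, not ParlAI code):

```python
from collections import Counter

def truncate_dict(token_counts, max_tokens, unk="__unk__"):
    # Keep only the max_tokens most frequent tokens; everything else
    # maps to a single unknown token.
    keep = {tok for tok, _ in Counter(token_counts).most_common(max_tokens)}

    def lookup(token):
        return token if token in keep else unk

    return keep, lookup

counts = {"the": 100, "cat": 10, "sat": 9, "zyzzyva": 1}
keep, lookup = truncate_dict(counts, max_tokens=3)
print(lookup("cat"))      # frequent token survives the cut
print(lookup("zyzzyva"))  # rare token maps to __unk__
```

Cutting the long tail like this shrinks the output softmax and usually helps generation quality on noisy corpora.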
@jaseweston You mean decent results with default parameter settings? Or can you please share the hyperparameters?
@emilydinan i think default?
@Jackberg I got somewhat decent results using the language_model in ParlAI, not seq2seq. The hyperparameters I used for that are `-vtim 360 -esz 200 -hs 500 -nl 2 -lr 10 -bs 20`
(and this is on the Twitter dataset, currently training some on the new opensubtitles)
@emilydinan Thanks so much!
Am I doing something wrong? I run this to do evaluation after training:

```
python3.6 examples/interactive.py -dt test:stream -t opensubtitles -m language_model -bs 1 -mf godz2
```

and I basically get

```
[creating task(s): parlai.agents.local_human.local_human:LocalHumanAgent]
Enter Your Message: hih
Enter Your Message: hi.
Enter Your Message: hi
```

Usually I would expect the predictions to come next.
@hsgodhia This PR should fix that: https://github.com/facebookresearch/ParlAI/pull/579 . Sorry about that. What was happening is that in interactive mode, the Language Model defaults to training mode since "eval_labels" are not present, and so predictions are not produced.
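For anyone curious, the failure mode can be sketched like this (a hypothetical simplification of the agent's mode check, not the actual ParlAI code):

```python
# Buggy version: anything without "eval_labels" is treated as training.
def is_training(observation):
    return "eval_labels" not in observation

# In interactive mode the human's message has neither "labels" nor
# "eval_labels", so the agent wrongly enters training mode and never
# generates a prediction.
interactive_obs = {"text": "hi there"}
print(is_training(interactive_obs))  # True (bug)

# Safer check: only treat the observation as training if it actually
# carries training labels.
def is_training_fixed(observation):
    return "labels" in observation

print(is_training_fixed(interactive_obs))  # False: predictions are produced
```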
Hi
I'm not sure it was fixed. I did a git pull a moment ago, and running the command gives:

```
python3.6 examples/interactive.py -dt test:stream -t opensubtitles -m language_model -bs 1 -mf godz2
hals/ParlAI'}
[ no model yet at: godz2 ]
[ Using CUDA ]
Loading existing model params from godz2
Overriding option [ hiddensize: 200 => 500]
Dictionary: loading dictionary from godz2.dict
[ num words = 100000 ]
[creating task(s): parlai.agents.local_human.local_human:LocalHumanAgent]
Enter Your Message: hi there harshal
Traceback (most recent call last):
  File "examples/interactive.py", line 47, in <module>
    main()
  File "examples/interactive.py", line 38, in main
    world.parley()
  File "/home/harshals/ParlAI/parlai/core/worlds.py", line 291, in parley
    acts[1] = agents[1].act()
  File "/home/harshals/ParlAI/parlai/agents/language_model/language_model.py", line 463, in act
    return self.batch_act([self.observation])[0]
  File "/home/harshals/ParlAI/parlai/agents/language_model/language_model.py", line 441, in batch_act
    output, hidden, loss_dict, predictions = self.predict(data_list[i], self.hidden, targets_list[i], self.is_training, y_lens)
  File "/home/harshals/ParlAI/parlai/agents/language_model/language_model.py", line 365, in predict
    loss = self.get_target_loss(data, self.hidden, targets, y_lens)
  File "/home/harshals/ParlAI/parlai/agents/language_model/language_model.py", line 296, in get_target_loss
    loss += self.eval_criterion(output_flat, targets.select(1,0).view(-1)).data
RuntimeError: dimension specified as 1 but tensor has no dimensions
```
@hsgodhia The problem is with this line: https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/language_model/language_model.py#L365 since the targets are None in interactive mode. I'm currently working on a fix and will keep you updated, but in the meantime, for testing purposes, you can comment out that line and replace it with `loss = 0`.
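The stop-gap amounts to a None guard around the target-loss computation, something like this sketch (simplified; `safe_target_loss` and `loss_fn` are illustrative names, not the actual ParlAI code):

```python
def safe_target_loss(targets, loss_fn):
    # In interactive mode there are no reference targets to score against,
    # so skip the loss instead of indexing into a None tensor.
    if targets is None:
        return 0.0
    return loss_fn(targets)

print(safe_target_loss(None, sum))       # interactive mode: no crash
print(safe_target_loss([1, 2, 3], sum))  # normal evaluation still works
```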
@hsgodhia the fix is merged now: https://github.com/facebookresearch/ParlAI/pull/580
Got it. Yes, it's working on my side now. Here are some results after training for 15 hours with the above parameters on the latest version of opensubtitles:
```
[ num words = 100000 ]
[creating task(s): parlai.agents.local_human.local_human:LocalHumanAgent]
Enter Your Message: hi there how are you doing?
[LanguageModel]: PERSON2 I 'm fine .
Enter Your Message: Is it raining today?
[LanguageModel]: PERSON2 I 'm going to count to three .
Enter Your Message: where are you going for dinner?
[LanguageModel]: PERSON2 We 're in the middle of the night .
Enter Your Message: which city are you from?
[LanguageModel]: PERSON2 I 'm in charge of this situation .
Enter Your Message: is the movie black panther good?
[LanguageModel]: PERSON2 I don 't know .
Enter Your Message: is east coast better or west coast?
[LanguageModel]: PERSON2 I 'm sure I 'll see it later .
Enter Your Message: training you was not fun
[LanguageModel]: PERSON2 I was just trying to get up on the roof and see if I could signal for help .
Enter Your Message: fine
[LanguageModel]: PERSON2 I 'm not gonna hurt you .
Enter Your Message: bye
[LanguageModel]: PERSON2 I 'm not gonna hurt you !
Enter Your Message: hi
[LanguageModel]: PERSON2 I 'm here , John .
Enter Your Message: see you
[LanguageModel]: PERSON2 I 'm going to count to three .
Enter Your Message: whatever
[LanguageModel]: PERSON2 I 'm not gonna hurt you .
```
Would it be possible to get a set of parameters to replicate the results of the seq2seq agent? I believe bidirectional encoding and attention would give better results, but the seq2seq agent currently collapses to uttering a single response, or otherwise gives poor results.
Hi all, here's what I found and some guesses:
- The language model does perform well on the Twitter task!
- Yet it performs badly on the OpenSubtitles dataset, constantly responding "I don't know". The reason, I guess, is two-fold: 1) the 2009 dataset is not big enough; 2) the training is case-sensitive. Through experiments, I found that using a lowercase dictionary made the performance way better.
This is a classic known problem in these kind of models, mentioned in several papers, see e.g.: https://arxiv.org/pdf/1510.03055v2.pdf
- However, the Seq2Seq model converges very poorly on the Twitter dataset. I think it uses the same loss function and perplexity metric as the language model (I didn't have time to look deep into this part), right? But the loss of the language model is always below 10, while the Seq2Seq loss is always ~200.
- From the experiments I did on OpenSubtitles, I guess the problem with Seq2Seq is related to the learning rate implementation? The LM by default uses LR=20, while Seq2Seq defaults to LR=0.005, and bigger values (roughly higher than 0.5) make the training diverge.
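The lowercasing point is easy to see in miniature: a case-sensitive dictionary splits counts across casing variants, inflating the vocabulary and making each variant look rarer than the underlying word really is. A toy sketch (illustrative only):

```python
from collections import Counter

def vocab(sentences, lowercase):
    # Count token types with or without case folding.
    counts = Counter()
    for s in sentences:
        toks = s.lower().split() if lowercase else s.split()
        counts.update(toks)
    return counts

lines = ["I don't know", "i don't know", "I DON'T KNOW"]
print(len(vocab(lines, lowercase=False)))  # casing variants counted separately
print(len(vocab(lines, lowercase=True)))   # variants merged into one type
```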
Hope you guys can locate the problem! Cheers!
@jaseweston Sorry, which problem do you mean, the generic response "I don't know"? By saying "constantly", I mean it will respond this way to almost every user utterance. And this problem can be solved by using a lowercase dictionary.
So here I'm suggesting to use lowercase words by default.
Yes, the "I don't know" problem; check that paper link. It cannot be completely solved easily, it seems, although it can indeed be mitigated to some degree with tricks.
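For reference, the main trick in that paper (Li et al., 2016) is MMI reranking: score each candidate response T for source S by log p(T|S) - lambda * log p(T), penalizing responses that are generically probable regardless of the input. A toy sketch with made-up scores:

```python
def mmi_rerank(candidates, lam=0.5):
    """candidates: list of (response, logp_response_given_source, logp_response).
    Picks the response maximizing log p(T|S) - lam * log p(T)."""
    return max(candidates, key=lambda c: c[1] - lam * c[2])

cands = [
    ("i don't know", -2.0, -1.0),   # likely given the source, but also likely a priori
    ("it's in paris", -2.5, -6.0),  # specific response: unlikely a priori
]
print(mmi_rerank(cands)[0])  # "it's in paris": the generic reply is penalized
```

With lam=0 this degenerates to standard likelihood ranking and the generic "I don't know" wins again, which is exactly the failure mode being discussed.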
@jaseweston Sorry for the confusion. I didn't mean to solve this problem. I just meant that the case-sensitive dataset makes the model perform really badly by always giving the same response.
Hi @Jackberg, thanks for your notes! It's great that you could also see the language model performing well on the Twitter task. Some comments about the opensubtitles training...
Hope this helps. @alexholdenmiller might have some thoughts on which hyperparameters to use for seq2seq training...
@emilydinan @alexholdenmiller Great! Looking forward to your recipe for seq2seq model!
seq2seq currently does gradient clipping (https://github.com/facebookresearch/ParlAI/blob/c3827177770883f12b68c4cb38b3a9611a84323a/parlai/agents/seq2seq/seq2seq.py#L245) and provides torch.optim optimizers (default: Adam). I also think the language model may benefit from having an adaptive learning rate.
@hsgodhia -- you're right, that was added recently. I missed that. I edited my comment.
Hi @hsgodhia, I've figured out the cause of the bad performance in your seq2seq test: the attention model is doing damage here. Here's what I got during 1 epoch using `-att none`:
```
TEXT: yeah, to get there, we need to make one really good syntax/formatter. opens up arbitrary.
PREDICTION: i love you so much for your own .
TEXT: nu-skin? 😂 gop congressman jason chaffetz is being financed by an illegal chinese pyramid scheme via
PREDICTION: i love you so much !
```
Looks much better, huh? So I believe there's something wrong in the attention model @emilydinan. Currently it can only give 1 or 2 distinct responses; I'll train it for more epochs and see what it learns.
@Jackberg I'm having trouble getting it to produce high quality text on standard opensubtitles, we're still working on a few other changes to try to improve it
I'm seeing the same thing with really low quality attention generation
Bug in attention fixed, thanks to @Jackberg; attention is looking much better. Also added post-attention (attention is calculated using the output representation of the RNN instead of the input token representation) and that's looking better as well.
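For anyone following along, the pre- vs. post-attention distinction only changes which vectors serve as the keys/values (input token embeddings vs. RNN output states); the mechanism itself is the same dot-product attention. A minimal pure-Python sketch (illustrative, not the ParlAI implementation):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    # Dot-product attention: score each key against the query, normalize,
    # and return the weighted sum of values as the context vector.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights

ctx, w = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(w)  # weights sum to 1, biased toward the key matching the query
```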
Adding IBM's seq2seq model if that's helpful for comparison: #601
Hi all, I'm going to close this with a summary of a single new training job on opensubtitles. I think the biggest change compared to the past is a better tokenizer.
I trained the seq2seq model again on opensubtitles 2009 with the following command:
```
python examples/train_model.py -gpu 1 -m seq2seq -t opensubtitles:v2009 --dict-lower true -tok re --dict-maxtokens 100000 -hs 2048 -esz 300 -emb glove -opt sgd -lr 3 -dr 0.1 -att none -bs 32 -tr 120 --dict-include-valid false -nl 2 -clip 0.1 -lt enc_dec -histsz 7 -pt true -mom 0.9 -vp 16 -veps 0.25 -mf /tmp/os_s2s_2 -ltim 10 -vmt ppl -vmm min
```
So, this model used a hidden size of 2048 (half that of "A Neural Conversational Model", Vinyals 2015), an embedding size of 300 (I couldn't tell; maybe they used 2048?), a 2-layer LSTM (same as Vinyals), a vocab size of 100k (same as Vinyals), and no attention (same as Vinyals).
They said they converged to 17 ppl on the validation set. We calculated a perplexity of 24.95 on the validation set and 21.36 on the test set, including end tokens but calculated only on the "right-hand side": that is, only over the outputs of the model given an input (producing "my name is bill" in response to the input "human what is your name ?"), not over the entire sequence concatenated together ("human what is your name ? machine my name is bill" or something similar).
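In other words, perplexity here is the exponential of the mean per-token negative log-likelihood over only the scored response tokens (end token included, input turn excluded). A sketch with made-up token probabilities:

```python
import math

def perplexity(token_logprobs):
    # exp of the mean NLL over the tokens that are actually scored
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token probabilities for the response "my name is bill </s>";
# the input turn contributes nothing to the score under this protocol.
response_logprobs = [math.log(p) for p in [0.2, 0.1, 0.25, 0.05, 0.5]]
print(round(perplexity(response_logprobs), 2))
```

A quick sanity check: if every token had probability 0.25, the perplexity would be exactly 4.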
This model was running validation every 0.25 epochs and reached its best valid performance after 8.25 epochs. I did not sweep over parameters but just ran the above training; I expect this could be improved further.
Hi
I am trying to get a simple seq2seq model running with decent results on opensubtitles. I ran the command below on an Nvidia GPU with 12 GB of RAM for 15 hours, but the results are not what I was expecting; I was hoping for results like the Neural Conversational Model paper (1506.05869).
I have tried different variants of the hidden size ([2048, 1024, 512]) and similarly the embedding size, trading off against batch size so that GPU memory is not exceeded. I also tried the default options that come with seq2seq, but the results are not good. Any tips on where I may be going wrong?
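As a sanity check on that memory tradeoff, a back-of-envelope parameter count for a 2-layer LSTM seq2seq shows why the hidden size dominates (a rough sketch; it ignores attention, weight tying, and optimizer state, all of which add more memory):

```python
def lstm_params(input_size, hidden_size, num_layers):
    # Each LSTM layer has 4 gates, each with input-to-hidden and
    # hidden-to-hidden weights plus two bias vectors (PyTorch convention).
    total = 0
    for layer in range(num_layers):
        in_sz = input_size if layer == 0 else hidden_size
        total += 4 * (in_sz * hidden_size + hidden_size * hidden_size + 2 * hidden_size)
    return total

def rough_seq2seq_params(vocab, esz, hs, nl):
    embeddings = vocab * esz          # shared input embedding table
    encoder = lstm_params(esz, hs, nl)
    decoder = lstm_params(esz, hs, nl)
    output_layer = hs * vocab         # projection back to the vocabulary
    return embeddings + encoder + decoder + output_layer

for hs in (512, 1024, 2048):
    print(hs, rough_seq2seq_params(vocab=100_000, esz=300, hs=hs, nl=2))
```

The hidden-to-hidden terms grow quadratically in `hs`, so doubling the hidden size roughly quadruples the recurrent weights, which is why batch size has to shrink to fit.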
Sample results look like: