facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

Fine-tune BB2: Strange predictions during training #4347

Closed mailong25 closed 2 years ago

mailong25 commented 2 years ago

I'm trying to fine-tune the BB2-400M model on the MSC dataset using the following command:

parlai train_model -dp data \
--model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
--task msc \
--num_epochs 10 \
--memory-decoder-model-file "" \
--init-model zoo:blenderbot2/blenderbot2_400M/model --dict-file zoo:blenderbot2/blenderbot2_400M/model.dict --fp16 True \
--datatype train:stream  \
--query-model bert_from_parlai_rag \
--rag-model-type token --rag-retriever-type search_engine --search_server 0.0.0.0:8080 \
--dpr-model-file zoo:hallucination/bart_rag_token/model \
--beam-min-length 20 --beam-context-block-ngram 3 --beam-block-ngram 3 --beam-block-full-context False \
--inference beam --fp16-impl mem_efficient --optimizer mem_eff_adam --learningrate 1e-05 \
--search-query-generator-model-file "" \
--search-query-generator-beam-min-length 2 \
--save-every-n-secs 600 \
--knowledge-access-method memory_only --memory-key personas --batchsize 1 \
--gold-document-titles-key select-docs-titles --insert-gold-docs True \
--label-truncate 128 --truncate 512 --max-doc-token-length 128 --min-doc-token-length 128

Since I initialize the model with the pre-trained `zoo:blenderbot2/blenderbot2_400M/model`, I expected it to produce good predictions from the very first training iteration. However, here is what I got for the first prediction during training:

Input:

your persona: I like to remodel homes.
your persona: I like to go hunting.
your persona: I like to shoot a bow.
your persona: My favorite holiday is halloween.
Hi, how are you doing? I'm getting ready to do some cheetah chasing to stay in shape.

Prediction output:

Raw score shape: torch.Size([1, 16, 50264])
Pred target shape: torch.Size([1, 16])
Pred target vector: tensor([[40078,    85, 47554, 37522, 10965, 46539, 49289, 14821, 40078,  7687,
           375, 46539, 37522, 46539, 14821, 14069]])
Pred target text (using self.dict.vec2text): rkinsintsarmaarma Threerishrishkinsrishundaotom Threearma Tightentials
Truth target vector: tensor([[ 1643,  1280,   311,   849,  3053,    17, 22063,   322,   534,   290,
           620,  4008, 45582,    17,     2]])
Truth target text: ['You must be very fast. Hunting is one of my favorite hobbies.']
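For reference, the decode step above can be reproduced with the agent's own dictionary. A minimal sketch, assuming the zoo checkpoint loads with its saved options (`vec2txt` is the standard `DictionaryAgent` method; `vec2text` in the log appears to be a local wrapper):

```python
# Sketch: decode the predicted token IDs back to text with the agent's
# dictionary. Assumes ParlAI is installed and the zoo model is downloaded.
from parlai.core.agents import create_agent_from_model_file

agent = create_agent_from_model_file('zoo:blenderbot2/blenderbot2_400M/model')
pred_ids = [40078, 85, 47554, 37522, 10965, 46539, 49289, 14821]
print(agent.dict.vec2txt(pred_ids))  # garbled subword pieces, as in the log
```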

I know that dropout is enabled during training, but I don't think it should hurt the predictions this badly. Am I initializing the model correctly?
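One quick way to rule dropout out is to run the same checkpoint in inference mode, where dropout is disabled, and check whether the output is coherent. A minimal sketch, reusing the loading call above (depending on the saved opt, BB2 may additionally need search/memory options overridden via `opt_overrides`):

```python
# Sketch: generate from the init model in eval mode (dropout disabled).
from parlai.core.agents import create_agent_from_model_file

agent = create_agent_from_model_file('zoo:blenderbot2/blenderbot2_400M/model')
agent.observe({
    'text': "your persona: I like to go hunting.\n"
            "Hi, how are you doing? I'm getting ready to do some cheetah "
            "chasing to stay in shape.",
    'episode_done': False,
})
# With no labels in the observation, act() runs eval_step, so dropout is off;
# a coherent reply here would suggest the checkpoint itself loads correctly.
print(agent.act()['text'])
```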

mailong25 commented 2 years ago

I followed the warning and changed the command as follows, and it works:

parlai train_model -dp data \
--model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
--task msc --num_epochs 10 --include_last_session True \
--memory-decoder-model-file "" --memory-key personas \
--search-query-generator-model-file zoo:blenderbot2/query_generator/model --search-query-generator-beam-min-length 2 \
--save-every-n-secs 600 --validation_every_n_secs 600 --log_every_n_secs 60 \
--init-model zoo:blenderbot2/blenderbot2_400M/model --dict-file zoo:blenderbot2/blenderbot2_400M/model.dict \
--datatype train:stream  \
--embeddings-scale True --variant prelayernorm --split-lines True --learn-positional-embeddings True \
--n-layers 12 --embedding-size 1024 --ffn-size 4096 --n-heads 16 --n-decoder-layers 12 \
--dict-tokenizer gpt2 --generation-model bart \
--query-model bert_from_parlai_rag \
--rag-model-type token --rag-retriever-type search_engine --search_server None \
--dpr-model-file zoo:hallucination/bart_rag_token/model \
--gold-document-titles-key select-docs-titles --insert-gold-docs True \
--beam-min-length 20 --beam-context-block-ngram 3 --beam-block-ngram 3 --beam-block-full-context False --beam-size 10 \
--inference beam --optimizer mem_eff_adam --learningrate 1e-05 --lr-scheduler-patience 1 --model-parallel True \
--knowledge-access-method memory_only --batchsize 1 \
--truncate 512 --text-truncate 512 --label-truncate 128 \
--dropout 0.0 --attention-dropout 0.0 \
--min-doc-token-length 64 --max-doc-token-length 256 \
--fp16 True --fp16-impl mem_efficient --force-fp16-tokens True \
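
The key differences from the first command: it pins the 400M architecture explicitly (`--generation-model bart`, `--variant prelayernorm`, `--n-layers 12`, `--embedding-size 1024`, `--ffn-size 4096`, `--n-heads 16`, `--n-decoder-layers 12`), points `--search-query-generator-model-file` at the zoo query generator instead of an empty string, disables the live search server (`--search_server None`, consistent with `--knowledge-access-method memory_only`), and zeroes `--dropout` and `--attention-dropout`, which also removes dropout as a factor in the garbled predictions above.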