facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

Does BlenderBot2 support training with --inference nucleus? #4479

Closed rguan1 closed 2 years ago

rguan1 commented 2 years ago

Hi,

I trained BlenderBot2 3B with --inference set to nucleus. In all the example training scripts I've found in ParlAI's GitHub issues, the inference is set to beam. I initially thought this might not be a problem, since training BlenderBot1 seems to support this option. However, the results I get back when training BlenderBot2 with nucleus have been mildly incoherent. Could switching the type of inference be the cause of the issue?

Examples: "I'd never think them if things that he was such the US who will be from people." "I can be rather being our people to like others from their situations when others." "I guess about others it does for the result. Are the things these one prospectses is it's taking to take it's hard."

stephenroller commented 2 years ago

Hm, you might want to tweak the --topp argument. Higher tends to be less coherent.
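
(For context: nucleus/top-p sampling keeps only the smallest set of tokens whose cumulative probability reaches topp and samples from that set, so a higher topp admits more low-probability tokens. A rough sketch of the filtering step, not ParlAI's actual implementation:)

import torch

def nucleus_sample(logits, topp=0.9):
    # Sort token probabilities from most to least likely.
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(descending=True)
    # Keep the smallest prefix whose cumulative mass reaches topp
    # (the top token is always kept).
    cumulative = sorted_probs.cumsum(dim=-1)
    keep = cumulative - sorted_probs < topp
    filtered = torch.where(keep, sorted_probs, torch.zeros_like(sorted_probs))
    filtered = filtered / filtered.sum()
    # Sample from the truncated, renormalized distribution.
    choice = torch.multinomial(filtered, 1)
    return sorted_idx[choice]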

rguan1 commented 2 years ago

I see. My topp is indeed quite high; however, it seems to work well for BlenderBot1. Here is the full command for reference.

Thank you. I will tweak the topp parameter.

python3 $HOME/ParlAI/parlai/scripts/train_model.py --task ${tasks} --num_epochs ${num_epochs} \
--model-file $HOME/thesis/chatbot_models/BB2/${mf_name} --batchsize 1 --checkpoint-activations true --dynamic-batching full \
-veps 0.25 --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
--dict-file zoo:blenderbot2/blenderbot2_3B/model.dict --init-model zoo:blenderbot2/blenderbot2_3B/model \
--skip-generation True --knowledge-access-method all --memory-key full_text \
--search-server 0.0.0.0:8080 --metrics ppl --validation-metric ppl -vmm min --model-parallel True --memory-decoder-model-file '' \
--datatype train:stream --generation-model transformer/generator --query-model bert_from_parlai_rag \
--rag-model-type token --rag-retriever-type dpr --max-doc-token-length 64 \
--dpr-model-file zoo:hallucination/bart_rag_token/model \
--inference nucleus --topp 0.93 \
--fp16-impl mem_efficient --optimizer adam --truncate 128 --text-truncate 128 --label-truncate 128 \
--history-add-global-end-token end --warmup-updates 100 \
--insert-gold-docs True --activation gelu --attention-dropout 0.0 --dict-tokenizer bytelevelbpe --dropout 0.1 \
--embedding-size 2560 --ffn-size 10240 --force-fp16-tokens true --fp16 true --n-decoder-layers 24 \
--n-encoder-layers 2 --n-heads 32 --n-positions 128 --variant prelayernorm --delimiter ' '

klshuster commented 2 years ago

The generation method specified during training does not influence training itself; it is only used during the validation steps, where one might care about a generation metric. In fact, with --skip-generation True, your model is not even generating anything during validation.
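
(To illustrate: you can pick the decoding scheme when you actually generate, e.g. when loading the fine-tuned model for self-chat or evaluation. A rough sketch, assuming ParlAI's create_agent_from_model_file helper and a hypothetical model path:)

from parlai.core.agents import create_agent_from_model_file

# Sketch: load the fine-tuned model and choose the decoding scheme at
# inference time; the --inference/--topp values used at train time are
# irrelevant here. The model path below is hypothetical.
agent = create_agent_from_model_file(
    '/path/to/chatbot_models/BB2/model',
    opt_overrides={'inference': 'nucleus', 'topp': 0.3, 'skip_generation': False},
)
agent.observe({'text': 'Hello! How are you today?', 'episode_done': False})
print(agent.act()['text'])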

rguan1 commented 2 years ago

I see. So I guess the other factor that may be causing the incoherent outputs is the search engine implementation, since I implemented a custom document search using the Python Whoosh package. I followed the format described here for the response JSON: "Each document is a mapping (dictionary) of string->string with at least 3 fields: url, title, and content". However, I am unsure what should be returned when there are no search results. Currently, it returns: {"response": []}. Is that correct?

For reference, here is the JSON for a non-empty response from the search engine.

{
    "response": [
        {
            "title": "Miami Marlins Pitcher",
            "content": "Miami Marlins . . . (truncated)",
            "url": ""
        }
    ]
}
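
(A minimal sketch, not the actual server code, of building both the non-empty and the empty body in that format:)

import json

def build_response(hits):
    # Each document is a string->string mapping with url, title, and content;
    # when there are no hits, the "response" list is simply empty.
    docs = [
        {'url': h.get('url', ''), 'title': h.get('title', ''), 'content': h.get('content', '')}
        for h in hits
    ]
    return json.dumps({'response': docs})

print(build_response([]))  # -> {"response": []}
print(build_response([{'title': 'Miami Marlins Pitcher', 'content': 'Miami Marlins ...', 'url': ''}]))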

Thank you for the help!

klshuster commented 2 years ago

I believe that should be fine, maybe @mojtaba-komeili has a better idea

rguan1 commented 2 years ago

That is good to hear, then.

I ran a self_chat, under conditions identical to those that produced the incoherent example utterances above, on a fine-tuned BlenderBot2 model, except that I specified beam search in the training script rather than nucleus (and set the beam-search-related parameters). The utterances this time are fluent. I'm not exactly sure why that is, given what you've said above.

Beam search utterances examples: "What do you think about the arrest of the 27 year old man? Do you think it was justified?" "Terrorism is such a scary thing. I can't imagine going through something like that."

Nucleus search utterance examples (duplicated from above): "I'd never think them if things that he was such the US who will be from people." "I can be rather being our people to like others from their situations when others."

Thanks again for the help debugging this problem. I appreciate that you all take time out of your day for this!

klshuster commented 2 years ago

can you share your exact generation parameters?

rguan1 commented 2 years ago

I'm not exactly sure what "generation parameters" refers to, but hopefully this is it.

Here's the script that I use for self_chat. It's a slight modification of the default self_chat script so that it can add a conversation history to both bots' histories.

# Start the custom search server in its own environment.
conda activate blenderbot2server

cd $HOME/ParlAI_SearchEngine

python search_server.py serve --host 0.0.0.0:8080 --search_engine='CustomDoc' &

# Run the self-chat in the chatbot environment, with BlenderBot 3B as the partner model.
conda activate chatbot

cd $HOME

python $HOME/ParlAI/parlai/scripts/ryan_custom_scripts/ryan_self_chat_with_history.py --selfchat-max-turns 10 \
--num-self-chats 100 -mf $HOME/thesis/chatbot_models/BB2/eche_bb2_4epo_custdoc_apr4_beam/model \
--add-history-from-file $HOME/thesis/self_chat_and_history/chat_data/history_json/full_passage.json \
--outfile $HOME/thesis/self_chat_and_history/chat_data/outfiles/eche_bb2_4epo_custdoc_apr4_beam_full_passage.jsonl \
-pmf zoo:blender/blender_3B/model

Attached are the opts of the beam-search BB2 model as well as the incoherent nucleus-search BB2 model.

bb2_beam_model_opt.txt bb2_nucleus_model_opt.txt

klshuster commented 2 years ago

yeah, topp: 0.93 is quite high, maybe try something like 0.3?

rguan1 commented 2 years ago

I ran a self chat with a BlenderBot2 model that I trained last night with a lower topp of 0.8, when Stephen first suggested tweaking that parameter. The results are still incoherent. Should I proceed with an even lower topp than that?

Some examples of the low topp model utterances: "just the" "No easy feel. and what me no health to may it. " "the of seems"

Here's the full conversation, with an un-tuned BlenderBot1 model as the partner:

[screenshot of the self-chat conversation]

Here are the opts of the low-topp model: bb2_nucleus_low_topp_model_opt.txt

klshuster commented 2 years ago

I tried nucleus with the released checkpoint for BB2 with topp 0.9 and got similarly poor generations. With 0.3, the generations were much more sensible, so I'd explore some lower numbers for your use case as well.

rguan1 commented 2 years ago

I set topp to 0.3 rather than 0.8 for the model that was trained with the lower topp, since the generation method does not influence training. The results of the self chat are still not sensible. Perhaps I am overriding the opt incorrectly? I just went to the model's opt file and changed the topp value from 0.8 to 0.3.

[BlenderBot2Fid_1]: yeah the. to,. of it, but it's to be more known to be a major problem.
   [TransformerGenerator_2]: Yes, it is. The game was played in Scotland, England, Wales and Northern Ireland.
[BlenderBot2Fid_1]: , it they the is I don't a way think like like the for a. , but I don't. 
   [TransformerGenerator_2]: Yeah, I know what you mean. It's hard to believe that it's been going on for so long.
[BlenderBot2Fid_1]: it Yeah. yeah that yeah, it is a really is a problem imagine a good just.   
   [TransformerGenerator_2]: I can't even imagine what it would be like to have to deal with that kind of thing.
[BlenderBot2Fid_1]: yeah it is is I. so to get, it's it is the it is really. 
   [TransformerGenerator_2]: Yeah, that's a good way to look at it. I hope things get better for you.
[BlenderBot2Fid_1]: yeah yeah, of yeah that Yeah yeah, it is a good being this. 
   [TransformerGenerator_2]: I'm glad to hear that. I wish you the best of luck in your future endeavors.
[BlenderBot2Fid_1]: , that of imagine, for it the to think. 
   [TransformerGenerator_2]: Thank you very much, I appreciate that. It's always good to have a positive outlook on life.
[BlenderBot2Fid_1]: it it I of so to be sla the. 
   [TransformerGenerator_2]: That's a very interesting way of looking at things. I've never thought of it that way before.
[BlenderBot2Fid_1]: yeah yeah.... of it's, it is really like, but I don't to think, but it is so so it. 
   [TransformerGenerator_2]: I think that's a great way to look at it. I'll have to keep that in mind.
[BlenderBot2Fid_1]: yeah so I,. so Yeah yeah, it is a good is a big problem a problem toll on the
   [TransformerGenerator_2]: Do you have any hobbies that you like to do in your spare time to help with that?

klshuster commented 2 years ago

If you're going into the model.opt, you'll also want to change the topp parameter under the override key. You can also specify it via the command line: --topp 0.3
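
(For example, a sketch, assuming you edit the file programmatically; ParlAI saves .opt files as JSON, and the path below is hypothetical:)

import json

opt_path = '/path/to/model.opt'  # hypothetical path to the trained model's opt file
with open(opt_path) as f:
    opt = json.load(f)

# Change topp both at the top level and under 'override', since the
# override values take precedence when the agent is re-created.
opt['topp'] = 0.3
opt.setdefault('override', {})['topp'] = 0.3

with open(opt_path, 'w') as f:
    json.dump(opt, f, indent=4)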

rguan1 commented 2 years ago

I see, thanks!