facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

How to train my own blenderbot2_400M/model or blender2/query_generator model #3876

Closed RyanYip-Kat closed 2 years ago

RyanYip-Kat commented 3 years ago

How can I train zoo:blenderbot2/blenderbot2_400M/model or query_generator model with my own dataset?

klshuster commented 3 years ago

hi there - the query generator model for BB2 is just a regular BART model trained on the query generation task for Wizard of Internet (and multitasked with the MSC tasks to predict when to access memory); here's an example invocation for training a BART model with your own dataset:

python -m parlai.scripts.train_model -t custom_task \
  --fp16 True --fp16-impl mem_efficient --gradient-clip 0.1 \
  --log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 3 \
  --model-parallel True --model bart -o arch/bart_large \
  --save-after-valid True --skip-generation True \
  --optimizer adam -lr 7e-06 --warmup-updates 100 \
  -vmm min -vmt ppl -vp 5 -veps 0.5 --update-freq 1 --batchsize 64 \
  -tblog True --text-truncate 512 --truncate 512 --label-truncate 128 \
  --model-file path_to_model

Where -t custom_task is your own dataset.

The BB2 model is similarly trained, just with --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent. You can refer to the BB2 3B model card to see the hyperparameters for the 3B version; the 400M version will just need to include BART's hyperparameters instead.
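
For concreteness, a minimal sketch of what such a 400M-scale BB2 invocation might look like, combining the BB2 agent with the BART preset and initialization from the command above (this is an illustration, not the exact model-card command; the batch size and output path are placeholders, and depending on your setup you will also need retrieval flags such as --rag-retriever-type, discussed further below):

python -m parlai.scripts.train_model -t custom_task \
  --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
  -o arch/bart_large --init-model zoo:bart/bart_large/model \
  --dict-file zoo:bart/bart_large/model.dict \
  --fp16 True --optimizer adam -lr 7e-06 --batchsize 16 \
  --model-file path_to_model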

RyanYip-Kat commented 3 years ago

OK! Thank you for your response! I want to ask another question: if I have downloaded another bert-base model, whose files are pytorch_model.bin, bert_config.json, and vocab.txt, how can I transform them into the ParlAI format (model, model.dict, model.dict.codecs, model.dict.opt, model.opt)?

klshuster commented 3 years ago

We don't yet support an easy way of converting these... the following links may be useful resources, however:

  1. Conversion utils for saving BERT models from huggingface: https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/rag/conversion_utils.py
  2. Our own bert ranker code (using an old checkout of huggingface transformers): https://github.com/facebookresearch/ParlAI/tree/master/parlai/agents/bert_ranker

jytime commented 3 years ago

Hi @klshuster,

I was wondering which commands we should use to reproduce your results for BB2. I have checked the model card of BB2 3B; it shows the task is wizard_of_internet. However, I also noticed the argument multitask_weights: [3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], which suggests multiple tasks are trained at the same time. I am a bit confused here.

In other words, would the command below correctly train a BB2 model on wizard_of_internet? And it would be much appreciated if you could provide an example.

python -m parlai.scripts.train_model --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent -t wizard_of_internet -bs 16 -lr 1e-5 --other-args ...

klshuster commented 3 years ago

Hi @jytime, BB2 was trained on wizard_of_internet, msc (at the time of training, the three additional sessions were separate tasks, but they are now rolled into one via just -t msc --include-last-session true), and a safety dialogue dataset in which responses classified as unsafe had __POTENTIALLY_UNSAFE__ appended to the target (see the safety section of the BB2 README for more info).

The command you provided would train the model only on wizard_of_internet. An example training command would be similar to the ones found in, e.g., the hallucination project (see here for an example of a FiD-RAG model), where you instead specify --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent and --rag-retriever-type search_engine (if using a search engine). The model card provides various other hyperparameter choices.
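
A hedged sketch of such a command (the search server address is a placeholder; --generation-model and --search-server are taken from the RAG/BB2 documentation as I understand it, and the exact hyperparameters should come from the model card):

python -m parlai.scripts.train_model \
  -t wizard_of_internet \
  --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
  --rag-retriever-type search_engine --search-server <server_address> \
  --generation-model bart -o arch/bart_large \
  --batchsize 16 -lr 1e-05 --model-file path_to_model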

jytime commented 3 years ago

Hi @klshuster,

Thanks for your quick response. The command makes sense to me now, but I am still a bit confused about the tasks. Hope I have not missed something.

According to your description and the project page, the BB2 project used BST, Multi-Session Chat, and Wizard of the Internet for training (plus the BAD dataset for safety). Was the training conducted in sequence, i.e., first on BST, then MSC, and finally Wizard of the Internet? Or was BB2 trained on all of them at the same time, and if so, how do we specify this setting?

From the description of the msc project, I would guess BB2 was trained in sequence, but I am not sure. Thanks for any clarification.

klshuster commented 3 years ago

For the released BB2 models, we trained in a two-stage sequence: we initialized with the pre-trained weights from BB1, which was trained on the BST tasks, and then trained on the MSC/WizInt/Safety tasks multitasked. One could, in theory, train on all of BST/MSC/WizInt at the same time, but we leveraged the model already trained on BST.
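
In command-line terms, the second stage amounts to warm-starting from the BB1 weights via --init-model. A fragment sketch, where the zoo path for the BB1 3B weights is an assumption and the remaining flags follow the earlier BB2 training commands:

python -m parlai.scripts.train_model \
  --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
  --init-model zoo:blender/blender_3B/model \
  -t wizard_of_internet,msc --include-last-session true \
  --model-file path_to_model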

jytime commented 3 years ago

Thanks @klshuster! So just to make sure I understand correctly: does the command -t msc --include-last-session true enable multitasking over MSC and WizInt, or do we need to write a multitasking agent ourselves?

klshuster commented 3 years ago

you can specify multiple tasks via comma-separation:

--task wizard_of_internet,msc --include-last-session true works fine (no additional agent required)
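
Relatedly, the multitask_weights entry from the model card (mentioned earlier in this thread) maps onto the --multitask-weights flag, so the tasks can be weighted, e.g. (the weights here are illustrative, not the released configuration):

--task wizard_of_internet,msc --include-last-session true --multitask-weights 3,1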

github-actions[bot] commented 2 years ago

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

daje0601 commented 2 years ago

Hello, I have a question about the training script you wrote above.

What is the difference between `--model`, `-o arch/bart_large`, and `--model-file`? Looking at the parlai.scripts.train_model code, I guess that `--model` is used for a model from the zoo and `--model-file` is used when I want to use a model I made myself, but I don't know what `-o arch/bart_large` and `--init-model` do... Please help me. I really, really want to understand BlenderBot 2.0!

klshuster commented 2 years ago

Hi there. Our command-line args can be a little confusing; here's the gist:

  • --model: This specifies the agent or model type that will handle the incoming inputs. So, e.g., --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent means we're using BB2 (with the FiD architecture); --model transformer/generator means we're using a standard encoder/decoder transformer model; --model bart means we're using BART.
  • -o/--opt-preset: This specifies a predefined set of hyperparameters to load. It cleans up the overall ParlAI command by relegating several parameters to a .opt file. You can specify either (1) a full path to a .opt file, or (2) a relative path to a .opt file defined in our opt_presets folder. So, -o arch/bart_large means: when loading the model, use the hyperparameters specified in that file.
  • --init-model: This specifies the path to a set of model weights to initialize the model with. So, for example, if you wanted to fine-tune the zoo 400M BB2 model on a different task, you'd specify --init-model zoo:blenderbot2/blenderbot2_400M/model.
  • --model-file: This specifies the path where your trained model will be saved. This is the file you'll ultimately use to load your model for evaluation or other inference.
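
Putting the four flags together, a hypothetical fine-tuning command might look like the following (the task name and output path are placeholders):

python -m parlai.scripts.train_model -t my_task \
  --model projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent \
  --init-model zoo:blenderbot2/blenderbot2_400M/model \
  -o arch/bart_large \
  --model-file /content/finetuned_bb2/model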

daje0601 commented 2 years ago

Really, really, thank you so much! I got it!!

daje0601 commented 2 years ago

Hello @klshuster. First of all, I was very happy that I could understand and use ParlAI.

I'm trying to reproduce the query generator by looking at the command you provided.

!python -m parlai.scripts.train_model -t wizard_of_internet \
 --fp16 True --fp16-impl mem_efficient --gradient-clip 0.1 \
 --log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 3 \
 --model-parallel True --model bart \
 --activation gelu --attention-dropout 0.0 --dict-file zoo:bart/bart_large/model.dict \
 --dict-tokenizer gpt2 --dropout 0.1 --embedding-size 1024 --embeddings-scale False \
 --ffn-size 4096 --force-fp16-tokens True --init-model zoo:bart/bart_large/model \
 --learn-positional-embeddings True --n-decoder-layers 12 --n-encoder-layers 12 \
 --n-heads 16 --n-positions 1024 --variant bart \
 --save-after-valid True \
 --skip-generation True --optimizer adam -lr 7e-06 --warmup-updates 100 -vmm min \
 -vmt ppl -vp 5 -veps 0.5 --update-freq 1 --batchsize 8 --eval-batchsize 8 -tblog True \
 --text-truncate 512 --truncate 512 --label-truncate 128 --model-file content/model \
 --save-every-n-secs 60 \
 --load-from-checkpoint True --wandb-log True --wandb-name query_generator \
 --wandb-project BB2

I wrote the command as above and used the Wizard of the Internet dataset. Due to memory limits in Colab, the batch size was changed from 64 to 8.

!parlai eval_model -mf content/model -t wizard_of_internet -dt valid -bs 8

Evaluation was run with the command above, yielding a loss of 2.875 and a ppl of 17.73.

However, the issue is that the model does not act as a query generator. As the attached screenshot showed (image not reproduced here), the chatbot simply carries on a conversation instead of producing search queries.

I'm not sure what else I need to modify to make it behave like the released query generator (parlai interactive -mf zoo:blenderbot2/query_generator/model).

Please help me in which direction to go.

klshuster commented 2 years ago

The model was trained with -t wizard_of_internet:SearchQueryTeacher; you trained yours on the default wizard_of_internet dialogue task instead.
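
In other words, the fix should amount to swapping the task in the training command. A minimal sketch, using the -o arch/bart_large preset in place of the explicit BART flags (remaining hyperparameters follow the earlier commands, and the output path is a placeholder):

python -m parlai.scripts.train_model -t wizard_of_internet:SearchQueryTeacher \
  --model bart -o arch/bart_large --init-model zoo:bart/bart_large/model \
  --dict-file zoo:bart/bart_large/model.dict \
  --batchsize 8 --eval-batchsize 8 -lr 7e-06 --optimizer adam \
  --model-file content/query_generator/model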