facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

Replicating BlenderBot 90M model #3765

Closed. passing2961 closed this issue 3 years ago.

passing2961 commented 3 years ago

Hi,

For research purposes, I am trying to replicate the BlenderBot 90M result reported in the original paper (i.e., "Recipes for building an open-domain chatbot"). In other words, I want to fine-tune the pre-trained 90M model on the BST tasks myself, rather than further fine-tune the released BlenderBot 90M, which has already been fine-tuned on BST.

Following the ParlAI docs for BlenderBot, I ran the code below.

from parlai.scripts.train_model import setup_args, TrainLoop 

if __name__ == '__main__':
    parser = setup_args()
    parser.set_defaults(
        task='blended_skill_talk,wizard_of_wikipedia,convai2:normalized',
        multitask_weights='1,3,3,3',
        model='agents.discourse_blender:DiscourseBlenderAgent',
        model_file='./test_model/test_train_90M_bst',
        init_model='zoo:tutorial_transformer_generator/model',
        dict_file='zoo:tutorial_transformer_generator/model.dict',
        embedding_size=512,
        n_layers=8,
        ffn_size=2048,
        dropout=0.1,
        n_heads=16,
        learn_positional_embeddings=True,
        n_positions=512,
        variant='xlm',
        activation='gelu',
        text_truncate=512,
        label_truncate=128,
        dict_tokenizer='bpe',
        dict_lower=True,
        lr=1e-06,
        optimizer='adamax',
        lr_scheduler='reduceonplateau',
        gradient_clip=0.1,
        veps=0.25,
        betas=(0.9,0.999),
        update_freq=1,
        attention_dropout=0.0,
        relu_dropout=0.0,
        skip_generation=True,
        vp=15,
        stim=60,
        vme=20000,
        bs=16,
        vmt='ppl',
        vmm='min',
        save_after_valid=True,
        metrics='token_acc,ppl,loss,f1',
        fp16=False,
    )

    TrainLoop(parser.parse_args()).train()

During training, I encountered an error related to the PPL metric. [screenshot of the error]

I found that the error occurs because the value passed to math.exp is too large (around 1046, as shown in the screenshot above).

class PPLMetric(AverageMetric):
    def value(self):
        print(super().value())
        return math.exp(super().value())
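
As a side note, Python's math.exp raises OverflowError once its argument exceeds roughly 709 for a 64-bit float, so a loss value in the thousands cannot be exponentiated into a perplexity. A minimal sketch of that behavior:

import math

# math.exp overflows once its argument exceeds ~709.78 on a 64-bit float,
# so an averaged loss around 1046 (as in the screenshot) cannot be exponentiated.
try:
    math.exp(1046.0)
except OverflowError as err:
    print(err)  # prints: math range error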

Thus, I have two questions.

1) Is the code I used to replicate the fine-tuned BlenderBot 90M result correct?

2) If it is, can you tell me what you think is causing this error?

P.S. I tried to write this issue as politely as possible, but please understand if I haven't managed to, since I am not used to writing in English :(.

Sincerely,

stephenroller commented 3 years ago

That looks right to me. Can you share the logs before that?

I'm not sure what custom logic you have in your Discourse agent, though. Maybe try with a vanilla "transformer/generator".
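
For example, a minimal sketch of that change (assuming the rest of the original script stays the same) would only swap the model flag:

from parlai.scripts.train_model import setup_args

# Sketch: use ParlAI's built-in generator agent instead of the custom one.
parser = setup_args()
parser.set_defaults(
    model='transformer/generator',  # replaces agents.discourse_blender:DiscourseBlenderAgent
    # ... keep all other options from the original script unchanged ...
)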

passing2961 commented 3 years ago

Actually, my Discourse agent currently behaves the same as the vanilla "transformer/generator".

Following your suggestion, I tried the vanilla "transformer/generator", but it produced the same result as before.

I'm not sure whether it's okay to paste every log line from my terminal, but I've included them all here.

Thanks,

13:55:32 | building dictionary first...
13:55:32 | No model with opt yet at: ./test_model/test_train_90M_bst(.opt)
13:55:32 | your model is being loaded with opts that do not exist in the model you are initializing the weights with: allow_missing_init_opts: False,download_path: None,loglevel: info,dynamic_batching: None,verbose: False,is_debug: False,datapath: /workspace/Dialog/ParlAI/data,eval_dynamic_batching: None,num_workers: 0,max_train_steps: -1,log_every_n_steps: 50,validation_every_n_steps: -1,load_from_checkpoint: True,tensorboard_logdir: None,wandb_log: False,wandb_name: None,wandb_project: None,wandb_entity: None,label_type: response,include_knowledge: True,include_checked_sentence: True,include_knowledge_separator: False,chosen_topic_delimiter: 
,num_topics: 5,add_missing_turns: none,mutators: None,your_persona_first: True,max_num_turns: -1,n_encoder_layers: -1,n_decoder_layers: -1,model_parallel: False,beam_block_full_context: True,beam_length_penalty: 0.65,topk: 10,topp: 0.9,beam_delay: 30,beam_block_list_filename: None,temperature: 1.0,compute_tokenized_bleu: False,interactive_mode: False,fp16_impl: safe,force_fp16_tokens: False,adafactor_eps: (1e-30, 0.001),history_reversed: False,history_add_global_end_token: None,special_tok_lst: None,bpe_vocab: None,bpe_merge: None,bpe_add_prefix_space: None,bpe_dropout: None,invsqrt_lr_decay_gamma: -1,lr: 1e-06,veps: 0.25,vp: 15,stim: 60,vme: 20000,bs: 16,vmt: ppl,vmm: min,parlai_home: /workspace/Dialog/ParlAI
13:55:32 | your model is being loaded with opts that differ from the model you are initializing the weights with. Add the following args to your run command to change this: 
--show-advanced-args False --task internal:new_reddit:presorted --datatype train:stream --numthreads 1 --multitask-weights 1 --batchsize 48 --num-epochs 5.0 --validation-every-n-secs 1800.0 --validation-max-exs 9920 --short-final-eval True --validation-patience 0 --validation-metric ppl --validation-metric-mode min --dict-build-first True --metrics default --numworkers 4 --pytorch-preprocess False --pytorch-teacher-batch-sort False --batch-sort-cache-type pop --batch-length-range 5 --shuffle False --batch-sort-field text --pytorch-context-length -1 --pytorch-include-labels True --log-every-n-secs 30.0 --distributed-world-size 64 --port 61337 --beam-size 8 --beam-min-n-best 3 --beam-min-length 10 --skip-generation False --inference beam --fp16 True --optimizer fused_adam --learningrate 0.0005 --gradient-clip 10.0 --adam-eps 1e-06 --betas 0.9,0.98 --weight-decay 0.01 --lr-scheduler invsqrt --warmup-updates 20000 --gpu 0 --beam-block-ngram 3 --beam-context-block-ngram 3
13:55:32 | Using CUDA
13:55:32 | loading dictionary from /workspace/Dialog/ParlAI/data/models/tutorial_transformer_generator/model.dict
13:55:33 | num words = 54944
13:55:33 | DEPRECATED: XLM should only be used for backwards compatibility, as it involves a less-stable layernorm operation.
13:55:35 | Total parameters: 87,508,992 (87,508,992 trainable)
13:55:35 | Loading existing model params from /workspace/Dialog/ParlAI/data/models/tutorial_transformer_generator/model
13:55:36 | Detected a fine-tune run. Resetting the optimizer.
13:55:36 | Optimizer was reset. Also resetting LR scheduler.
13:55:36 | Opt:
13:55:36 |     activation: gelu
13:55:36 |     adafactor_eps: '(1e-30, 0.001)'
13:55:36 |     adam_eps: 1e-08
13:55:36 |     add_missing_turns: none
13:55:36 |     add_p1_after_newln: False
13:55:36 |     aggregate_micro: False
13:55:36 |     allow_missing_init_opts: False
13:55:36 |     attention_dropout: 0.0
13:55:36 |     batchsize: 1
13:55:36 |     beam_block_full_context: True
13:55:36 |     beam_block_list_filename: None
13:55:36 |     beam_block_ngram: -1
13:55:36 |     beam_context_block_ngram: -1
13:55:36 |     beam_delay: 30
13:55:36 |     beam_length_penalty: 0.65
13:55:36 |     beam_min_length: 1
13:55:36 |     beam_size: 1
13:55:36 |     betas: '(0.9, 0.999)'
13:55:36 |     bpe_add_prefix_space: None
13:55:36 |     bpe_debug: False
13:55:36 |     bpe_dropout: None
13:55:36 |     bpe_merge: None
13:55:36 |     bpe_vocab: None
13:55:36 |     bs: 16
13:55:36 |     chosen_topic_delimiter: '\n'
13:55:36 |     compute_tokenized_bleu: False
13:55:36 |     datapath: /workspace/Dialog/ParlAI/data
13:55:36 |     datatype: train
13:55:36 |     delimiter: '\n'
13:55:36 |     dict_class: parlai.core.dict:DictionaryAgent
13:55:36 |     dict_endtoken: __end__
13:55:36 |     dict_file: /workspace/Dialog/ParlAI/data/models/tutorial_transformer_generator/model.dict
13:55:36 |     dict_include_test: False
13:55:36 |     dict_include_valid: False
13:55:36 |     dict_initpath: None
13:55:36 |     dict_language: english
13:55:36 |     dict_loaded: True
13:55:36 |     dict_lower: True
13:55:36 |     dict_max_ngram_size: -1
13:55:36 |     dict_maxexs: -1
13:55:36 |     dict_maxtokens: -1
13:55:36 |     dict_minfreq: 0
13:55:36 |     dict_nulltoken: __null__
13:55:36 |     dict_starttoken: __start__
13:55:36 |     dict_textfields: text,labels
13:55:36 |     dict_tokenizer: bpe
13:55:36 |     dict_unktoken: __unk__
13:55:36 |     display_examples: False
13:55:36 |     download_path: None
13:55:36 |     dropout: 0.1
13:55:36 |     dynamic_batching: None
13:55:36 |     embedding_projection: random
13:55:36 |     embedding_size: 512
13:55:36 |     embedding_type: random
13:55:36 |     embeddings_scale: True
13:55:36 |     eval_batchsize: None
13:55:36 |     eval_dynamic_batching: None
13:55:36 |     evaltask: None
13:55:36 |     ffn_size: 2048
13:55:36 |     force_fp16_tokens: False
13:55:36 |     fp16: False
13:55:36 |     fp16_impl: safe
13:55:36 |     gpu: -1
13:55:36 |     gradient_clip: 0.1
13:55:36 |     hide_labels: False
13:55:36 |     history_add_global_end_token: None
13:55:36 |     history_reversed: False
13:55:36 |     history_size: -1
13:55:36 |     image_cropsize: 224
13:55:36 |     image_mode: raw
13:55:36 |     image_size: 256
13:55:36 |     include_checked_sentence: True
13:55:36 |     include_knowledge: True
13:55:36 |     include_knowledge_separator: False
13:55:36 |     inference: greedy
13:55:36 |     init_model: /workspace/Dialog/ParlAI/data/models/tutorial_transformer_generator/model
13:55:36 |     init_opt: None
13:55:36 |     interactive_mode: False
13:55:36 |     invsqrt_lr_decay_gamma: -1
13:55:36 |     is_debug: False
13:55:36 |     label_truncate: 128
13:55:36 |     label_type: response
13:55:36 |     learn_positional_embeddings: True
13:55:36 |     learningrate: 1
13:55:36 |     load_from_checkpoint: True
13:55:36 |     log_every_n_secs: -1
13:55:36 |     log_every_n_steps: 50
13:55:36 |     loglevel: info
13:55:36 |     lr: 1e-06
13:55:36 |     lr_scheduler: reduceonplateau
13:55:36 |     lr_scheduler_decay: 0.5
13:55:36 |     lr_scheduler_patience: 3
13:55:36 |     max_num_turns: -1
13:55:36 |     max_train_steps: -1
13:55:36 |     max_train_time: -1
13:55:36 |     metrics: token_acc,ppl,loss,f1
13:55:36 |     model: transformer/generator
13:55:36 |     model_file: ./test_model/test_train_90M_bst
13:55:36 |     model_parallel: False
13:55:36 |     momentum: 0
13:55:36 |     multitask_weights: '(1.0, 3.0, 3.0, 3.0)'
13:55:36 |     mutators: None
13:55:36 |     n_decoder_layers: -1
13:55:36 |     n_encoder_layers: -1
13:55:36 |     n_heads: 16
13:55:36 |     n_layers: 8
13:55:36 |     n_positions: 512
13:55:36 |     n_segments: 0
13:55:36 |     nesterov: True
13:55:36 |     no_cuda: False
13:55:36 |     num_epochs: -1
13:55:36 |     num_topics: 5
13:55:36 |     num_workers: 0
13:55:36 |     nus: (0.7,)
13:55:36 |     optimizer: adamax
13:55:36 |     output_scaling: 1.0
13:55:36 |     override: {}
13:55:36 |     parlai_home: /workspace/Dialog/ParlAI
13:55:36 |     person_tokens: False
13:55:36 |     rank_candidates: False
13:55:36 |     relu_dropout: 0.0
13:55:36 |     save_after_valid: True
13:55:36 |     save_every_n_secs: -1
13:55:36 |     share_word_embeddings: True
13:55:36 |     short_final_eval: False
13:55:36 |     skip_generation: True
13:55:36 |     special_tok_lst: None
13:55:36 |     split_lines: False
13:55:36 |     starttime: Jul08_13-55
13:55:36 |     stim: 60
13:55:36 |     task: blended_skill_talk,wizard_of_wikipedia,convai2:normalized
13:55:36 |     temperature: 1.0
13:55:36 |     tensorboard_log: False
13:55:36 |     tensorboard_logdir: None
13:55:36 |     text_truncate: 512
13:55:36 |     topk: 10
13:55:36 |     topp: 0.9
13:55:36 |     truncate: -1
13:55:36 |     update_freq: 1
13:55:36 |     use_reply: label
13:55:36 |     validation_cutoff: 1.0
13:55:36 |     validation_every_n_epochs: -1
13:55:36 |     validation_every_n_secs: -1
13:55:36 |     validation_every_n_steps: -1
13:55:36 |     validation_max_exs: -1
13:55:36 |     validation_metric: accuracy
13:55:36 |     validation_metric_mode: None
13:55:36 |     validation_patience: 10
13:55:36 |     validation_share_agent: False
13:55:36 |     variant: xlm
13:55:36 |     veps: 0.25
13:55:36 |     verbose: False
13:55:36 |     vme: 20000
13:55:36 |     vmm: min
13:55:36 |     vmt: ppl
13:55:36 |     vp: 15
13:55:36 |     wandb_entity: None
13:55:36 |     wandb_log: False
13:55:36 |     wandb_name: None
13:55:36 |     wandb_project: None
13:55:36 |     warmup_rate: 0.0001
13:55:36 |     warmup_updates: -1
13:55:36 |     weight_decay: None
13:55:36 |     your_persona_first: True
13:55:36 | Current ParlAI commit: 740079578c1956e8a2a0cdaeb5b53ec578243713
13:55:36 | creating task(s): blended_skill_talk,wizard_of_wikipedia,convai2:normalized
13:55:36 | Loading ParlAI text data: /workspace/Dialog/ParlAI/data/blended_skill_talk/train.txt
loading: /workspace/Dialog/ParlAI/data/wizard_of_wikipedia/train.json
13:55:46 | Some data not being used. If you are not trying to reproduce the previous results, it is recommended that you run with the flag --add-missing-turns train or --add-missing-turns all.
13:56:01 | loading normalized fbdialog data: /workspace/Dialog/ParlAI/data/ConvAI2/train_self_original.txt
13:56:01 | loading fbdialog data: /workspace/Dialog/ParlAI/data/ConvAI2/train_self_original.txt
13:56:35 | training...
13:56:35 | parlai.tasks.wizard_of_wikipedia.agents.DefaultTeacher' is outputting dicts instead of messages. If this is a teacher that is part of ParlAI, please file an issue on GitHub. If it is your own teacher, please return a Message object instead.
1355.7349649993466
Traceback (most recent call last):
  File "train_bst.py", line 50, in <module>
    TrainLoop(parser.parse_args()).train()
  File "/workspace/Dialog/ParlAI/parlai/scripts/train_model.py", line 900, in train
    for _train_log in self.train_steps():
  File "/workspace/Dialog/ParlAI/parlai/scripts/train_model.py", line 838, in train_steps
    yield self.log()
  File "/workspace/Dialog/ParlAI/parlai/scripts/train_model.py", line 755, in log
    train_report_trainstats = dict_report(train_report)
  File "/workspace/Dialog/ParlAI/parlai/core/metrics.py", line 873, in dict_report
    return {k: v.value() if isinstance(v, Metric) else v for k, v in report.items()}
  File "/workspace/Dialog/ParlAI/parlai/core/metrics.py", line 873, in <dictcomp>
    return {k: v.value() if isinstance(v, Metric) else v for k, v in report.items()}
  File "/workspace/Dialog/ParlAI/parlai/core/torch_generator_agent.py", line 318, in value
    return math.exp(super().value())
OverflowError: math range error
github-actions[bot] commented 3 years ago

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

xiaolan98 commented 2 years ago

The reason is the argument 'lr': you should change it to 'learningrate'. Right now your learning rate is actually set to 1, which is too large. I'm not sure from which version, but 'lr' is no longer the short form of 'learningrate'.
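
Concretely, a minimal sketch of the fix (keeping all other options from the original script) would be:

from parlai.scripts.train_model import setup_args

parser = setup_args()
parser.set_defaults(
    # was lr=1e-06, which left learningrate at its default of 1
    # (see "learningrate: 1" in the opt dump above)
    learningrate=1e-06,
    # ... keep all other options from the original script unchanged ...
)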