facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

Can't run Demo for LIGHT #4131

Closed: xxbidiao closed this issue 2 years ago

xxbidiao commented 2 years ago

Bug description

Running parlai eval_model -t light_dialog -mf models:light/biranker_dialogue/model raises a StopIteration exception and fails.

Reproduction steps

parlai eval_model -t light_dialog -mf models:light/biranker_dialogue/model
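The failure in the logs happens in "replica 1 on device 1", which points at multi-GPU data parallelism (the saved opt has data_parallel: True). One way to test that hypothesis, which is an assumption and not something confirmed in this thread, is to rerun the same command with only one GPU visible. The sketch below wraps the reproduction command in Python and sets the standard CUDA_VISIBLE_DEVICES environment variable; the helper names single_gpu_eval_command and run_eval are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def single_gpu_eval_command(
    task: str = "light_dialog",
    model_file: str = "models:light/biranker_dialogue/model",
    gpu_index: str = "0",
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for the reproduction command with one visible GPU.

    Hypothetical helper: it only rebuilds the command from the issue.
    """
    argv = ["parlai", "eval_model", "-t", task, "-mf", model_file]
    env = dict(os.environ)
    # CUDA exposes only the listed device(s) to the child process.
    env["CUDA_VISIBLE_DEVICES"] = gpu_index
    return argv, env


def run_eval(argv: List[str], env: Dict[str, str]) -> int:
    """Run the command and return its exit code (requires ParlAI installed)."""
    return subprocess.run(argv, env=env).returncode
```

Calling run_eval(*single_gpu_eval_command()) runs the evaluation; from a shell the equivalent is CUDA_VISIBLE_DEVICES=0 parlai eval_model -t light_dialog -mf models:light/biranker_dialogue/model. With a single device, torch.nn.DataParallel calls the wrapped module directly instead of creating replicas, so if the replica hypothesis is right the StopIteration should not appear.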

Expected behavior

I expect the evaluation to run to completion instead of the demo failing with an exception.

Logs

~/ParlAI$ parlai eval_model -t light_dialog -mf models:light/biranker_dialogue/model
16:41:21 | Using CUDA
16:41:21 | loading dictionary from /home/zhiyu/ParlAI/data/models/light/biranker_dialogue/model.dict
16:41:21 | num words = 33796
16:41:21 | WARNING: BERT uses a Hugging Face tokenizer; ParlAI dictionary args are ignored
16:41:30 | Total parameters: 220,145,664 (220,145,664 trainable)
16:41:30 | Loading existing model parameters from /home/zhiyu/ParlAI/data/models/light/biranker_dialogue/model
16:41:34 | WARNING: not loading optim state since model params changed.
16:41:34 | Optimizer was reset. Also resetting LR scheduler.
/home/zhiyu/anaconda3/envs/evennia/lib/python3.9/site-packages/torch-1.10.0-py3.9-linux-x86_64.egg/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
  warnings.warn(warning.format(ret))
16:41:34 | Opt:
16:41:34 |     adafactor_eps: '(1e-30, 0.001)'
16:41:34 |     adam_eps: 1e-08
16:41:34 |     add_p1_after_newln: False
16:41:34 |     add_transformer_layer: False
16:41:34 |     aggregate_micro: False
16:41:34 |     allow_missing_init_opts: False
16:41:34 |     area_under_curve_class: None
16:41:34 |     area_under_curve_digits: -1
16:41:34 |     batch_length_range: 5
16:41:34 |     batch_sort: False
16:41:34 |     batch_sort_cache_type: pop
16:41:34 |     batch_sort_field: text
16:41:34 |     batchsize: 32
16:41:34 |     bert_aggregation: first
16:41:34 |     bert_vocabulary_path: data/models/bert/bert-base-uncased-vocab.txt
16:41:34 |     betas: '[0.9, 0.999]'
16:41:34 |     bpe_add_prefix_space: None
16:41:34 |     bpe_debug: False
16:41:34 |     bpe_dropout: None
16:41:34 |     bpe_merge: None
16:41:34 |     bpe_vocab: None
16:41:34 |     candidates: batch
16:41:34 |     cap_num_predictions: 100
16:41:34 |     context_length: -1
16:41:34 |     data_parallel: True
16:41:34 |     datapath: /home/zhiyu/ParlAI/data
16:41:34 |     datatype: train
16:41:34 |     delimiter: '\n'
16:41:34 |     dict_build_first: True
16:41:34 |     dict_class: parlai.agents.bert_ranker.bert_dictionary:BertDictionaryAgent
16:41:34 |     dict_endtoken: __end__
16:41:34 |     dict_file: /home/zhiyu/ParlAI/data/models/light/biranker_dialogue/model.dict
16:41:34 |     dict_include_test: False
16:41:34 |     dict_include_valid: False
16:41:34 |     dict_initpath: None
16:41:34 |     dict_language: english
16:41:34 |     dict_loaded: True
16:41:34 |     dict_lower: False
16:41:34 |     dict_max_ngram_size: -1
16:41:34 |     dict_maxexs: -1
16:41:34 |     dict_maxtokens: -1
16:41:34 |     dict_minfreq: 0
16:41:34 |     dict_nulltoken: __null__
16:41:34 |     dict_starttoken: __start__
16:41:34 |     dict_textfields: text,labels
16:41:34 |     dict_tokenizer: re
16:41:34 |     dict_unktoken: __unk__
16:41:34 |     display_examples: False
16:41:34 |     download_path: None
16:41:34 |     dynamic_batching: None
16:41:34 |     embedding_projection: random
16:41:34 |     embedding_type: random
16:41:34 |     encode_candidate_vecs: True
16:41:34 |     encode_candidate_vecs_batchsize: 256
16:41:34 |     eval_batchsize: 8
16:41:34 |     eval_candidates: inline
16:41:34 |     evaltask: None
16:41:34 |     fixed_candidate_vecs: reuse
16:41:34 |     fixed_candidates_path: None
16:41:34 |     force_fp16_tokens: False
16:41:34 |     fp16: False
16:41:34 |     fp16_impl: safe
16:41:34 |     gpu: -1
16:41:34 |     gradient_clip: 0.1
16:41:34 |     hide_labels: False
16:41:34 |     history_add_global_end_token: None
16:41:34 |     history_reversed: False
16:41:34 |     history_size: 5
16:41:34 |     ignore_bad_candidates: False
16:41:34 |     image_cropsize: 224
16:41:34 |     image_mode: raw
16:41:34 |     image_size: 256
16:41:34 |     include_labels: True
16:41:34 |     inference: max
16:41:34 |     init_model: None
16:41:34 |     init_opt: None
16:41:34 |     interactive_candidates: fixed
16:41:34 |     interactive_mode: False
16:41:34 |     invsqrt_lr_decay_gamma: -1
16:41:34 |     is_debug: False
16:41:34 |     label_truncate: 300
16:41:34 |     learningrate: 5e-05
16:41:34 |     light_label_type: speech
16:41:34 |     light_percent_train_exs: 1.0
16:41:34 |     light_speech_prefix: True
16:41:34 |     light_unseen_test: False
16:41:34 |     light_use_action: all
16:41:34 |     light_use_affordances: True
16:41:34 |     light_use_cands: 20
16:41:34 |     light_use_clip_cands: 10000
16:41:34 |     light_use_current_self_output: all
16:41:34 |     light_use_emote: all
16:41:34 |     light_use_objects: True
16:41:34 |     light_use_person_names: True
16:41:34 |     light_use_persona: self
16:41:34 |     light_use_repeat: none
16:41:34 |     light_use_setting: True
16:41:34 |     light_use_speech: all
16:41:34 |     light_use_taskname: True
16:41:34 |     log_every_n_secs: 10.0
16:41:34 |     log_keep_fields: all
16:41:34 |     loglevel: info
16:41:34 |     lr_scheduler: fixed
16:41:34 |     lr_scheduler_decay: 0.35
16:41:34 |     lr_scheduler_patience: 1
16:41:34 |     max_train_time: 86400.0
16:41:34 |     metrics: default
16:41:34 |     model: bert_ranker/bi_encoder_ranker
16:41:34 |     model_file: /home/zhiyu/ParlAI/data/models/light/biranker_dialogue/model
16:41:34 |     momentum: 0
16:41:34 |     multitask_weights: [1]
16:41:34 |     mutators: None
16:41:34 |     nesterov: True
16:41:34 |     no_cuda: False
16:41:34 |     num_epochs: 1000.0
16:41:34 |     num_examples: -1
16:41:34 |     numthreads: 1
16:41:34 |     numworkers: 4
16:41:34 |     nus: [0.7]
16:41:34 |     optimizer: sgd
16:41:34 |     out_dim: 768
16:41:34 |     override: {}
16:41:34 |     parlai_home: /private/home/jase/src/ParlAI
16:41:34 |     person_tokens: False
16:41:34 |     pretrained_bert_path: data/models/bert/bert-base-uncased.tar.gz
16:41:34 |     pretrained_path: /home/zhiyu/ParlAI/data/models/bert_models/bert-base-uncased.tar.gz
16:41:34 |     pull_from_layer: -1
16:41:34 |     pytorch_context_length: -1
16:41:34 |     pytorch_datapath: None
16:41:34 |     pytorch_include_labels: True
16:41:34 |     pytorch_preprocess: False
16:41:34 |     pytorch_teacher_batch_sort: False
16:41:34 |     pytorch_teacher_dataset: None
16:41:34 |     pytorch_teacher_task: None
16:41:34 |     rank_candidates: True
16:41:34 |     rank_top_k: -1
16:41:34 |     repeat_blocking_heuristic: True
16:41:34 |     report_filename: 
16:41:34 |     return_cand_scores: False
16:41:34 |     save_after_valid: True
16:41:34 |     save_every_n_secs: -1
16:41:34 |     save_format: conversations
16:41:34 |     show_advanced_args: False
16:41:34 |     shuffle: False
16:41:34 |     special_tok_lst: None
16:41:34 |     split_lines: False
16:41:34 |     starttime: Feb26_09-22
16:41:34 |     task: internal:light_dialog:light_label_type=speech
16:41:34 |     tensorboard_comment: 
16:41:34 |     tensorboard_log: False
16:41:34 |     tensorboard_logdir: None
16:41:34 |     tensorboard_metrics: None
16:41:34 |     tensorboard_tag: None
16:41:34 |     text_truncate: 300
16:41:34 |     topk: 5
16:41:34 |     topn: 10
16:41:34 |     train_predict: False
16:41:34 |     truncate: -1
16:41:34 |     type_optimization: all_encoder_layers
16:41:34 |     update_freq: -1
16:41:34 |     use_reply: label
16:41:34 |     validation_cutoff: 1.0
16:41:34 |     validation_every_n_epochs: -1
16:41:34 |     validation_every_n_secs: 1000.0
16:41:34 |     validation_max_exs: 10000
16:41:34 |     validation_metric: accuracy
16:41:34 |     validation_metric_mode: max
16:41:34 |     validation_patience: 15
16:41:34 |     validation_share_agent: False
16:41:34 |     verbose: False
16:41:34 |     warmup_rate: 0.0001
16:41:34 |     warmup_updates: 200
16:41:34 |     weight_decay: None
16:41:34 |     world_logs: 
16:41:34 | Current ParlAI commit: 8094996b09c49f436255d0232a13d3f7201c36e2
16:41:34 | Current internal commit: 8094996b09c49f436255d0232a13d3f7201c36e2
16:41:34 | Current fb commit: 8094996b09c49f436255d0232a13d3f7201c36e2
16:41:34 | Evaluating task light_dialog using datatype valid.
16:41:34 | creating task(s): light_dialog
16:41:34 | Loading ParlAI text data: /home/zhiyu/ParlAI/data/light_dialogue/tasknameTrue_settingTrue_objectsTrue_person_namesTrue_personaself_emoteall_speechall_actionall_affordancesTrue_repeatnone_cands20_current_self_outputall_clip_cands10000_speech_prefixTrue/speech_valid.txt
16:41:34 | [ Executing eval mode with provided inline set of candidates ]
Traceback (most recent call last):
  File "/home/zhiyu/anaconda3/envs/evennia/bin/parlai", line 33, in <module>
    sys.exit(load_entry_point('parlai', 'console_scripts', 'parlai')())
  File "/home/zhiyu/ParlAI/parlai/__main__.py", line 14, in main
    superscript_main()
  File "/home/zhiyu/ParlAI/parlai/core/script.py", line 325, in superscript_main
    return SCRIPT_REGISTRY[cmd].klass._run_from_parser_and_opt(opt, parser)
  File "/home/zhiyu/ParlAI/parlai/core/script.py", line 108, in _run_from_parser_and_opt
    return script.run()
  File "/home/zhiyu/ParlAI/parlai/scripts/eval_model.py", line 264, in run
    return eval_model(self.opt)
  File "/home/zhiyu/ParlAI/parlai/scripts/eval_model.py", line 239, in eval_model
    task_report = _eval_single_world(opt, agent, task)
  File "/home/zhiyu/ParlAI/parlai/scripts/eval_model.py", line 178, in _eval_single_world
    world.parley()
  File "/home/zhiyu/ParlAI/parlai/core/worlds.py", line 370, in parley
    acts[1] = agents[1].act()
  File "/home/zhiyu/ParlAI/parlai/core/torch_agent.py", line 2143, in act
    response = self.batch_act([self.observation])[0]
  File "/home/zhiyu/ParlAI/parlai/core/torch_agent.py", line 2239, in batch_act
    output = self.eval_step(batch)
  File "/home/zhiyu/ParlAI/parlai/core/torch_ranker_agent.py", line 522, in eval_step
    scores = self.score_candidates(batch, cand_vecs, cand_encs=cand_encs)
  File "/home/zhiyu/ParlAI/parlai/agents/bert_ranker/bi_encoder_ranker.py", line 190, in score_candidates
    _, embedding_cands = self.model(
  File "/home/zhiyu/anaconda3/envs/evennia/lib/python3.9/site-packages/torch-1.10.0-py3.9-linux-x86_64.egg/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhiyu/anaconda3/envs/evennia/lib/python3.9/site-packages/torch-1.10.0-py3.9-linux-x86_64.egg/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/zhiyu/anaconda3/envs/evennia/lib/python3.9/site-packages/torch-1.10.0-py3.9-linux-x86_64.egg/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/zhiyu/anaconda3/envs/evennia/lib/python3.9/site-packages/torch-1.10.0-py3.9-linux-x86_64.egg/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/zhiyu/anaconda3/envs/evennia/lib/python3.9/site-packages/torch-1.10.0-py3.9-linux-x86_64.egg/torch/_utils.py", line 434, in reraise
    raise exception
StopIteration: Caught StopIteration in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/home/zhiyu/anaconda3/envs/evennia/lib/python3.9/site-packages/torch-1.10.0-py3.9-linux-x86_64.egg/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/zhiyu/anaconda3/envs/evennia/lib/python3.9/site-packages/torch-1.10.0-py3.9-linux-x86_64.egg/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhiyu/ParlAI/parlai/agents/bert_ranker/bi_encoder_ranker.py", line 256, in forward
    embedding_cands = self.cand_encoder(
  File "/home/zhiyu/anaconda3/envs/evennia/lib/python3.9/site-packages/torch-1.10.0-py3.9-linux-x86_64.egg/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhiyu/ParlAI/parlai/agents/bert_ranker/helpers.py", line 133, in forward
    output_bert, output_pooler = self.bert_model(
  File "/home/zhiyu/anaconda3/envs/evennia/lib/python3.9/site-packages/torch-1.10.0-py3.9-linux-x86_64.egg/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhiyu/anaconda3/envs/evennia/lib/python3.9/site-packages/pytorch_pretrained_bert/modeling.py", line 727, in forward
    extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility
StopIteration
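The last frame of the traceback is next(self.parameters()) in pytorch_pretrained_bert's modeling.py, executed inside a DataParallel replica. Python's next() raises StopIteration whenever the iterator it receives is empty, so the replica's parameters() call must have yielded nothing. A likely explanation, stated here as an assumption rather than a confirmed diagnosis, is that newer PyTorch releases no longer expose weights through .parameters() on DataParallel replicas. The pure-Python sketch below uses hypothetical stand-in classes (Param, FakeModule, FakeReplica) to show the failing pattern next to a guarded version that passes a default to next(); it does not touch PyTorch itself.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Protocol


@dataclass
class Param:
    """Hypothetical stand-in for a tensor parameter; only dtype matters here."""
    dtype: str


class HasParameters(Protocol):
    def parameters(self) -> Iterator[Param]: ...


class FakeModule:
    """Hypothetical module whose parameters() yields its parameters normally."""

    def __init__(self, dtype: str) -> None:
        self._params = [Param(dtype)]

    def parameters(self) -> Iterator[Param]:
        return iter(self._params)


class FakeReplica:
    """Hypothetical replica whose parameters() yields nothing.

    It mimics the symptom in the log, not PyTorch's actual internals.
    """

    def parameters(self) -> Iterator[Param]:
        return iter(())


def first_param_dtype_unsafe(module: HasParameters) -> str:
    # Same pattern as the failing line: raises StopIteration if empty.
    return next(module.parameters()).dtype


def first_param_dtype_safe(module: HasParameters, default: str = "float32") -> str:
    # next() with a default returns it instead of raising StopIteration.
    first: Optional[Param] = next(module.parameters(), None)
    return default if first is None else first.dtype
```

The guard only avoids the exception; whether a fallback dtype is correct for the real model depends on how it was loaded, so this is a diagnostic illustration rather than a patch for pytorch_pretrained_bert.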

Additional context

I'm looking into using LIGHT in my research and am not sure how to use it. The demo would help me better understand how LIGHT builds a text adventure game world.

ivnle commented 2 years ago

I've run into the same error using the same steps as @xxbidiao.

github-actions[bot] commented 2 years ago

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

xxbidiao commented 2 years ago

Wondering if there is any update on this.

klshuster commented 2 years ago

Hi, what version of pytorch_pretrained_bert do you have installed? This command works for me as is (with version 0.6.2).
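To answer the version question, the installed versions can be read from package metadata and compared against the 0.6.2 mentioned above. A minimal sketch using the standard-library importlib.metadata module (Python 3.8+); the helper name installed_versions is hypothetical, and the distribution name pytorch-pretrained-bert is assumed to be the pip package that provides the pytorch_pretrained_bert module seen in the traceback.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Not installed (or installed under a different distribution name).
            result[name] = None
    return result
```

For example, print(installed_versions(["pytorch-pretrained-bert", "torch"])) reports both versions in one call; from a shell, pip show pytorch-pretrained-bert gives the same information.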