facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai

self_feeding pretrained model is failing to run. Mismatch in vocab and embedding dimensions #2253

Closed: coderkhaleesi closed this issue 4 years ago

coderkhaleesi commented 4 years ago

Bug description
I was trying to run the command given below for the self-feeding pretrained model, following https://github.com/facebookresearch/ParlAI/tree/master/projects/self_feeding/.

Reproduction steps
Run:
python3 projects/self_feeding/interactive.py --model-file zoo:self_feeding/hh131k_hb60k_fb60k_st1k/model --no-cuda --request-feedback true

Expected behavior
The command should run as-is without any errors.

Logs

ubuntu@td-parlai-large:~/ParlAI$ python3 projects/self_feeding/interactive.py --model-file zoo:self_feeding/hh131k_hb60k_fb60k_st1k/model --no-cuda --request-feedback true
/home/ubuntu/ParlAI/parlai/agents/transformer/modules.py:32: UserWarning: Installing APEX can give a significant speed boost.
  warn_once("Installing APEX can give a significant speed boost.")
[ warning: overriding opt['subtasks'] to ['dialog', 'satisfaction'] (previously: ['dialog', 'feedback', 'satisfaction'] )]
[ warning: overriding opt['interactive'] to True (previously: False )]
[ warning: overriding opt['interactive_task'] to True (previously: None )]
[ warning: overriding opt['prev_response_filter'] to True (previously: False )]
[ warning: overriding opt['partial_load'] to True (previously: False )]
[ warning: overriding opt['eval_candidates'] to fixed (previously: inline )]
[ warning: overriding opt['encode_candidate_vecs'] to True (previously: False )]
[ warning: overriding opt['fixed_candidates_path'] to /home/ubuntu/ParlAI/data/self_feeding/convai2_cands.txt (previously: None )]
[ warning: overriding opt['no_cuda'] to True (previously: False )]
[ warning: overriding opt['request_feedback'] to True (previously: False )]
/home/ubuntu/ParlAI/projects/self_feeding/self_feeding_agent.py:293: UserWarning: Old model: overriding `add_double_person_tokens` to True.
  warn_once('Old model: overriding `add_double_person_tokens` to True.')
[ Setting interactive mode defaults... ]
Dictionary: loading dictionary from /home/ubuntu/ParlAI/data/models/self_feeding/hh131k_hb60k_fb60k_st1k/model.dict
[ num words =  23617 ]
[SelfFeeding: full interactive mode on.]
  0%|          | 0/1967280 [00:00<?, ?it/s]
Skipping token b'1999995' with 1-dimensional vector [b'300']; likely a header
 98%|█████████▊| 1934277/1967280 [08:57<00:07, 4152.70it/s]
100%|█████████▉| 1966965/1967280 [09:06<00:00, 3238.55it/s]
Traceback (most recent call last):
  File "projects/self_feeding/interactive.py", line 90, in <module>
    interactive(parser.parse_args(print_args=False), print_parser=parser)
  File "projects/self_feeding/interactive.py", line 71, in interactive
    agent = create_agent(opt, requireModelExists=True)
  File "/home/ubuntu/ParlAI/parlai/core/agents.py", line 736, in create_agent
    model = load_agent_module(opt)
  File "/home/ubuntu/ParlAI/parlai/core/agents.py", line 601, in load_agent_module
    return model_class(new_opt)
  File "/home/ubuntu/ParlAI/projects/self_feeding/self_feeding_agent.py", line 256, in __init__
    super().__init__(opt, shared)
  File "/home/ubuntu/ParlAI/parlai/agents/transformer/transformer.py", line 187, in __init__
    super().__init__(opt, shared)
  File "/home/ubuntu/ParlAI/parlai/core/torch_ranker_agent.py", line 170, in __init__
    self.model = self.build_model()
  File "/home/ubuntu/ParlAI/projects/self_feeding/self_feeding_agent.py", line 308, in build_model
    self._copy_embeddings(embeddings.weight, self.opt['embedding_type'])
  File "/home/ubuntu/ParlAI/parlai/core/torch_agent.py", line 1198, in _copy_embeddings
    embs, name = self._get_embtype(emb_type)
  File "/home/ubuntu/ParlAI/parlai/core/torch_agent.py", line 1136, in _get_embtype
    embs = download(self.opt.get('datapath'))
  File "/home/ubuntu/ParlAI/parlai/zoo/fasttext_cc_vectors/build.py", line 21, in download
    cache=modelzoo_path(datapath, 'models:fasttext_cc_vectors'),
  File "/home/ubuntu/.local/lib/python3.6/site-packages/torchtext/vocab.py", line 323, in __init__
    self.cache(name, cache, url=url, max_vectors=max_vectors)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/torchtext/vocab.py", line 406, in cache
    dim))
RuntimeError: Vector for token b'Champia' has 85 dimensions, but previously read vectors have 300 dimensions. All vectors must have the same number of dimensions.
100%|█████████▉| 1966965/1967280 [09:07<00:00, 3595.50it/s]
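The error suggests the cached fastText vectors file ended up truncated or corrupted partway through the download. One quick way to confirm this before re-downloading is to scan the cached .vec file and compare each line's vector length against the dimension declared in its header. A minimal sketch, assuming torchtext cached the file as data/models/fasttext_cc_vectors/crawl-300d-2M.vec (adjust the path to whatever your traceback shows):

# Sketch: check a fastText .vec file for truncated or corrupted lines.
# The path below is an assumption based on the cache directory in the traceback.
path = "data/models/fasttext_cc_vectors/crawl-300d-2M.vec"

with open(path, "rb") as f:
    header = f.readline().split()                  # e.g. [b'1999995', b'300']
    expected_dim = int(header[-1])
    for line_no, line in enumerate(f, start=2):
        dim = len(line.rstrip().split(b" ")) - 1   # first field is the token itself
        if dim != expected_dim:
            print(f"line {line_no}: {dim} dims instead of {expected_dim}; file is likely truncated")
            break
    else:
        print(f"all vectors have {expected_dim} dims; the file looks intact")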
stephenroller commented 4 years ago

It looks like your word vectors have failed to download correctly. Run rm -rf data/models/*_vectors and try it again.
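For reference, the same cleanup can also be done from Python if that is more convenient; this is only a sketch of what the glob in the shell command expands to, and it assumes you run it from the ParlAI root directory:

import glob, shutil

# Remove every cached word-vector directory (e.g. fasttext_cc_vectors) so that
# ParlAI re-downloads the embeddings on the next run.
for cache_dir in glob.glob("data/models/*_vectors"):
    shutil.rmtree(cache_dir)
    print("removed", cache_dir)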

coderkhaleesi commented 4 years ago

Thanks @stephenroller. It worked.