facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai

FastText embeddings link forbidden (HTTP Error 403) #1587

JohannesTK closed this issue 5 years ago

JohannesTK commented 5 years ago

Trying to evaluate the pretrained end-to-end generative model for Wizard of Wikipedia fails because the FastText word vectors link has become unavailable (HTTP 403).

Exact command and stack trace:

python examples/eval_model.py \
>     -bs 64 -t wizard_of_wikipedia:generator:topic_split \
>     -mf models:wizard_of_wikipedia/end2end_generator/model
/home/ubuntu/ParlAI/parlai/agents/transformer/transformer.py:18: UserWarning: Public release transformer models are currently in beta. The name of command line options may change or disappear before a stable release. We welcome your feedback. Please file feedback as issues at https://github.com/facebookresearch/ParlAI/issues/new
  "Public release transformer models are currently in beta. The name of "
[ warning: overriding opt['task'] to wizard_of_wikipedia:generator:topic_split (previously: wizard_of_wikipedia:generator:random_split )]
[ warning: overriding opt['model_file'] to /home/ubuntu/ParlAI/data/models/wizard_of_wikipedia/end2end_generator/model (previously: /tmp/wizard_endtoend_model )]
Dictionary: loading dictionary from /home/ubuntu/ParlAI/data/models/wizard_of_wikipedia/end2end_generator/model.dict
[ num words =  34883 ]
[ Using CUDA ]
/private/home/roller/working/parlai/data/models/fasttext_vectors/wiki.en.vec: 0.00B [00:00, ?B/s]
Traceback (most recent call last):
  File "examples/eval_model.py", line 17, in <module>
    eval_model(opt, print_parser=parser)
  File "/home/ubuntu/ParlAI/parlai/scripts/eval_model.py", line 68, in eval_model
    agent = create_agent(opt, requireModelExists=True)
  File "/home/ubuntu/ParlAI/parlai/core/agents.py", line 552, in create_agent
    model = load_agent_module(opt)
  File "/home/ubuntu/ParlAI/parlai/core/agents.py", line 429, in load_agent_module
    return model_class(new_opt)
  File "/home/ubuntu/ParlAI/projects/wizard_of_wikipedia/generator/agents.py", line 109, in __init__
    super().__init__(opt, shared)
  File "/home/ubuntu/ParlAI/parlai/core/torch_generator_agent.py", line 356, in __init__
    self.build_model()
  File "/home/ubuntu/ParlAI/projects/wizard_of_wikipedia/generator/agents.py", line 315, in build_model
    self.model.embeddings.weight, self.opt['embedding_type']
  File "/home/ubuntu/ParlAI/parlai/core/torch_agent.py", line 890, in _copy_embeddings
    embs, name = self._get_embtype(emb_type)
  File "/home/ubuntu/ParlAI/parlai/core/torch_agent.py", line 832, in _get_embtype
    'models:fasttext_vectors'))
  File "/home/ubuntu/ParlAI/venv/lib/python3.6/site-packages/torchtext/vocab.py", line 411, in __init__
    super(FastText, self).__init__(name, url=url, **kwargs)
  File "/home/ubuntu/ParlAI/venv/lib/python3.6/site-packages/torchtext/vocab.py", line 280, in __init__
    self.cache(name, cache, url=url, max_vectors=max_vectors)
  File "/home/ubuntu/ParlAI/venv/lib/python3.6/site-packages/torchtext/vocab.py", line 313, in cache
    urlretrieve(url, dest, reporthook=reporthook(t))
  File "/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

The torchtext function tries to download the vectors from its AWS bucket. The word vectors are also available on fbaipublicfiles, but the download speed to an AWS US instance is extremely slow (~100 KB/s). The torchtext repo doesn't seem to be actively maintained, so I opened the issue here (you reply fast :)).
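
One manual workaround is to fetch the vectors from the fbaipublicfiles mirror once and place them where torchtext's cache expects them, so the blocked AWS URL is never hit. A sketch only: the mirror URL is assumed, and the target directory is based on the datapath in the log above (note the trace shows the cache resolving to a stale /private/home path, a separate issue discussed further down):

# Sketch; adjust the target path to wherever torchtext's cache resolves on your machine.
mkdir -p /home/ubuntu/ParlAI/data/models/fasttext_vectors
wget https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.vec \
    -O /home/ubuntu/ParlAI/data/models/fasttext_vectors/wiki.en.vec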

stephenroller commented 5 years ago

Ahh yes. Please file an issue there too.

In the meantime, you can work around this by adding -emb random to that command. You’ll still be able to load the pretrained model.
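
For reference, the original command with the suggested flag applied:

python examples/eval_model.py \
    -bs 64 -t wizard_of_wikipedia:generator:topic_split \
    -mf models:wizard_of_wikipedia/end2end_generator/model \
    -emb random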

stephenroller commented 5 years ago

Although I'm very confused as to why fbaipublicfiles would be slow to download to AWS instances; it's backed by AWS.

JohannesTK commented 5 years ago

Thanks for the fast reply. A PR has been submitted to torchtext.

jazzminewang commented 5 years ago

I have the fix PR in my git log, but I'm still getting the same error.

stephenroller commented 5 years ago

Hi @jazzminewang, can you send the command and logs?

jazzminewang commented 5 years ago

/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/agents/transformer/transformer.py:18: UserWarning: Public release transformer models are currently in beta. The name of command line options may change or disappear before a stable release. We welcome your feedback. Please file feedback as issues at https://github.com/facebookresearch/ParlAI/issues/new
  "Public release transformer models are currently in beta. The name of "
[ warning: overriding opt['task'] to wizard_of_wikipedia:generator:topic_split (previously: wizard_of_wikipedia:generator:random_split )]
[ warning: overriding opt['model_file'] to /Users/jasminewang/Development/RLLab/mturk_dialog_eval/data/models/wizard_of_wikipedia/end2end_generator/model (previously: /tmp/wizard_endtoend_model )]
Dictionary: loading dictionary from /Users/jasminewang/Development/RLLab/mturk_dialog_eval/data/models/wizard_of_wikipedia/end2end_generator/model.dict
[ num words =  34883 ]
/private/home/roller/working/parlai/data/models/fasttext_vectors/wiki.en.vec: 0.00B [00:00, ?B/s]
Traceback (most recent call last):
  File "examples/eval_model.py", line 17, in <module>
    eval_model(opt, print_parser=parser)
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/scripts/eval_model.py", line 68, in eval_model
    agent = create_agent(opt, requireModelExists=True)
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/core/agents.py", line 552, in create_agent
    model = load_agent_module(opt)
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/core/agents.py", line 429, in load_agent_module
    return model_class(new_opt)
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/projects/wizard_of_wikipedia/generator/agents.py", line 99, in __init__
    super().__init__(opt, shared)
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/core/torch_generator_agent.py", line 358, in __init__
    self.build_model()
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/projects/wizard_of_wikipedia/generator/agents.py", line 305, in build_model
    self.model.embeddings.weight, self.opt['embedding_type']
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/core/torch_agent.py", line 917, in _copy_embeddings
    embs, name = self._get_embtype(emb_type)
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/core/torch_agent.py", line 859, in _get_embtype
    'models:fasttext_vectors'))
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/site-packages/torchtext/vocab.py", line 411, in __init__
    super(FastText, self).__init__(name, url=url, **kwargs)
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/site-packages/torchtext/vocab.py", line 280, in __init__
    self.cache(name, cache, url=url, max_vectors=max_vectors)
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/site-packages/torchtext/vocab.py", line 313, in cache
    urlretrieve(url, dest, reporthook=reporthook(t))
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

jazzminewang commented 5 years ago

I'm using sudo because without it I was getting this error:

Traceback (most recent call last):
  File "examples/eval_model.py", line 17, in <module>
    eval_model(opt, print_parser=parser)
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/scripts/eval_model.py", line 68, in eval_model
    agent = create_agent(opt, requireModelExists=True)
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/core/agents.py", line 552, in create_agent
    model = load_agent_module(opt)
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/core/agents.py", line 429, in load_agent_module
    return model_class(new_opt)
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/projects/wizard_of_wikipedia/generator/agents.py", line 99, in __init__
    super().__init__(opt, shared)
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/core/torch_generator_agent.py", line 358, in __init__
    self.build_model()
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/projects/wizard_of_wikipedia/generator/agents.py", line 305, in build_model
    self.model.embeddings.weight, self.opt['embedding_type']
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/core/torch_agent.py", line 917, in _copy_embeddings
    embs, name = self._get_embtype(emb_type)
  File "/Users/jasminewang/Development/RLLab/mturk_dialog_eval/parlai/core/torch_agent.py", line 859, in _get_embtype
    'models:fasttext_vectors'))
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/site-packages/torchtext/vocab.py", line 411, in __init__
    super(FastText, self).__init__(name, url=url, **kwargs)
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/site-packages/torchtext/vocab.py", line 280, in __init__
    self.cache(name, cache, url=url, max_vectors=max_vectors)
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/site-packages/torchtext/vocab.py", line 308, in cache
    os.makedirs(cache)
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 3 more times]
  File "/Users/jasminewang/anaconda3/envs/ParlAI/lib/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/private/home'

stephenroller commented 5 years ago

@jazzminewang are you running the command from the readme exactly?

cc @emilydinan looks like we're still having opt issues :(

jazzminewang commented 5 years ago

Apologies, I thought I had pasted in the command as well:

sudo python examples/eval_model.py -bs 64 -t wizard_of_wikipedia:generator:topic_split -mf models:wizard_of_wikipedia/end2end_generator/model
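
A sudo-free alternative worth trying (untested here, and per the diagnosis below the opt loaded from the model file may still override it) is to pass the datapath explicitly, so torchtext never tries to create the stale /private/home path:

python examples/eval_model.py -bs 64 \
    -t wizard_of_wikipedia:generator:topic_split \
    -mf models:wizard_of_wikipedia/end2end_generator/model \
    --datapath /Users/jasminewang/Development/RLLab/mturk_dialog_eval/data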

stephenroller commented 5 years ago

@emilydinan looks like this one is mine.

emilydinan commented 5 years ago

@stephenroller looks like the problem is that the datapath is not being overridden. I've run into this issue as well:

[  datapath: /private/home/roller/working/parlai/data ]
[  download_path: /private/home/roller/working/parlai/downloads ]

The best solution might be to make sure the datapath is never overridden when we load opts from a file.
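
A minimal sketch of what that guard might look like (hypothetical names; the actual ParlAI opt-loading code differs):

# Hypothetical sketch of the proposed guard, not the real ParlAI implementation.
# Keys like 'datapath' and 'download_path' (seen leaking in the log above) are
# machine-specific and should never be taken from a saved opt file.
NEVER_OVERRIDE = {'datapath', 'download_path'}

def merge_loaded_opt(current_opt, loaded_opt):
    for key, value in loaded_opt.items():
        if key in NEVER_OVERRIDE:
            continue  # keep the values from the local environment
        current_opt[key] = value
    return current_opt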