facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

Error building MSMarco dataset #1349

Closed isaacmg closed 5 years ago

isaacmg commented 5 years ago

Hi, I'm getting an error when trying to build the ms_marco dataset. I hit the problem on both OS X and Ubuntu. Command run:

python examples/train_model.py -m "drqa" -t "ms_marco"

Error message:

Traceback (most recent call last):
  File "/content/ParlAI/parlai/core/agents.py", line 631, in _create_task_agents
    create_agent = getattr(my_module, 'create_agents')
AttributeError: module 'parlai.tasks.ms_marco.agents' has no attribute 'create_agents'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples/train_model.py", line 18, in <module>
    TrainLoop(opt).train()
  File "/content/ParlAI/parlai/scripts/train_model.py", line 185, in __init__
    self.world = create_task(opt, self.agent)
  File "/content/ParlAI/parlai/core/worlds.py", line 1003, in create_task
    world = create_task_world(opt, user_agents, default_world=default_world)
  File "/content/ParlAI/parlai/core/worlds.py", line 973, in create_task_world
    opt, user_agents, default_world=default_world)
  File "/content/ParlAI/parlai/core/worlds.py", line 938, in _get_task_world
    task_agents = _create_task_agents(opt)
  File "/content/ParlAI/parlai/core/agents.py", line 635, in _create_task_agents
    return create_task_agent_from_taskname(opt)
  File "/content/ParlAI/parlai/core/agents.py", line 589, in create_task_agent_from_taskname
    task_agents = teacher_class(opt)
  File "/content/ParlAI/parlai/tasks/ms_marco/agents.py", line 40, in __init__
    opt['datafile'] = _path(opt, is_passage=False)
  File "/content/ParlAI/parlai/tasks/ms_marco/agents.py", line 18, in _path
    build(opt)
  File "/content/ParlAI/parlai/tasks/ms_marco/build.py", line 81, in build
    create_fb_format(dpath, "train", os.path.join(dpath, 'train.gz'))
  File "/content/ParlAI/parlai/tasks/ms_marco/build.py", line 27, in create_fb_format
    lines = read_gz(inpath)
  File "/content/ParlAI/parlai/tasks/ms_marco/build.py", line 18, in read_gz
    lines = [x.decode('utf-8') for x in f.readlines()]
  File "/usr/lib/python3.6/gzip.py", line 374, in readline
    return self._buffer.readline(size)
  File "/usr/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.6/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'\xef\xbb')

stephenroller commented 5 years ago

It looks like they bumped the version of ms_marco from 1.1 to 2.1. The URLs in parlai/tasks/ms_marco/build.py need to be updated.
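
For context on the error itself: the bytes b'\xef\xbb' reported by gzip are the start of a UTF-8 byte order mark (EF BB BF), whereas a gzip stream always begins with 1F 8B. That suggests the file saved as train.gz was plain text rather than a gzip archive. A minimal sketch of a check that could be run on the downloaded file is below; `is_gzipped` and `read_lines` are hypothetical helper names, not part of ParlAI, and the actual fix in the repository may look different.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def read_lines(path):
    """Hypothetical stand-in for read_gz: read text lines from a gzip or plain file."""
    opener = gzip.open if is_gzipped(path) else open
    # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
    with opener(path, "rt", encoding="utf-8-sig") as f:
        return [line.rstrip("\n") for line in f]
```

Calling `is_gzipped` on the downloaded train.gz before parsing would distinguish a changed download format from a corrupted archive.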

isaacmg commented 5 years ago

Okay, I updated the URLs to 2.1. However, now I'm getting the error below. I'm guessing there is a bad line in the new dataset that triggers it, since the code seems to run for a while before failing.

Traceback (most recent call last):
  File "examples/train_model.py", line 18, in <module>
    TrainLoop(opt).train()
  File "/content/ParlAI/parlai/scripts/train_model.py", line 185, in __init__
    self.world = create_task(opt, self.agent)
  File "/content/ParlAI/parlai/core/worlds.py", line 1003, in create_task
    world = create_task_world(opt, user_agents, default_world=default_world)
  File "/content/ParlAI/parlai/core/worlds.py", line 973, in create_task_world
    opt, user_agents, default_world=default_world)
  File "/content/ParlAI/parlai/core/worlds.py", line 938, in _get_task_world
    task_agents = _create_task_agents(opt)
  File "/content/ParlAI/parlai/core/agents.py", line 635, in _create_task_agents
    return create_task_agent_from_taskname(opt)
  File "/content/ParlAI/parlai/core/agents.py", line 589, in create_task_agent_from_taskname
    task_agents = teacher_class(opt)
  File "/content/ParlAI/parlai/tasks/ms_marco/agents.py", line 40, in __init__
    opt['datafile'] = _path(opt, is_passage=False)
  File "/content/ParlAI/parlai/tasks/ms_marco/agents.py", line 18, in _path
    build(opt)
  File "/content/ParlAI/parlai/tasks/ms_marco/build.py", line 81, in build
    create_fb_format(dpath, "train", os.path.join(dpath, 'train.gz'))
  File "/content/ParlAI/parlai/tasks/ms_marco/build.py", line 42, in create_fb_format
    d["passage_text"] for d in dic["passages"] if d["is_selected"] == 1
  File "/content/ParlAI/parlai/tasks/ms_marco/build.py", line 42, in <listcomp>
    d["passage_text"] for d in dic["passages"] if d["is_selected"] == 1
TypeError: string indices must be integers
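
"string indices must be integers" means that `d` in the list comprehension is a str, not a dict. That happens when `dic["passages"]` is not a list of dicts, for example when it is a dict (iterating a dict yields its keys) or a plain string. The layout of the 2.1 files is not shown in this thread, so the following is only a sketch of a defensive version of that comprehension; `selected_passages` is a hypothetical name, and the field names are taken from the traceback above.

```python
def selected_passages(record):
    """Return passage_text for every passage with is_selected == 1.

    Raises ValueError with a readable message when `passages` is not a list,
    instead of failing later with "string indices must be integers".
    """
    passages = record.get("passages")
    if not isinstance(passages, list):
        raise ValueError(
            "expected 'passages' to be a list, got %s" % type(passages).__name__
        )
    return [
        p["passage_text"]
        for p in passages
        if isinstance(p, dict) and p.get("is_selected") == 1
    ]
```

With a check like this, a record in an unexpected layout reports the type it actually contains, which makes it easier to tell a single bad line from a format change across the whole dataset.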

stephenroller commented 5 years ago

Fixed by #1395