DreamInvoker / GAIN

Source code for EMNLP 2020 paper: Double Graph Based Reasoning for Document-level Relation Extraction
MIT License
142 stars 30 forks

Error when training on multiple GPUs #8

Closed yangboye closed 3 years ago

yangboye commented 3 years ago

Hello, I tried to use DataParallel to adapt this model for multi-GPU training, but it raised the following error at runtime:

Traceback (most recent call last):
  File "/disks/disk1/remote_src/DocRED/GAIN-master/code/train.py", line 237, in <module>
    train(opt)
  File "/disks/disk1/remote_src/DocRED/GAIN-master/code/train.py", line 130, in train
    predictions = model(words=d['context_idxs'],
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/disks/disk1/remote_src/DocRED/GAIN-master/code/models/GAIN.py", line 332, in forward
    encoder_output = encoder_outputs[i]  # [slen, bert_hid]
IndexError: index 1 is out of bounds for dimension 0 with size 1
DreamInvoker commented 3 years ago

Our code is not compatible with multi-GPU training.
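
One plausible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: DataParallel splits tensor inputs along dimension 0 across replicas, while non-tensor objects are handed to every replica unsplit. If the forward pass loops over a per-document input that was not split, the loop index can exceed the replica's local batch size, which would match `index 1 is out of bounds for dimension 0 with size 1`. The sketch below simulates this in plain Python without PyTorch. `FakeTensor`, `scatter`, `forward`, and the sample data are hypothetical names for illustration, not code from this repository.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor whose first dimension is the batch."""
    rows: List[Any]


def scatter(value: Any, num_replicas: int) -> List[Any]:
    """Simplified imitation of input scattering: split FakeTensor along
    dim 0, give every replica the same object for anything else."""
    if isinstance(value, FakeTensor):
        # Ceiling division so that chunks cover all rows.
        size = max(1, -(-len(value.rows) // num_replicas))
        return [FakeTensor(value.rows[i:i + size])
                for i in range(0, len(value.rows), size)]
    return [value] * num_replicas


def forward(encoder_outputs: FakeTensor, graphs: List[str]) -> List[str]:
    """Hypothetical forward pass that iterates over per-document graphs."""
    results = []
    for i, graph in enumerate(graphs):
        # Raises IndexError once i reaches the replica's local batch size.
        results.append(f"{graph}:{encoder_outputs.rows[i]}")
    return results


# Full batch of two documents, to be spread over two simulated replicas.
encoder_outputs = FakeTensor(rows=["doc0_hidden", "doc1_hidden"])
graphs = ["graph0", "graph1"]  # not a tensor, so it is not split

replica_inputs = list(zip(scatter(encoder_outputs, 2), scatter(graphs, 2)))

try:
    forward(*replica_inputs[0])
    failure = None
except IndexError as exc:
    failure = exc

print("replica 0 rows:", replica_inputs[0][0].rows)
print("replica 0 graphs:", replica_inputs[0][1])
print("failure:", repr(failure))
```

In this simulation replica 0 receives one row of encoder output but both graphs, so the second loop iteration fails. If the same mismatch is what happens here, multi-GPU training would require splitting the non-tensor inputs consistently with the tensors, which wrapping the model in DataParallel alone does not do.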