Oneflow-Inc / models

Models and examples built with OneFlow
Apache License 2.0

Dev roberta and update CPT NEW: issue log #354

Closed: songzetao closed this issue 2 years ago

songzetao commented 2 years ago

This issue records the problems encountered in PR https://github.com/Oneflow-Inc/models/pull/364

songzetao commented 2 years ago

contiguous()

Error message

Traceback (most recent call last):
  File "train_SST2.py", line 109, in <module>
    train(args)
  File "train_SST2.py", line 55, in train
    output = model(input_ids, attention_mask)
  File "/root/miniconda3/lib/python3.7/site-packages/oneflow/nn/module.py", line 115, in __call__
    res = self.forward(*args, **kwargs)
  File "/root/song/models/NLP/roberta/roberta/classifier_SST2.py", line 21, in forward
    outputs = self.roberta(inputs, masks)
  File "/root/miniconda3/lib/python3.7/site-packages/oneflow/nn/module.py", line 115, in __call__
    res = self.forward(*args, **kwargs)
  File "/root/song/models/NLP/roberta/roberta/models/roberta.py", line 928, in forward
    output_hidden_states,
  File "/root/miniconda3/lib/python3.7/site-packages/oneflow/nn/module.py", line 115, in __call__
    res = self.forward(*args, **kwargs)
  File "/root/song/models/NLP/roberta/roberta/models/roberta.py", line 639, in forward
    output_attentions,
  File "/root/miniconda3/lib/python3.7/site-packages/oneflow/nn/module.py", line 115, in __call__
    res = self.forward(*args, **kwargs)
  File "/root/song/models/NLP/roberta/roberta/models/roberta.py", line 506, in forward
    output_attentions,
  File "/root/miniconda3/lib/python3.7/site-packages/oneflow/nn/module.py", line 115, in __call__
    res = self.forward(*args, **kwargs)
  File "/root/song/models/NLP/roberta/roberta/models/roberta.py", line 393, in forward
    output_attentions,
  File "/root/miniconda3/lib/python3.7/site-packages/oneflow/nn/module.py", line 115, in __call__
    res = self.forward(*args, **kwargs)
  File "/root/song/models/NLP/roberta/roberta/models/roberta.py", line 299, in forward
    context_layer = context_layer.view(*new_context_layer_shape)
RuntimeError: Check failed: infered_stride.has_value() view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Segmentation fault

Cause

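The tensor is not contiguous: its elements are not laid out in memory (host or GPU) in the dense row-major order implied by its shape, which typically happens after an operation such as permute or transpose. view() only reinterprets the existing layout without copying, so it fails when the requested shape is incompatible with the tensor's current size and stride.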
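To make "not contiguous" concrete, here is a minimal pure-Python sketch of the shape/stride bookkeeping behind the error. It is an illustration only, not OneFlow's implementation: the helper names (`row_major_strides`, `permute`, `is_contiguous`) and the example shape are assumptions, and the check ignores edge cases such as size-1 dimensions.

```python
from typing import Sequence, Tuple


def row_major_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Strides (in elements) of a densely packed, row-major tensor."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def permute(shape, strides, dims):
    """Reorder dimensions without moving data: only the metadata changes."""
    return tuple(shape[d] for d in dims), tuple(strides[d] for d in dims)


def is_contiguous(shape, strides) -> bool:
    """True when the strides match the packed row-major layout of the shape."""
    return tuple(strides) == row_major_strides(shape)


# Hypothetical attention output: (batch, heads, seq_len, head_dim).
shape = (2, 12, 8, 64)
strides = row_major_strides(shape)        # (6144, 512, 64, 1)

# Swap heads and seq_len, as attention code commonly does before merging heads.
p_shape, p_strides = permute(shape, strides, (0, 2, 1, 3))

print(p_shape, p_strides)                 # (2, 8, 12, 64) (6144, 64, 512, 1)
print(is_contiguous(shape, strides))      # True
print(is_contiguous(p_shape, p_strides))  # False

# Conceptually, contiguous() copies the data so that the strides become
# the packed row-major strides of the permuted shape again.
c_strides = row_major_strides(p_shape)    # (6144, 768, 64, 1)
print(is_contiguous(p_shape, c_strides))  # True
```

In the permuted layout, merging the last two dimensions (12 and 64) into one of size 768 would require the stride of the third dimension to be 64, but it is 512. That mismatch is what the "view size is not compatible with input tensor's size and stride" check reports.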

Fix

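Call contiguous() on the tensor before calling view(). The oneflow nightly build now supports this function. As the error message itself notes, .reshape(...) can be used in place of view().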