Liadrinz / transformers-copy-mechanism

Overwrite huggingface BART and GPT with copy mechanism

RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment #3

Open BNDSllx opened 1 year ago

BNDSllx commented 1 year ago

Hello, following your code I rewrote the T5ForConditionalGeneration class:

@dataclass
class T5ForConditionalGenerationWithCopyModule(T5ForConditionalGeneration):

    def __init__(self, config: T5Config, src_input: dict):
        super().__init__(config)
        # src_input: label encoder output
        self.src_input = src_input
        self.copy_module = CopyMechModule(config.d_model, config.vocab_size)
        self.post_init()

But when I pass config in the call, the following error occurs:

Traceback (most recent call last):
  File "/home/classification/run_groov.py", line 1291, in <module>
    main()
  File "/home/classification/run_groov.py", line 562, in main
    model = T5ForConditionalGenerationWithCopyModule.from_pretrained(
  File "/home/anaconda3/envs/cls/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1583, in from_pretrained
    config, model_kwargs = cls.config_class.from_pretrained(
  File "/home/anaconda3/envs/cls/lib/python3.9/site-packages/transformers/configuration_utils.py", line 521, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/anaconda3/envs/cls/lib/python3.9/site-packages/transformers/configuration_utils.py", line 546, in get_config_dict
    original_kwargs = copy.deepcopy(kwargs)
  File "/home/anaconda3/envs/cls/lib/python3.9/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/anaconda3/envs/cls/lib/python3.9/copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/anaconda3/envs/cls/lib/python3.9/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/anaconda3/envs/cls/lib/python3.9/copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/anaconda3/envs/cls/lib/python3.9/copy.py", line 153, in deepcopy
    y = copier(memo)
  File "/home/anaconda3/envs/cls/lib/python3.9/site-packages/torch/_tensor.py", line 102, in __deepcopy__
    raise RuntimeError(
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment

Following your code, if I do not pass config and pass only model_name_or_path, a different error is raised:

Traceback (most recent call last):
  File "/home/anaconda3/envs/cls/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1882, in from_pretrained
    model, missing_keys, unexpected_keys, mismatched_keys, error_msgs = cls._load_pretrained_model(
  File "/home/anaconda3/envs/cls/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1968, in _load_pretrained_model
    uninitialized_modules = model.retrieve_modules_from_names(
  File "/home/anaconda3/envs/cls/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2094, in retrieve_modules_from_names
    for name, module in self.named_modules():
  File "/home/anaconda3/envs/cls/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1881, in named_modules
    if self not in memo:
TypeError: unhashable type: 'T5ForConditionalGenerationWithCopyModule'

Have you run into a similar error?

Liadrinz commented 1 year ago

The model class can't have the @dataclass decorator 😂 I reproduced the problem, and removing @dataclass fixes it.
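For readers wondering why the decorator matters: by default `@dataclass` generates `__eq__` and, because the class is not frozen, sets `__hash__` to `None`, so instances can no longer be placed in a set. That matches the `if self not in memo` line in the second traceback. The sketch below uses only the standard library; `Base` is a hypothetical stand-in for `torch.nn.Module`, and its `named_modules` only imitates the set-membership check shown in the traceback, not the real PyTorch implementation.

```python
from dataclasses import dataclass


class Base:
    """Hypothetical stand-in for torch.nn.Module (identity-based hash by default)."""

    def named_modules(self):
        memo = set()
        if self not in memo:  # set membership needs hash(self)
            memo.add(self)
            yield "", self


@dataclass  # default eq=True, frozen=False -> __hash__ is set to None
class Decorated(Base):
    def __init__(self):
        super().__init__()


class Plain(Base):  # same class without the decorator
    def __init__(self):
        super().__init__()


def try_named_modules(obj):
    """Return the module names, or the error message if hashing fails."""
    try:
        return [name for name, _ in obj.named_modules()]
    except TypeError as exc:
        return str(exc)


decorated_result = try_named_modules(Decorated())  # a TypeError message
plain_result = try_named_modules(Plain())          # a list with one name
print(Decorated.__hash__, decorated_result, plain_result)
```

The undecorated class keeps the inherited identity hash, which is why dropping the decorator is enough to get past this particular error.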

Liadrinz commented 1 year ago

If you implement the copy mechanism on other models, contributions are welcome~

BNDSllx commented 1 year ago

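Got it! Thank you very much!

One more question: when you ran the code, did you ever print intermediate values such as p_copy, p_gen, cp_logits, and lm_logits? Roughly what range were they in? I would like to use them as a reference to check whether my T5 implementation is correct.

When I run my code, both probabilities hover around 0.5, but the cp_logits I compute are effectively one-hot along dim=1, with the nonzero entry always at the same position (in the example below, always the second position of each vector), which seems counterintuitive: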

lm_logits tensor([[[ -8.3967, -10.7758, -13.5871,  ..., -11.5895,  -7.0838, -43.3770],
         [-21.0494,  -5.5865,  -7.6967,  ...,  -9.6996, -37.8186, -36.0446]]],
       device='cuda:0', grad_fn=<UnsafeViewBackward0>)
cp_logits tensor([[[0.0000, 0.5101, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
         [0.0000, 0.4398, 0.0000,  ..., 0.0000, 0.0000, 0.0000]]],
       device='cuda:0', grad_fn=<UnsafeViewBackward0>)
p_gen = tensor([[[0.4560],
         [0.4470]]], device='cuda:0', grad_fn=<SigmoidBackward0>)

BNDSllx commented 1 year ago

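My understanding is that this happens because only one element of the input_one_hot tensor is nonzero, so the resulting product has only one nonzero column?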
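One way to sanity-check this reasoning is a toy version of the scatter step. The sketch below assumes the copy distribution for one decoder step is the attention weights over source positions multiplied by a one-hot encoding of the source token ids; that is a common formulation, not necessarily what this repository does. The function name `copy_distribution` and all values are made up for illustration.

```python
def copy_distribution(attn, src_ids, vocab_size):
    """Scatter attention over source positions into vocabulary space.

    Equivalent to attn @ one_hot(src_ids, vocab_size) for one decoder step.
    """
    dist = [0.0] * vocab_size
    for weight, token_id in zip(attn, src_ids):
        dist[token_id] += weight
    return dist


def nonzero_positions(dist):
    """Indices of vocabulary entries that received any copy mass."""
    return [i for i, value in enumerate(dist) if value != 0.0]


vocab_size = 6
attn = [0.2, 0.5, 0.3]  # hypothetical attention over three source tokens

# Every source position holds the same token id: one nonzero column.
same_token = copy_distribution(attn, [1, 1, 1], vocab_size)

# Distinct source token ids: copy mass is spread over several columns.
distinct = copy_distribution(attn, [1, 4, 2], vocab_size)

print(nonzero_positions(same_token), nonzero_positions(distinct))
```

Under this assumption, a single nonzero column means every source position maps to the same vocabulary id (or the source has length one), so it may be worth checking what ids actually go into input_one_hot.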

Liadrinz commented 1 year ago

Sorry, I have been busy lately and had not checked the issues. If your problem is still unresolved, could you share code that reproduces it?