Does it work without `torch.compile`?
Firstly, I'd like to apologize for providing a misleading error message earlier. The error in the code I provided was due to my mistakenly writing `self.ln_encode` instead of `self.ln_encoder`. So the actual error message was not related to that; instead, it was the following:
ERROR RUNNING GUARDS __init__ <string>:2
lambda L, **___kwargs_ignored:
___guarded_code.valid and
hasattr(L['loss'], '_dynamo_dynamic_indices') == False and
___check_type_id(L['self'], 142654656) and
hasattr(L['logits'], '_dynamo_dynamic_indices') == False and
___check_obj_id(L['self'].loss, 7628576) and
___check_obj_id(L['self'].logits, 7628576) and
___check_obj_id(L['self'].encoder_outputs, 7628576) and
hasattr(L['encoder_outputs'].hidden_states, '_dynamo_dynamic_indices') == False and
hasattr(L['encoder_outputs'].attention_mask, '_dynamo_dynamic_indices') == False and
___is_grad_enabled() and
not ___are_deterministic_algorithms_enabled() and
___is_torch_function_enabled() and
utils_device.CURRENT_DEVICE == None and
___check_tensors(L['loss'], L['logits'], L['encoder_outputs'].hidden_states, L['encoder_outputs'].attention_mask)
Error executing job with overrides: []
Traceback (most recent call last):
File "/data2/usr/projects/nanoT5/nanoT5/main.py", line 85, in main
train(model, train_dataloader, test_dataloader, accelerator,
File "/data2/usr/projects/nanoT5/nanoT5/utils/train_utils.py", line 237, in train
loss, stats = forward(model, batch)
File "/data2/usr/projects/nanoT5/nanoT5/utils/train_utils.py", line 90, in forward
outputs = model(**batch)
File "/home/usr/miniconda3/envs/nanoT5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/miniconda3/envs/nanoT5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/miniconda3/envs/nanoT5/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 333, in _fn
return fn(*args, **kwargs)
File "/home/usr/miniconda3/envs/nanoT5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/usr/miniconda3/envs/nanoT5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/miniconda3/envs/nanoT5/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
return model_forward(*args, **kwargs)
File "/home/usr/miniconda3/envs/nanoT5/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/usr/miniconda3/envs/nanoT5/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in <resume in __call__>
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/usr/miniconda3/envs/nanoT5/lib/python3.10/site-packages/accelerate/utils/operations.py", line 548, in convert_to_fp32
return recursively_apply(_convert_to_fp32, tensor, test_type=_is_fp16_bf16_tensor)
File "/home/usr/miniconda3/envs/nanoT5/lib/python3.10/site-packages/accelerate/utils/operations.py", line 119, in recursively_apply
return type(data)(
File "<string>", line 21, in guard
AttributeError: 'NoneType' object has no attribute 'hidden_states'
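I'm not sure about the guard internals, but the dump above seems to dereference `L['encoder_outputs'].hidden_states` without first checking whether `encoder_outputs` is `None`, so a later call in which the model passes `encoder_outputs=None` would crash inside the guard itself instead of triggering a recompile. A minimal sketch of that calling pattern (hypothetical names; I can't say whether it reproduces the crash on every PyTorch build):

```python
import torch
from dataclasses import dataclass

@dataclass
class EncoderOutputs:
    hidden_states: torch.Tensor

def forward(x, encoder_outputs=None):
    # Branch on an optional structured argument, as the model's
    # forward does with its cached encoder outputs.
    if encoder_outputs is not None:
        return x + encoder_outputs.hidden_states
    return x

compiled = torch.compile(forward)
x = torch.randn(4)

# First call: Dynamo traces the non-None branch and installs guards
# that touch encoder_outputs.hidden_states.
compiled(x, EncoderOutputs(hidden_states=torch.randn(4)))

# Second call: encoder_outputs is None. If the installed guard
# dereferences .hidden_states before a None check (as in the dump
# above), evaluating the guard itself raises the AttributeError.
compiled(x, None)
```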
Secondly, I just attempted to run the code with `torch.compile=False` and found that it runs successfully using both the `python -m nanoT5.main` and `accelerate launch -m nanoT5.main` commands (at least for the first 100 steps). Since I haven't delved deeply into the mechanics of `torch.compile`, I would like to ask about the possible reasons for the previous error message. Thank you in advance :)
That's good that it works without `torch.compile`. Unfortunately, I don't know the internals of `torch.compile` well enough to tell you what the issue is. I'd suggest trying to reproduce this error with a small sketch model and then raising an issue on the official PyTorch repo.
Okay, thanks for your prompt responses!
I tried to add my own loss function using the encoder's hidden states, and I added a new linear layer, similar to your `self.lm_head`, to obtain the corresponding logits. However, the training process fails every time, and it seems like I did not use the linear layer correctly, but I do not know why... In outline, the change is of this shape (a simplified sketch with illustrative names and sizes, not my exact code):
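```python
import torch
import torch.nn as nn

d_model, vocab_size = 512, 32128  # illustrative sizes, not the real config

# New projection from encoder hidden states to vocabulary logits,
# analogous to the existing decoder-side self.lm_head.
encoder_lm_head = nn.Linear(d_model, vocab_size, bias=False)

hidden_states = torch.randn(2, 16, d_model)     # stand-in encoder output
labels = torch.randint(0, vocab_size, (2, 16))  # stand-in targets

logits = encoder_lm_head(hidden_states)
aux_loss = nn.functional.cross_entropy(
    logits.view(-1, vocab_size), labels.view(-1)
)
```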
Here is my actual modified part of the `MyT5` module:

And here is the error message:
Looking forward to your assistance :)