This happens in the latest PyTorch, e.g. PyTorch 1.10. We just updated the code to change the in-place operation to an out-of-place one. See if it works for you now.
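For readers hitting the same error, the difference between the in-place call and its out-of-place replacement looks like this (a generic illustration with made-up tensor names, not the exact OFA diff):

```python
import math
import torch

lprobs = torch.randn(2, 3, 8)                 # stand-in for net_output[0]
constraint_masks = torch.rand(2, 3, 8) > 0.5  # boolean mask of allowed tokens

# In-place: mutates the tensor it is called on. Recent PyTorch forbids this
# when the tensor is a view returned by a function that produces multiple
# views (the _DDPSinkBackward output in the traceback below).
lprobs.masked_fill_(~constraint_masks, -math.inf)

# Out-of-place: allocates a new tensor and rebinds the name, so no existing
# view is modified in place.
lprobs = lprobs.masked_fill(~constraint_masks, -math.inf)
```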
Hi, JustinLin610. Thank you very much for the timely reply.
In my case, simply changing `net_output[0].masked_fill_(~constraint_masks, -math.inf)` to `net_output[0] = net_output[0].masked_fill(~constraint_masks, -math.inf)` results in `TypeError: 'tuple' object does not support item assignment`.
Instead, I first converted `net_output` to a list with `net_output = list(net_output)` and then applied `net_output[0] = net_output[0].masked_fill(~constraint_masks, -math.inf)`; after that, training runs normally.
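Concretely, the change inside get_lprobs_and_target looks roughly like this (surrounding code paraphrased from criterions/label_smoothed_cross_entropy.py and may differ slightly from the current repo; only the two marked statements are the actual edit):

```python
# criterions/label_smoothed_cross_entropy.py, in get_lprobs_and_target
# (context paraphrased; only the two marked lines are the change)
if constraint_masks is not None:
    net_output = list(net_output)  # net_output is a tuple, so make it mutable first
    net_output[0] = net_output[0].masked_fill(~constraint_masks, -math.inf)  # out-of-place instead of masked_fill_
```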
Our environment: 2× RTX 3090, Ubuntu 18.04, PyTorch 1.10.0 (py3.8_cuda11.3_cudnn8.2.0_0)
Task: classification (ImageNet-1K), fine-tuning, OFA-tiny
2023-01-11 17:09:16 - trainer.py[line:703] - INFO: begin training epoch 1
2023-01-11 17:09:16 - train.py[line:305] - INFO: Start iterating over samples
Traceback (most recent call last):
  File "train.py", line 537, in <module>
    cli_main()
  File "train.py", line 530, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/home/yhh/yhh/codes/multimodal/OFA/fairseq/fairseq/distributed/utils.py", line 374, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/home/yhh/yhh/codes/multimodal/OFA/fairseq/fairseq/distributed/utils.py", line 348, in distributed_main
    main(cfg, **kwargs)
  File "train.py", line 199, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/home/yhh/anaconda3/envs/ofa/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "train.py", line 310, in train
    log_output = trainer.train_step(samples)
  File "/home/yhh/anaconda3/envs/ofa/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/data/yhh/codes/multimodal/OFA/trainer.py", line 806, in train_step
    raise e
  File "/data/yhh/codes/multimodal/OFA/trainer.py", line 773, in train_step
    loss, sample_size_i, logging_output = self.task.train_step(
  File "/data/yhh/codes/multimodal/OFA/tasks/ofa_task.py", line 334, in train_step
    loss, sample_size, logging_output = criterion(model, sample, update_num=update_num)
  File "/home/yhh/anaconda3/envs/ofa/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/yhh/codes/multimodal/OFA/criterions/label_smoothed_cross_entropy.py", line 200, in forward
    loss, nll_loss, ntokens = self.compute_loss(model, net_output, sample, update_num, reduce=reduce)
  File "/data/yhh/codes/multimodal/OFA/criterions/label_smoothed_cross_entropy.py", line 245, in compute_loss
    lprobs, target, constraint_masks = self.get_lprobs_and_target(model, net_output, sample)
  File "/data/yhh/codes/multimodal/OFA/criterions/label_smoothed_cross_entropy.py", line 222, in get_lprobs_and_target
    net_output[0].masked_fill_(~constraint_masks, -math.inf)
RuntimeError: Output 0 of _DDPSinkBackward is a view and is being modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.