Hello,
I'm trying to incorporate some of these loss functions in my PyTorch model. I get the following error:
```
(umx-gpu) sevagh:open-unmix-nsgt $ ./nsgt_aws.sh
Using GPU: True
Configuring NSGT to use GPU
unmix model: OpenUnmix(
  (fc1): Linear(in_features=70528, out_features=512, bias=False)
  (bn1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act1): Tanh()
  (rnn): GRU(512, 256, num_layers=3, dropout=0.4, bidirectional=True)
  (fc2): Linear(in_features=1024, out_features=512, bias=False)
  (bn2): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act2): ReLU()
  (fc3): Linear(in_features=512, out_features=76608, bias=False)
  (bn3): BatchNorm1d(76608, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act3): ReLU()
)
Training Epoch:   0%|          | 0/100 [00:00<?, ?it/s]
X.shape, dtype: torch.Size([24, 2, 126, 30, 304, 2]) torch.float32
y_hat.shape, dtype: torch.Size([24, 2, 132300]) torch.float32
loss: 7.6625075340271
Training batch:   0%|          | 0/230 [00:00<?, ?it/s]
Training Epoch:   0%|          | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "scripts/train.py", line 430, in <module>
    main()
  File "scripts/train.py", line 380, in main
    train_loss = train(args, unmix, nsgt, insgt, cnorm, loss_fn, device, train_sampler, optimizer)
  File "scripts/train.py", line 60, in train
    loss.backward()
  File "/home/sevagh/.conda/envs/umx-gpu/lib/python3.6/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/sevagh/.conda/envs/umx-gpu/lib/python3.6/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
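For reference, a minimal sketch (not from the original report) of how this error arises: if the tensor reaching `loss.backward()` was computed from values that left the autograd graph, for example via `.detach()` or a round trip through NumPy, it has no `grad_fn`. The toy `nn.Linear` model is an assumed stand-in for the real network.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)  # hypothetical stand-in for the separation model
x = torch.randn(2, 4)

# A round trip through NumPy (which requires .detach()) cuts the autograd graph.
y_hat = torch.from_numpy(model(x).detach().numpy())
loss = ((y_hat - x) ** 2).mean()

print(loss.requires_grad, loss.grad_fn)  # False None

message = ""
try:
    loss.backward()
except RuntimeError as err:
    message = str(err)
print(message)
```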
The training loop looks like this:
```python
import gc

import auraloss
import tqdm

import utils  # project-local module providing AverageMeter


def train(args, model, nsgt, insgt, cnorm, device, train_sampler, optimizer):
    losses = utils.AverageMeter()
    loss_fn = auraloss.freq.STFTLoss()
    model.train()  # the model is passed in as `model`, not `unmix`
    pbar = tqdm.tqdm(train_sampler, disable=args.quiet)
    for x, y in pbar:
        pbar.set_description("Training batch")
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        X = nsgt(x)
        y_hat = insgt(model(X), y.shape[-1])
        print('y_hat.shape, dtype: {0} {1}'.format(y_hat.shape, y_hat.dtype))
        loss = loss_fn(y_hat, x)  # note: y_hat is compared with the input x here, not with y
        print('loss: {0}'.format(loss))
        #loss = Variable(loss, requires_grad=True)
        loss.backward()
        optimizer.step()
        losses.update(loss.item(), y.size(1))
        gc.collect()
    return losses.avg
```
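A hedged sketch of a single training step that checks the graph before calling `backward()`, so that a broken graph or a leftover `torch.no_grad()` block surfaces with a more specific message. `train_step`, the toy model and the MSE loss are illustrative assumptions, not part of the original script, and the NSGT stages are omitted for brevity.

```python
import torch
import torch.nn as nn


def train_step(model, x, y, loss_fn, optimizer):
    """Run one optimisation step, failing early with a specific message
    if the loss is not connected to the model parameters."""
    if not torch.is_grad_enabled():
        raise RuntimeError("autograd is disabled (a torch.no_grad() block "
                           "or torch.set_grad_enabled(False) is still active)")
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    if loss.grad_fn is None:
        raise RuntimeError("loss has no grad_fn: the graph is broken "
                           "somewhere between the model output and the loss")
    loss.backward()
    optimizer.step()
    return loss.item()


model = nn.Linear(4, 4)  # hypothetical stand-in for the separation model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 4)

before = model.weight.detach().clone()
loss_value = train_step(model, x, y, nn.MSELoss(), optimizer)
updated = not torch.equal(before, model.weight.detach())  # weights changed

error_message = ""
with torch.no_grad():  # simulate grad mode accidentally left disabled
    try:
        train_step(model, x, y, nn.MSELoss(), optimizer)
    except RuntimeError as err:
        error_message = str(err)

print(loss_value, updated, error_message)
```

Checking `torch.is_grad_enabled()` first matters because a global `torch.set_grad_enabled(False)` elsewhere in a script would break every loss function at once, not just one.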
Adding this line after computing the loss makes the error go away:
```python
from torch.autograd import Variable

loss = Variable(loss, requires_grad=True)
```
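Note that this workaround only silences the error. A minimal sketch, using a hypothetical toy model whose output is detached upstream, showing that the wrapped loss is a new leaf tensor: `backward()` now succeeds, but no gradient reaches the model's parameters, so `optimizer.step()` would not train anything.

```python
import torch
import torch.nn as nn
from torch.autograd import Variable  # deprecated; returns a plain Tensor

model = nn.Linear(4, 4)  # hypothetical stand-in for the separation model
x = torch.randn(2, 4)

y_hat = model(x).detach()  # graph already broken upstream
loss = ((y_hat - x) ** 2).mean()

loss = Variable(loss, requires_grad=True)  # the workaround from above
loss.backward()  # no RuntimeError any more...

print(loss.is_leaf)       # True: the wrapped loss has no history
print(model.weight.grad)  # None: nothing flowed back to the model
```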
The issue is in another part of my code. Even the regular MSE loss function doesn't work anymore. Sorry.
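Since the cause turned out to lie elsewhere, one way to locate it is to inspect `requires_grad` and `grad_fn` at each stage after the model. In this sketch, `check_graph` and `inverse_transform` are hypothetical; the latter deliberately rebuilds its output from Python data to simulate a broken stage. Inputs before the first trainable layer need not require grad, so only tensors downstream of the model are checked.

```python
import torch
import torch.nn as nn


def check_graph(name, tensor):
    """Print and return whether `tensor` is still attached to the autograd graph."""
    print(f"{name}: requires_grad={tensor.requires_grad}, grad_fn={tensor.grad_fn}")
    return tensor.requires_grad


def inverse_transform(z):
    # Hypothetical buggy stage: rebuilding from Python data creates a fresh
    # tensor with no history, which silently cuts the graph.
    return torch.tensor(z.tolist())


model = nn.Linear(8, 8)  # hypothetical stand-in for the separation model
x = torch.randn(3, 8)    # inputs do not need requires_grad=True

Z = model(x)
y_hat = inverse_transform(Z)
loss = torch.mean((y_hat - x) ** 2)

stages = [("model output", Z), ("y_hat", y_hat), ("loss", loss)]
status = {name: check_graph(name, t) for name, t in stages}
first_break = next(name for name, ok in status.items() if not ok)
print("graph first breaks at:", first_break)
```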