csteinmetz1 / auraloss

Collection of audio-focused loss functions in PyTorch

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn #24

Closed. sevagh closed this issue 3 years ago.

sevagh commented 3 years ago

Hello,

I'm trying to incorporate some of these loss functions into my PyTorch model, but I get the following error:

(umx-gpu) sevagh:open-unmix-nsgt $ ./nsgt_aws.sh
Using GPU: True
Configuring NSGT to use GPU
unmix model:
OpenUnmix(
  (fc1): Linear(in_features=70528, out_features=512, bias=False)
  (bn1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act1): Tanh()
  (rnn): GRU(512, 256, num_layers=3, dropout=0.4, bidirectional=True)
  (fc2): Linear(in_features=1024, out_features=512, bias=False)
  (bn2): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act2): ReLU()
  (fc3): Linear(in_features=512, out_features=76608, bias=False)
  (bn3): BatchNorm1d(76608, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act3): ReLU()
)
Training Epoch:   0%|                                                                                  | 0/100 [00:00<?, ?it/s]
X.shape, dtype: torch.Size([24, 2, 126, 30, 304, 2]) torch.float32
Training batch:   0%|                                                                                  | 0/230 [00:00<?, ?it/s]
y_hat.shape, dtype: torch.Size([24, 2, 132300]) torch.float32
loss: 7.6625075340271
Training batch:   0%|                                                                                  | 0/230 [00:00<?, ?it/s]
Training Epoch:   0%|                                                                                  | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "scripts/train.py", line 430, in <module>
    main()
  File "scripts/train.py", line 380, in main
    train_loss = train(args, unmix, nsgt, insgt, cnorm, loss_fn, device, train_sampler, optimizer)
  File "scripts/train.py", line 60, in train
    loss.backward()
  File "/home/sevagh/.conda/envs/umx-gpu/lib/python3.6/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/sevagh/.conda/envs/umx-gpu/lib/python3.6/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
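
For context, this error means the tensor handed to .backward() is not connected to the autograd graph: somewhere between the model's parameters and the loss, the computation was detached (for example by .detach(), .item(), a torch.no_grad() block, or a round-trip through numpy). A minimal sketch (not the code from this issue) that raises the same error:

import torch

w = torch.randn(3, requires_grad=True)
loss = (w * 2).sum().detach()  # .detach() severs the graph, so loss.grad_fn is None
loss.backward()                # RuntimeError: element 0 of tensors does not require grad ...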

The training loop looks like this:

import gc
import tqdm
import utils  # project-local helpers (AverageMeter)

def train(args, model, nsgt, insgt, cnorm, loss_fn, device, train_sampler, optimizer):
    # loss_fn is constructed by the caller, e.g. auraloss.freq.STFTLoss()
    losses = utils.AverageMeter()
    model.train()
    pbar = tqdm.tqdm(train_sampler, disable=args.quiet)
    for x, y in pbar:
        pbar.set_description("Training batch")
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()

        X = nsgt(x)                           # forward NSGT of the input mix
        y_hat = insgt(model(X), y.shape[-1])  # inverse NSGT back to a waveform
        print('y_hat.shape, dtype: {0} {1}'.format(y_hat.shape, y_hat.dtype))

        loss = loss_fn(y_hat, x)
        print('loss: {0}'.format(loss))
        #loss = Variable(loss, requires_grad=True)

        loss.backward()
        optimizer.step()
        losses.update(loss.item(), y.size(1))
        gc.collect()
    return losses.avg

Adding this line after the loss computation makes the error go away:

loss = Variable(loss, requires_grad=True)
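
This only stops backward() from raising, though: Variable(loss, requires_grad=True) builds a new leaf tensor that shares loss's data but has no history, so the subsequent backward() computes no gradients for the model's parameters. It is essentially equivalent to:

loss = loss.detach().requires_grad_(True)  # new leaf; model parameter .grad stays None
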
sevagh commented 3 years ago

The issue is in another part of my code. Even the regular MSE loss function doesn't work anymore. Sorry.
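
For anyone who lands here with the same error: a quick way to find where the graph breaks is to print requires_grad and grad_fn after each stage of the pipeline. A diagnostic sketch using the names from this thread (your stages will differ); note that input tensors normally report requires_grad=False, and the graph only starts once the model's parameters get involved:

X = nsgt(x)
out = model(X)
print(out.requires_grad, out.grad_fn)       # should be True / a grad_fn after the model
y_hat = insgt(out, y.shape[-1])
print(y_hat.requires_grad, y_hat.grad_fn)   # False / None here means insgt detached the graph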