PennyLaneAI / pennylane

PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
https://pennylane.ai
Apache License 2.0

[BUG] If I let PyTorch autograd calculate the derivative value and include it in the loss, optimizer.step() will not reduce the loss #3146

Closed · yoichiiz2512 closed this issue 2 years ago

yoichiiz2512 commented 2 years ago

Expected behavior

[0]     loss = 1.4446197748184204
[1]     loss = 1.2142350673675537
[2]     loss = 1.0044920444488525
[3]     loss = 0.815667986869812
[4]     loss = 0.6479007601737976
[5]     loss = 0.5011584162712097
[6]     loss = 0.3752087652683258
[7]     loss = 0.2695856988430023
[8]     loss = 0.18355925381183624
[9]     loss = 0.11610892415046692

(The above is an example of the expected behavior: a steadily decreasing loss.)

Actual behavior

[0]     loss = 0.461514413356781
[1]     loss = 0.461514413356781
[2]     loss = 0.461514413356781
[3]     loss = 0.461514413356781
[4]     loss = 0.461514413356781
[5]     loss = 0.461514413356781
[6]     loss = 0.461514413356781
[7]     loss = 0.461514413356781
[8]     loss = 0.461514413356781
[9]     loss = 0.461514413356781

(The exact value depends on the random initialization, but the loss does not change at all between iterations.)

Additional information

I was trying to implement a Physics Informed Neural Network (PINN) and used PyTorch autograd to compute the derivative values that enter the loss. This worked with a classical neural network, but with PennyLane the model seemed unable to learn to approximate the differential equation.

Source code

import pennylane as qml
import torch

# Classical baseline: a single fully connected layer.
class CLayer(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()

        self.fc = torch.nn.Linear(1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)

# Quantum layer: a one-qubit variational circuit wrapped as a torch.nn.Module.
class QLayer(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()

        q_device = qml.device('default.qubit', wires=1)

        qnode = qml.QNode(
            qcircuit,
            q_device,
            interface='torch',
            diff_method='adjoint'
        )

        # 'weights' is the shape of the trainable tensor: three Rot angles.
        # Note the tuple is written (3,); bare (3) is just the integer 3.
        self.qlayer = qml.qnn.TorchLayer(qnode, {'weights': (3,)})

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.tanh(x)
        return self.qlayer(x)

# One-qubit circuit: angle-encode the input with RX, then apply a trainable Rot.
def qcircuit(inputs, weights):
    qml.RX(inputs[0], wires=0)
    qml.Rot(*weights, wires=0)
    return [qml.expval(qml.PauliZ(0))]

def main():
    # ok : CLayer & normal_loss
    # ok : CLayer & derived_loss
    # ok : QLayer & normal_loss
    # ng : QLayer & derived_loss

    # Here, "ng" means that the loss does not decrease (it does not change at all).

    # net = CLayer()
    net = QLayer()

    optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

    x = torch.tensor([[0.1]], requires_grad=True)

    for i in range(10):
        optimizer.zero_grad()

        y = net(x)

        dy_dx = df(y, x)

        normal_loss = (y - 1.0).pow(2).mean()
        derived_loss = (dy_dx - 0.5).pow(2).mean()

        # loss = normal_loss
        loss = derived_loss

        print(f'[{i}]\tloss = {float(loss)}')

        loss.backward()
        optimizer.step()

def df(output: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    # create_graph=True keeps the autograd graph of the derivative itself,
    # so dy/dx can be used inside a loss that is backpropagated again.
    df_value = torch.autograd.grad(
        output,
        x,
        create_graph=True
    )[0]

    return df_value

if __name__ == '__main__':
    main()

Tracebacks

(No error tracebacks)

System information

Name: PennyLane
Version: 0.26.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/XanaduAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: /work/for_pennylane_bug_report/.venv/lib/python3.9/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, retworkx, scipy, semantic-version, toml
Required-by: PennyLane-Lightning, PennyLane-Lightning-GPU, PennyLane-qiskit

Platform info:           Linux-4.15.0-159-generic-x86_64-with-glibc2.31
Python version:          3.9.7
Numpy version:           1.23.3
Scipy version:           1.9.1
Installed devices:
- default.gaussian (PennyLane-0.26.0)
- default.mixed (PennyLane-0.26.0)
- default.qubit (PennyLane-0.26.0)
- default.qubit.autograd (PennyLane-0.26.0)
- default.qubit.jax (PennyLane-0.26.0)
- default.qubit.tf (PennyLane-0.26.0)
- default.qubit.torch (PennyLane-0.26.0)
- default.qutrit (PennyLane-0.26.0)
- qiskit.aer (PennyLane-qiskit-0.24.0)
- qiskit.basicaer (PennyLane-qiskit-0.24.0)
- qiskit.ibmq (PennyLane-qiskit-0.24.0)
- qiskit.ibmq.circuit_runner (PennyLane-qiskit-0.24.0)
- qiskit.ibmq.sampler (PennyLane-qiskit-0.24.0)
- lightning.qubit (PennyLane-Lightning-0.26.0)
- lightning.gpu (PennyLane-Lightning-GPU-0.26.0)

albi3ro commented 2 years ago

Thanks for bringing this to our attention, @yoichiiz2512.

We're looking into the problem, and in the meantime you can use diff_method="backprop", as that seems to show decreasing losses with the code you provided.

The problem may have something to do with second derivatives, as setting

            diff_method='parameter-shift',
            max_diff=2

also gives a decreasing loss.
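
For concreteness, here is a sketch of both QNode configurations, reusing qcircuit and q_device from your script; only the differentiation arguments change:

# Option 1 (sketch): backprop keeps the whole computation inside PyTorch's
# autograd, so derivatives of derivatives work without extra settings.
qnode = qml.QNode(
    qcircuit,
    q_device,
    interface='torch',
    diff_method='backprop'
)

# Option 2 (sketch): parameter-shift supports second-order derivatives,
# but they must be requested explicitly via max_diff=2.
qnode = qml.QNode(
    qcircuit,
    q_device,
    interface='torch',
    diff_method='parameter-shift',
    max_diff=2
)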

albi3ro commented 2 years ago

@yoichiiz2512 The problem does indeed seem to be second derivatives. Adjoint differentiation only works for first-order derivatives. For parameter shift, second-order derivatives have to be manually requested with max_diff=2.

Switching loss to normal_loss instead of derived_loss, I once again see a decreasing loss. That's because normal_loss only uses first derivatives, rather than derivatives of derivatives.
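
To see this concretely, you can inspect the parameter gradients after backpropagating derived_loss (a diagnostic sketch, assuming net, x, and df are defined as in your script):

# Diagnostic sketch: check whether gradients flow through the derivative term.
y = net(x)
dy_dx = df(y, x)
derived_loss = (dy_dx - 0.5).pow(2).mean()
derived_loss.backward()

# With diff_method='adjoint' the parameter gradients stay at zero, which is
# consistent with the constant loss above; with 'backprop', or with
# 'parameter-shift' plus max_diff=2, they are nonzero.
for name, param in net.named_parameters():
    print(name, param.grad)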

yoichiiz2512 commented 2 years ago

Thank you very much. As you replied, specifying backprop, or parameter-shift with max_diff=2, works as desired. It was not a bug, but an error in how I specified the parameters.