Open leolettuce opened 2 years ago
Hey @leolettuce! Thanks for alerting us to this.
While we dig into the problem here, I am wondering if forcing a decomposition for the MultiRZ gate could provide a workaround in the meantime?
As a small example,
import pennylane as qml
from pennylane import numpy as np

# Force MultiRZ to be decomposed before it reaches the device
custom_decomps = {'MultiRZ': qml.MultiRZ.compute_decomposition}
dev = qml.device('lightning.gpu', wires=2, custom_decomps=custom_decomps)

@qml.qnode(dev, diff_method='adjoint')
def cost(theta):
    qml.Hadamard(wires=0)
    qml.Hadamard(wires=1)
    qml.MultiRZ(theta, wires=[1, 0])
    return qml.expval(qml.PauliX(1))

x = np.array(0.5, requires_grad=True)
cost(x)
Thank you! Yes, the decomposition of the MultiRZ gate is resolving the error.
Nonetheless, I still have two open questions:
[/pennylane-lightning-gpu/pennylane_lightning_gpu/src/simulator/StateVectorCudaBase.hpp][Line :246][Method:StateVectorCudaBase]: Error in PennyLane Lightning: out of memory
Is there a simple way to reduce the memory usage?
Pennylane v0.22: 3.4 seconds, 19.6 seconds, 21.5 seconds, 22.7 seconds, 0.2 seconds
Pennylane v0.21: 31.9 seconds, 32.3 seconds, 0.2 seconds
Pennylane v0.20: 44 seconds, 33 seconds, 0.2 seconds
Pennylane v0.19: 7.4 seconds, 9.2 seconds, 0.2 seconds
I would have expected that the qulacs.simulator device would yield similar results on all pennylane versions. I also experienced this behavior on larger problems. Is there a known explanation for that?
Hi @leolettuce, thanks for the update. The performance differences you've shown are likely due to a similar cause as https://github.com/PennyLaneAI/pennylane/issues/2430#issuecomment-1092880807 . Namely, to make better use of quantum resources, PennyLane v0.20 and above do more up-front classical processing, as this allows us to support n-th order gradients relatively easily. The Qulacs device also uses the parameter-shift method for gradients, which can have a high cost for large circuits with several parameters. We are currently addressing some of the additional classical overhead that came with improving quantum resource usage.
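To make the parameter-shift cost mentioned above concrete, here is a rough sketch of how the number of circuit executions per gradient grows with the number of trainable parameters. It assumes the standard two-term shift rule (two shifted circuit evaluations per parameter). The function name and the example parameter counts are illustrative and are not taken from the reported circuit.

```python
def parameter_shift_executions(num_params, shifts_per_param=2):
    """Circuit executions needed for one gradient with the parameter-shift rule.

    Assumes every trainable parameter uses the same number of shifted
    evaluations (2 for the standard two-term rule).
    """
    if num_params < 0:
        raise ValueError("num_params must be non-negative")
    return num_params * shifts_per_param


# Illustrative parameter counts, not taken from the issue
for n in (2, 20, 200):
    count = parameter_shift_executions(n)
    print(f"{n} parameters -> {count} circuit executions per gradient")
```

Each of those executions is a full simulation of the circuit, so the cost per optimization step scales linearly with the number of parameters under this assumption.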
As for the question about memory usage, this is also a challenging one. Whether a large problem runs on lightning.gpu depends on a number of factors, one of which is the available RAM on the given GPU. For V100s, whether the card has 16GB or 32GB can make a big difference to whether the circuit runs. The same holds for the 40GB and 80GB versions of the A100, depending on the problem at hand. Due to how memory is allocated by intermediate library calls, it can be difficult to predict up-front whether a problem will fit. Since you have access to a DGX box, you can always use lightning.qubit with diff_method=adjoint, and can control the number of concurrent expectation value calculations with OMP_NUM_THREADS, as mentioned at the bottom of the page here.
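As a rough guide to whether a circuit can fit, the state vector alone needs 2^n complex amplitudes for n qubits. The sketch below assumes double-precision complex amplitudes (16 bytes each) and reads the GPU sizes as GiB. Treat the result as a lower bound on memory use: workspace allocated by intermediate library calls and any extra state copies used during differentiation come on top and are not estimated here. The function names are illustrative.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: complex128 (two 64-bit floats)


def statevector_bytes(num_qubits, bytes_per_amplitude=BYTES_PER_AMPLITUDE):
    """Memory for a single full state vector of `num_qubits` qubits."""
    return (2 ** num_qubits) * bytes_per_amplitude


def max_qubits_for_memory(memory_gib, bytes_per_amplitude=BYTES_PER_AMPLITUDE):
    """Largest qubit count whose single state vector fits in `memory_gib` GiB."""
    budget = int(memory_gib * 1024 ** 3)
    n = 0
    while statevector_bytes(n + 1, bytes_per_amplitude) <= budget:
        n += 1
    return n


# GPU memory sizes mentioned in the thread (V100: 16/32, A100: 40/80)
for mem in (16, 32, 40, 80):
    qubits = max_qubits_for_memory(mem)
    print(f"{mem} GiB -> at most {qubits} qubits (one state vector only)")
```

Because each additional qubit doubles the state vector, doubling the GPU memory buys only about one more qubit, so reducing memory usage mostly comes down to reducing the qubit count or the number of state copies held at once.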
Feel free to provide a minimal working example of QAOA if you would like us to investigate this further; there may be some optimizations we can provide based on your workload needs.
Expected behavior
We are simulating QAOA on an NVIDIA DGX system. Since the new pennylane version (v0.22) supports cuQuantum using the "lightning.gpu" device, we want to use it for potential speedups. (https://pennylane.ai/blog/2022/03/pennylane-v022-released/#accelerate-your-simulations-with-cuquantum-gpu-support)
Actual behavior
I installed the device and all necessary libraries. However, after simply replacing "default.qubit" or "qulacs.simulator" with "lightning.gpu", no optimization is happening. The cost_function stays constant, close to zero.
As I have read that "lightning.gpu" works best with diff_method set to "adjoint", I also tried that. However, I then got the following error message: "The MultiRZ operation is not supported using the "adjoint" differentiation method"
Additional information
Before, we were using pennylane version 0.19 and the "default.qubit" and "qulacs.simulator" device. The latter also has GPU support.
With the new version 0.22, I also realized that the simulation is significantly slower compared to the previous pennylane version.
For this github issue, I simply took the example problem of the QAOA tutorial to reproduce my problem. (https://pennylane.ai/qml/demos/tutorial_qaoa_intro.html)