PennyLaneAI / pennylane-lightning-gpu

GPU enabled Lightning simulator for accelerated circuit simulation. See https://github.com/PennyLaneAI/pennylane-lightning for all future development of this project.
https://docs.pennylane.ai/projects/lightning/en/stable/
Apache License 2.0

How to clear GPU memory #37

Closed rickyHong closed 2 years ago

rickyHong commented 2 years ago

When the app is executed repeatedly in a Flask environment, the following error occurs because GPU memory is not returned after each run.

from flask import Flask

def __XXXX
    .....
    dev = pennylane.device('lightning.gpu', wires=num_qubits)
    .....
    qnode = pennylane.QNode(circuit, dev)

.....

terminate called after throwing an instance of 'Pennylane::Util::LightningException'
what(): [/home/quantum/suprasm/pennylane-lightning-gpu/pennylane_lightning_gpu/src/simulator/StateVectorCudaManaged.hpp][Line:114][Method:~StateVectorCudaManaged]: Error in PennyLane Lightning: custatevec not initialized
Aborted (core dumped)

mlxd commented 2 years ago

Hi @rickyHong thanks for reporting this. Can you provide information about your system as defined in the issue template? Also, for the above, any information about setting up your environment as well as a minimal working example (runnable python file or directly runnable and copyable code snippet) would help here. Thank you.

See below for the issue template:


Issue description

Description of the issue - include code snippets and screenshots here if relevant. You may use the template below.

Source code and tracebacks

Please include any additional code snippets and error tracebacks related to the issue here.

Additional information

Any additional information, configuration or data that might be necessary to reproduce the issue.

rickyHong commented 2 years ago
**Expected behavior**: (What you expect to happen)
- Return GPU memory after completion of simulation

**Actual behavior**: (What actually happens)
- GPU memory not returned after completion of simulation

**Reproduces how often**: (What percentage of the time does it reproduce?)
- Every time: GPU memory is not returned after each simulation, and once the leaked memory accumulates, the process core dumps

**System information: (post the output of import pennylane as qml; qml.about())**
- Name: PennyLane
Version: 0.24.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/XanaduAI/pennylane
Author:
Author-email:
License: Apache License 2.0
Location: /home/quantum/.local/lib/python3.7/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, pennylane-lightning, retworkx, scipy, semantic-version, toml
Required-by: amazon-braket-pennylane-plugin, PennyLane-Lightning, PennyLane-Lightning-GPU, PennyLane-qiskit

Platform info:           Linux-4.15.0-189-generic-x86_64-with-Ubuntu-18.04-bionic
Python version:          3.7.5
Numpy version:           1.19.5
Scipy version:           1.5.2
Installed devices:
- default.gaussian (PennyLane-0.24.0)
- default.mixed (PennyLane-0.24.0)
- default.qubit (PennyLane-0.24.0)
- default.qubit.autograd (PennyLane-0.24.0)
- default.qubit.jax (PennyLane-0.24.0)
- default.qubit.tf (PennyLane-0.24.0)
- default.qubit.torch (PennyLane-0.24.0)
- qiskit.aer (PennyLane-qiskit-0.15.0)
- qiskit.basicaer (PennyLane-qiskit-0.15.0)
- qiskit.ibmq (PennyLane-qiskit-0.15.0)
- lightning.qubit (PennyLane-Lightning-0.24.0)
- lightning.gpu (PennyLane-Lightning-GPU-0.25.0.dev0)
- braket.aws.qubit (amazon-braket-pennylane-plugin-1.3.0)
- braket.local.qubit (amazon-braket-pennylane-plugin-1.3.0)

**Source code and tracebacks**
[Test code for reproduction]

manage.py

from flask import Flask
import pennylane as qml
app = Flask(__name__)

def circuit():
    qml.Hadamard(wires=[0])
    qml.PauliX(wires=[1])
    qml.RX(1.570795, wires=[2])
    qml.RY(1.570795, wires=[3])
    qml.RX(1.570795, wires=[4])
    qml.SWAP(wires=[3, 0])
    qml.PauliZ(wires=[4])
    qml.CNOT(wires=[1, 0])
    qml.CNOT(wires=[2, 3])
    qml.S(wires=[4])
    qml.PauliY(wires=[0])
    qml.RZ(1.570795, wires=[1])
    qml.SWAP(wires=[3, 2])
    qml.Hadamard(wires=[4])
    qml.RZ(1.570795, wires=[0])
    qml.PauliX(wires=[1])
    qml.T(wires=[2])
    qml.PauliZ(wires=[3])
    qml.Hadamard(wires=[4])
    qml.SWAP(wires=[3, 0])
    qml.Hadamard(wires=[4])
    qml.S(wires=[0])
    qml.PauliY(wires=[1])
    qml.RX(1.570795, wires=[2])
    qml.PauliX(wires=[3])
    qml.PauliZ(wires=[4])
    qml.PauliY(wires=[0])
    qml.S(wires=[1])
    qml.RX(1.570795, wires=[2])
    qml.PauliZ(wires=[3])
    qml.PauliY(wires=[4])
    qml.CNOT(wires=[3, 0])
    qml.Hadamard(wires=[4])
    qml.SWAP(wires=[1, 0])
    qml.SWAP(wires=[4, 2])
    qml.RZ(1.570795, wires=[0])
    qml.RY(1.570795, wires=[1])
    qml.RZ(1.570795, wires=[2])
    qml.PauliX(wires=[3])
    qml.PauliY(wires=[4])
    qml.PauliZ(wires=[0])
    qml.RY(1.570795, wires=[1])
    qml.RZ(1.570795, wires=[2])
    qml.PauliX(wires=[3])
    qml.PauliY(wires=[4])
    return qml.expval(qml.PauliZ(0))

@app.route('/board')
def board():
    dev = qml.device('lightning.gpu', wires=5, shots=2048)
    qnode = qml.QNode(circuit, dev)
    return "Success"

app.run(host="localhost or ip",port=5001)

[How to reproduce]
1) Flask app execute 
   python manage.py

2) Web browser (Chrome)
   Navigate to localhost or ip:5001/board

3) Reload the page in the browser repeatedly
   Observe that GPU memory is not returned after each simulation and that the process eventually core dumps
CatalinaAlbornoz commented 2 years ago

Thank you for adding these details @rickyHong!

mlxd commented 2 years ago

Hi @rickyHong I have attempted to run your above example and have not been able to reproduce your failure, with cuquantum==22.0.5 installed via pip.

Are you looking to return the expectation value, or dump the raw GPU memory to the app? For the result of the computation, you can adapt the provided return statement to return f"Success: {qnode()}" to obtain the expectation value. Running your example with this change I obtained:

Success: -1.0

If you are looking to access the raw GPU memory pointer, we do not explicitly provide support for that. You can, however, explore the device statevector, which is copied back to the host after evaluation of the circuit:

res = qnode()
dev.state

which holds the statevector in a NumPy array. Let us know if this helps.
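As an aside, once the statevector is on the host it can be inspected with standard NumPy tooling. The sketch below uses a hand-built single-qubit state as a stand-in for dev.state (illustration only, no device required):

```python
import numpy as np

# Hypothetical stand-in for dev.state: the single-qubit |+> state.
state = np.array([1.0, 1.0], dtype=np.complex128) / np.sqrt(2.0)

# Measurement probabilities per computational basis state.
probs = np.abs(state) ** 2
```

The same pattern applies to the array returned by dev.state after a real circuit evaluation.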

rickyHong commented 2 years ago

Hi @mlxd After running the reproduction code, the GPU memory leak situation is as shown below. I hope this demonstrates the problem clearly.

[Notice] The results below show the out-of-memory failure caused by the GPU memory leak during repeated execution.

I think the library should include a function similar to what PyTorch provides, e.g. gc.collect() combined with torch.cuda.empty_cache().
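The PyTorch-style cleanup referred to above can be sketched as follows. The helper name is hypothetical, and the block degrades gracefully when PyTorch or a GPU is absent; it is only an illustration of the requested feature, not an existing lightning.gpu API:

```python
import gc

def release_gpu_caches():
    """Sketch of the PyTorch-style cleanup (hypothetical helper name).

    Drops unreachable Python objects, then asks PyTorch's caching
    allocator to hand unused GPU blocks back to the driver. Returns
    True only if a CUDA cache flush actually ran.
    """
    gc.collect()  # collect cycles so GPU-backed objects can be finalized
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached, unused blocks
            return True
    except ImportError:
        pass
    return False
```

Note that torch.cuda.empty_cache() only releases memory held by PyTorch's own allocator; it cannot reclaim buffers owned by another library such as custatevec.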

[Attached screenshots: GPU_MEMORY_LEAK_01 through GPU_MEMORY_LEAK_04]

mlxd commented 2 years ago

Hi @rickyHong It looks like the GPUs you are using may not be Tesla-grade cards, given the memory available. For the best use of cuQuantum, and hence lightning.gpu, we suggest V100 or A100 cards with at least 16GB of RAM.

The custatevec library must be free to allocate intermediate memory buffers at will to perform certain circuit evaluations. Since we cannot guarantee how much GPU memory is needed, we can only attempt the allocation and throw if the runtime is unable to complete it.

We have not observed any memory leaks in the library, at either the C++ or the Python layer. Is it possible the Flask library is holding onto the device object, so that it is never released? All GPU memory should be freed when the device is deleted (the destructor calls that free the GPU memory are all made at that point).

We do not provide an explicit operator to flush the GPU memory, as this can interfere with other running applications.

If you can provide an extension for your example that gives the above error, we would be happy to investigate further, as we were not able to reproduce it. Thanks!

rickyHong commented 2 years ago

Hi @mlxd The numba library can be used to release all the GPU memory.

So far this has worked without problems. If there is a better way, I hope a feature can be added to the library.

Here is the code as applied:

from numba import cuda

device = cuda.get_current_device()
device.reset()

Thanks
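A guarded version of the workaround above can be sketched as follows. The helper name is hypothetical, and one caution applies: numba's device.reset() tears down the CUDA context for the whole process, which can invalidate handles held by other CUDA libraries (including a still-alive lightning.gpu device), so it should only be called once all device objects have been released:

```python
import gc

def reset_gpu_context():
    """Best-effort GPU cleanup via numba (hypothetical helper name).

    Returns True only if a context reset actually ran; False when
    numba is missing or no CUDA device is present.
    """
    gc.collect()  # drop lingering Python references first
    try:
        from numba import cuda
        cuda.get_current_device().reset()  # destroys the CUDA context
        return True
    except Exception:  # numba not installed, or no CUDA device
        return False
```

In the Flask reproduction above, this could be called at the end of the request handler, after the device and QNode go out of scope.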
