Closed giladqm closed 3 months ago
Hi @giladqm
The repo https://github.com/PennyLaneAI/pennylane-lightning-gpu and all of its contents were migrated to https://github.com/PennyLaneAI/pennylane-lightning, so all of the lightning.gpu device components can now be installed from this repository. For aarch64 systems, we do not currently provide wheels through PyPI, but there are multiple options to install the package (see https://pennylane.ai/install/#high-performance-computing-and-gpus for more details):
lightning.gpu for aarch64 is available through Conda-Forge as
conda install pennylane-lightning-gpu
This package is provided mostly with community support, though, and will likely not support all other standard devices (e.g. you'll need to manually install lightning.qubit and other packages too).
Build from source: this should work from the docker container you are also discussing here, assuming you have nvcc and supporting libraries installed. For simplicity, we can first install pennylane and pennylane-lightning, and all required build dependencies, then build and install pennylane-lightning-gpu as:
python -m venv pyenv && source ./pyenv/bin/activate
python -m pip install pennylane
git clone https://github.com/PennyLaneAI/pennylane-lightning --branch latest_release --single-branch
cd pennylane-lightning
# requirements-dev.txt does not have wheels for all packages so we can explicitly list these out
python -m pip install cmake ninja custatevec_cu12 pip~=22.0
PL_BACKEND="lightning_gpu" python -m pip install . --verbose
The package should build and install a natively built version of the libraries into your python environment.
Feel free to let us know if the above doesn't work. Shipping PyPI wheels for aarch64 is on our roadmap, but we have no current timeline to provide yet.
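Once the build finishes, one quick way to confirm that the GPU backend is importable is to instantiate the device. The sketch below assumes only the `qml.device("lightning.gpu", wires=...)` call that appears later in this thread; the helper name `lightning_gpu_status` is made up for illustration, and because the exact exception raised by a broken install may differ between versions, it catches broadly.

```python
def lightning_gpu_status(wires: int = 2) -> str:
    """Return a short, human-readable status for the lightning.gpu device.

    Hypothetical helper: it only reports whether the device can be created.
    """
    try:
        # Imported lazily so the helper still works when PennyLane is absent.
        import pennylane as qml
    except ImportError:
        return "pennylane is not installed in this environment"
    try:
        dev = qml.device("lightning.gpu", wires=wires)
    except Exception as err:  # missing plugin, missing binaries, no usable GPU, ...
        return f"lightning.gpu could not be created: {type(err).__name__}: {err}"
    return f"lightning.gpu device created: {type(dev).__name__}"


if __name__ == "__main__":
    print(lightning_gpu_status())
```

A successful device creation only shows that the compiled backend loads; it says nothing about performance.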
This is what I get: (pyenv) gilad@gracehopper:~/pennylane-lightning$ PL_BACKEND="lightning_gpu" python -m pip install . --verbose Using pip 22.0.2 from /home/gilad/pyenv/lib/python3.10/site-packages/pip (python 3.10) Processing /home/gilad/pennylane-lightning Running command python setup.py egg_info running egg_info creating /tmp/pip-pip-egg-info-f0k0xpmr/PennyLane_Lightning_GPU.egg-info writing /tmp/pip-pip-egg-info-f0k0xpmr/PennyLane_Lightning_GPU.egg-info/PKG-INFO writing dependency_links to /tmp/pip-pip-egg-info-f0k0xpmr/PennyLane_Lightning_GPU.egg-info/dependency_links.txt writing entry points to /tmp/pip-pip-egg-info-f0k0xpmr/PennyLane_Lightning_GPU.egg-info/entry_points.txt writing requirements to /tmp/pip-pip-egg-info-f0k0xpmr/PennyLane_Lightning_GPU.egg-info/requires.txt writing top-level names to /tmp/pip-pip-egg-info-f0k0xpmr/PennyLane_Lightning_GPU.egg-info/top_level.txt writing manifest file '/tmp/pip-pip-egg-info-f0k0xpmr/PennyLane_Lightning_GPU.egg-info/SOURCES.txt' reading manifest file '/tmp/pip-pip-egg-info-f0k0xpmr/PennyLane_Lightning_GPU.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching 'pennylane_lightning/lightning_qpu/lightning_gpu.toml' adding license file 'LICENSE' writing manifest file '/tmp/pip-pip-egg-info-f0k0xpmr/PennyLane_Lightning_GPU.egg-info/SOURCES.txt' Preparing metadata (setup.py) ... 
done Requirement already satisfied: pennylane>=0.34 in /home/gilad/pyenv/lib/python3.10/site-packages (from PennyLane-Lightning-GPU==0.36.0) (0.36.0) Requirement already satisfied: pennylane_lightning==0.36.0 in /home/gilad/pyenv/lib/python3.10/site-packages (from PennyLane-Lightning-GPU==0.36.0) (0.36.0) Requirement already satisfied: numpy<2.0 in /home/gilad/pyenv/lib/python3.10/site-packages (from pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (1.26.4) Requirement already satisfied: rustworkx in /home/gilad/pyenv/lib/python3.10/site-packages (from pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (0.14.2) Requirement already satisfied: semantic-version>=2.7 in /home/gilad/pyenv/lib/python3.10/site-packages (from pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (2.10.0) Requirement already satisfied: typing-extensions in /home/gilad/pyenv/lib/python3.10/site-packages (from pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (4.12.0) Requirement already satisfied: autoray>=0.6.1 in /home/gilad/pyenv/lib/python3.10/site-packages (from pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (0.6.12) Requirement already satisfied: networkx in /home/gilad/pyenv/lib/python3.10/site-packages (from pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (3.3) Requirement already satisfied: scipy in /home/gilad/pyenv/lib/python3.10/site-packages (from pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (1.13.1) Requirement already satisfied: autograd in /home/gilad/pyenv/lib/python3.10/site-packages (from pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (1.6.2) Requirement already satisfied: requests in /home/gilad/pyenv/lib/python3.10/site-packages (from pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (2.32.2) Requirement already satisfied: toml in /home/gilad/pyenv/lib/python3.10/site-packages (from pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (0.10.2) Requirement already satisfied: appdirs in /home/gilad/pyenv/lib/python3.10/site-packages (from 
pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (1.4.4) Requirement already satisfied: cachetools in /home/gilad/pyenv/lib/python3.10/site-packages (from pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (5.3.3) Requirement already satisfied: future>=0.15.2 in /home/gilad/pyenv/lib/python3.10/site-packages (from autograd->pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (1.0.0) Requirement already satisfied: idna<4,>=2.5 in /home/gilad/pyenv/lib/python3.10/site-packages (from requests->pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (3.7) Requirement already satisfied: certifi>=2017.4.17 in /home/gilad/pyenv/lib/python3.10/site-packages (from requests->pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (2024.2.2) Requirement already satisfied: urllib3<3,>=1.21.1 in /home/gilad/pyenv/lib/python3.10/site-packages (from requests->pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (2.2.1) Requirement already satisfied: charset-normalizer<4,>=2 in /home/gilad/pyenv/lib/python3.10/site-packages (from requests->pennylane>=0.34->PennyLane-Lightning-GPU==0.36.0) (3.3.2) Using legacy 'setup.py install' for PennyLane-Lightning-GPU, since package 'wheel' is not installed. Installing collected packages: PennyLane-Lightning-GPU Running command Running setup.py install for PennyLane-Lightning-GPU running install /home/gilad/pyenv/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools. 
warnings.warn( running build running build_py running egg_info writing PennyLane_Lightning_GPU.egg-info/PKG-INFO writing dependency_links to PennyLane_Lightning_GPU.egg-info/dependency_links.txt writing entry points to PennyLane_Lightning_GPU.egg-info/entry_points.txt writing requirements to PennyLane_Lightning_GPU.egg-info/requires.txt writing top-level names to PennyLane_Lightning_GPU.egg-info/top_level.txt reading manifest file 'PennyLane_Lightning_GPU.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' warning: no files found matching 'pennylane_lightning/lightning_qpu/lightning_gpu.toml' adding license file 'LICENSE' writing manifest file 'PennyLane_Lightning_GPU.egg-info/SOURCES.txt' running build_ext ░█░░░▀█▀░█▀▀░█░█░▀█▀░█▀█░▀█▀░█▀█░█▀▀░ ░█░░░░█░░█░█░█▀█░░█░░█░█░░█░░█░█░█░█░ ░▀▀▀░▀▀▀░▀▀▀░▀░▀░░▀░░▀░▀░▀▀▀░▀░▀░▀▀▀░
-- pennylane_lightning version 0.36.0 -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- PL_BACKEND: lightning_gpu -- ENABLE_WARNINGS is OFF. -- ENABLE_OPENMP is ON. -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Python scipy-lib path: /home/gilad/pyenv/lib/python3.10/site-packages/scipy.libs -- pybind11 v2.11.1 Python site-packages directory: /home/gilad/pyenv/lib/python3.10/site-packages ░█░░░▀█▀░█▀▀░█░█░▀█▀░█▀█░▀█▀░█▀█░█▀▀░░░░█▀▀░█▀█░█░█ ░█░░░░█░░█░█░█▀█░░█░░█░█░░█░░█░█░█░█░░░░█░█░█▀▀░█░█ ░▀▀▀░▀▀▀░▀▀▀░▀░▀░░▀░░▀░▀░▀▀▀░▀░▀░▀▀▀░▀░░▀▀▀░▀░░░▀▀▀
CMake Error at /home/gilad/pyenv/lib/python3.10/site-packages/cmake/data/share/cmake-3.29/Modules/Internal/CMakeCUDAArchitecturesValidate.cmake:7 (message):
  CMAKE_CUDA_ARCHITECTURES must be non-empty if set.
Call Stack (most recent call first):
  /home/gilad/pyenv/lib/python3.10/site-packages/cmake/data/share/cmake-3.29/Modules/CMakeDetermineCUDACompiler.cmake:112 (cmake_cuda_architectures_validate)
  pennylane_lightning/core/src/simulators/lightning_gpu/CMakeLists.txt:9 (project)
-- Configuring incomplete, errors occurred!
Traceback (most recent call last):
File "
× Running setup.py install for PennyLane-Lightning-GPU did not run successfully. │ exit code: 1 ╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip. full command: /home/gilad/pyenv/bin/python -u -c ' exec(compile('"'"''"'"''"'"'
#
distutils.core
to work with newer packaging standards.sys.argv[0]
to the underlying setup.py
, when invoking setup.py
so-c
. This avoids the following warning:import os, sys, tokenize
try:
import setuptools
except ImportError as error:
print(
"ERROR: Can not execute setup.py
since setuptools is not available in "
"the build environment.",
file=sys.stderr,
)
sys.exit(1)
file = %r sys.argv[0] = file
if os.path.exists(file):
filename = file
with tokenize.open(file) as f:
setup_py_code = f.read()
else:
filename = "
exec(compile(setup_py_code, filename, "exec"))
'"'"''"'"''"'"' % ('"'"'/home/gilad/pennylane-lightning/setup.py'"'"',), "
× Encountered error while trying to install package. ╰─> PennyLane-Lightning-GPU
note: This is an issue with the package mentioned above, not pip. hint: See above for output from the failure.
Hi @giladqm
Are you running this on the Nvidia cuQuantum appliance docker image?
I have verified this with docker run --platform aarch64 --rm -it nvcr.io/nvidia/cuquantum-appliance:24.03-arm64 to spin up the container locally, and then I have run exactly:
python -m venv pyenv && source ./pyenv/bin/activate
python -m pip install pennylane
git clone https://github.com/PennyLaneAI/pennylane-lightning --branch latest_release --single-branch
cd pennylane-lightning
# requirements-dev.txt does not have wheels for all packages so we can explicitly list these out
python -m pip install cmake ninja custatevec_cu12 pip~=22.0
PL_BACKEND="lightning_gpu" python -m pip install . --verbose
and the installation completes successfully. Have you made any custom modifications to your environment, or are you working on a different container image than nvcr.io/nvidia/cuquantum-appliance:24.03-arm64?
If you are using a different env, there may be missing packages. In this instance, it looks like setuptools isn't available in your environment. I'd recommend installing this package, since it is likely the cause of the failure in your env. Let us know if this helps.
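Missing packages can be checked for before starting a long build. The sketch below uses only the Python standard library; the default lists (setuptools and pip as modules; cmake, ninja and nvcc as executables) are assumptions taken from the tools named in this thread, not an official requirements list, and the function name is hypothetical.

```python
import importlib.util
import shutil


def missing_build_prerequisites(
    modules=("setuptools", "pip"),
    executables=("cmake", "ninja", "nvcc"),
):
    """Return the names of Python modules and executables that cannot be found.

    The default names are assumptions based on this thread, not an official list.
    """
    # A module counts as missing when the import system cannot locate it.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # An executable counts as missing when it is not on PATH.
    missing += [exe for exe in executables if shutil.which(exe) is None]
    return missing


if __name__ == "__main__":
    missing = missing_build_prerequisites()
    print("missing:", ", ".join(missing) if missing else "nothing")
```

Run it inside the activated virtual environment, so that the pip-installed cmake and ninja executables are on PATH.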
I followed your instructions and indeed the installation was successful. Unfortunately it looks like the GPU isn't being used. This is the code I'm running:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Specify the index of the GPU you want to use
import time
import logging
import torch
import pennylane as qml
from matplotlib import pyplot as plt
from pennylane import numpy as np

# Define the directory to save outputs
output_dir = "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# Set up logging
log_file_path = os.path.join(output_dir, "output_log.txt")
logging.basicConfig(filename=log_file_path, level=logging.INFO,
                    format='%(asctime)s %(message)s', filemode='w')
console = logging.StreamHandler()
console.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s %(message)s')
console.setFormatter(formatter)
logging.getLogger('').addHandler(console)

def circuit0_basic(params, wires):
    n_qubits = len(wires)
    n_rotations = len(params)
    if n_rotations > 1:
        n_layers = n_rotations // n_qubits
        n_extra_rots = n_rotations - n_layers * n_qubits
        for layer_idx in range(n_layers):
            layer_params = params[layer_idx * n_qubits: layer_idx * n_qubits + n_qubits, :]
            qml.broadcast(qml.Rot, wires, pattern="single", parameters=layer_params)
            qml.broadcast(qml.CNOT, wires, pattern="ring")
        extra_params = params[-n_extra_rots:, :]
        extra_wires = wires[: n_qubits - 1 - n_extra_rots: -1]
        qml.broadcast(qml.Rot, extra_wires, pattern="single", parameters=extra_params)
    else:
        qml.Rot(*params[0], wires=wires[0])

def circuit0(params, wires):
    n_qubits = len(wires)
    n_rotations = len(params)
    if n_rotations > 1:
        n_layers = n_rotations // n_qubits
        n_extra_rots = n_rotations - n_layers * n_qubits
        for layer_idx in range(n_layers):
            layer_params = params[layer_idx * n_qubits: layer_idx * n_qubits + n_qubits, :]
            qml.broadcast(qml.Rot, wires, pattern="single", parameters=layer_params)
            qml.broadcast(qml.CNOT, wires, pattern="ring")
        extra_params = params[-n_extra_rots:, :]
        extra_wires = wires[: n_qubits - 1 - n_extra_rots: -1]
        qml.broadcast(qml.Rot, extra_wires, pattern="single", parameters=extra_params)
    else:
        qml.Rot(*params[0], wires=wires[0])

def runtest(H, cfg, show=False):
    max_iterations = cfg.max_iterations
    num_qubits = len(H.wires)
    num_param_sets = (2 ** num_qubits) - 1
    params = np.random.uniform(low=-np.pi / 2, high=np.pi / 2, size=(num_param_sets, 3))
    params = np.array(params, requires_grad=True)
    dev = qml.device("lightning.gpu", wires=num_qubits, batch_obs=1)
    logging.info(f"Using device: {dev}")

    @qml.qnode(dev, interface='autograd')
    def cost_fn(params):
        circuit0_basic(params, wires=H.wires)
        return qml.expval(H)

    @qml.qnode(dev, interface='autograd')
    def state_fn(params):
        circuit0_basic(params, wires=H.wires)
        return qml.state()

    # Check if GPU is available and log it after device is set
    if torch.cuda.is_available():
        logging.info("CUDA is available. Using GPU.")
    else:
        logging.info("CUDA is not available. Using CPU.")
    logging.info("CUDA check details:")
    logging.info(f"CUDA available: {torch.cuda.is_available()}")
    logging.info(f"CUDA device count: {torch.cuda.device_count()}")
    if torch.cuda.is_available():
        logging.info(f"CUDA device name: {torch.cuda.get_device_name(0)}")

    opt = qml.AdamOptimizer(stepsize=0.1)
    conv_tol = 1e-06
    energy_plot = []
    prev_energy = cost_fn(params)
    for n in range(max_iterations):
        params, energy = opt.step_and_cost(cost_fn, params)
        logging.info(f"Iteration {n + 1}/{max_iterations}: Energy = {energy:.6f}")
        energy_plot.append(energy)
        if np.abs(energy - prev_energy) <= conv_tol:
            logging.info("Convergence reached!")
            break
        prev_energy = energy
        # Print progress
        progress = (n + 1) / max_iterations * 100
        logging.info(f"Progress: {progress:.2f}%")
    logging.info("Optimization completed.")

    # Using the params, find the ground state vector
    best_params = params
    ground_state = state_fn(best_params)

    # Plot energies
    plt.clf()
    plt.plot(energy_plot)
    plt.xlabel("Iterations")
    plt.ylabel("Energy")
    plt.title("Energy at each iteration")
    energy_plot_path = os.path.join(output_dir, "energy_plot.png")
    plt.savefig(energy_plot_path)
    if show:
        plt.show()
    return energy, ground_state

class Params0:
    pass

class Hamiltonian:
    def __init__(self, N):
        self.ham = qml.Hamiltonian([], [])
        self.energies = None
        self.states = None
        self.gs_energy = None
        self.gs_state = None
        self.N = N # Number of qubits

class MyHamiltonian0(Hamiltonian):
    def __init__(self, N, A, b, P=1, flag0=False):
        n = N // P
        super().__init__(N)
        self.flag0 = flag0
        self.P = P
        self.n = n
        self.A = A
        self.b = b
        self.y = self.A @ self.b.reshape(-1, 1)
        self.y = np.matrix(self.y)
        self.set_hamiltonian()

    def get_evec(self):
        return self.noise_std * np.random.normal(size=self.n, requires_grad=False)

    def set_hamiltonian(self):
        def func1():
            param1, param2 = self.A.shape
            w1 = np.zeros((param2, param2))
            w2 = np.zeros(param2)
            for m in range(param1):
                for i in range(param2):
                    w2[i] += -2 * self.A[m, i] * self.y[m, 0]
                    for j in range(param2):
                        w1[i, j] += self.A[m, i] * self.A[m, j]
            return w1, w2

        def func2():
            w1, w2 = func1()
            min_val = -2 ** (self.P - 1) + 1
            param1, param2 = w1.shape
            v1 = np.zeros((param2 * self.P, param2 * self.P))
            v2 = np.zeros(param2 * self.P)
            for i in range(param1):
                for s in range(self.P):
                    v2[self.P * i + s] += (2 ** s) * w2[i]
                    for j in range(param2):
                        v2[self.P * i + s] += (2 ** s) * 2 * min_val * w1[i, j]
                        for p in range(self.P):
                            v1[self.P * i + s, self.P * j + p] += (2 ** (s + p)) * w1[i, j]
            return v1, v2

        v1, v2 = func2()
        H = qml.Hamiltonian([], [])
        for i in range(self.n):
            for s in range(self.P):
                xadded = False
                x = i * self.P + s
                fact = - (sum(v1[x, :]) + sum(v1[:, x]) + 2*v2[x])
                if fact != 0:
                    xadded = True
                    H += fact * qml.PauliZ(x)
                for j in range(self.n):
                    for p in range(self.P):
                        y = j*self.P + p
                        fact = v1[x, y]
                        if fact != 0:
                            xadded = True
                            H += fact * qml.PauliZ(x) @ qml.PauliZ(y)
                if not xadded:
                    H += 0.0 * qml.PauliZ(x)
            if self.flag0:
                x = i * self.P
                H += -(1/2)*qml.PauliZ(x) + (1/4)*qml.PauliZ(x) @ qml.PauliZ(x)
        self.ham = H

def main():
    tstart = time.time()
    logging.info('Running experiment0')
    cfg = Params0()
    cfg.ensemble_num = 10
    P = 1
    n = 21
    cfg.num_of_qubits = n * P
    cfg.max_iterations = 10
    cfg.experiment_name = 'experiment0'
    cfg.hamiltonian_type = 'Hamiltonian0'
    A = np.random.randn(n, n)
    b = np.random.randint(2 ** P, size=n, requires_grad=False)
    flag0 = False
    res_vec = []
    H = MyHamiltonian0(cfg.num_of_qubits, A=A, b=b, P=P, flag0=flag0)
    logging.info("Starting the optimization...")
    gs_energy, __ = runtest(H.ham, cfg, show=False)
    res_vec.append(gs_energy)
    logging.info("Optimization finished.")
    logging.info("TIME= %d [sec]", int(np.round((time.time() - tstart))))
    logging.info("Ground state energy: %s", res_vec)
    return

# -------------------------------------
if __name__ == '__main__':
    main()
and the output:
(pyenv) (base) cuquantum@7806514dd020:~$ python Gilad_Test.py
2024-05-28 05:48:21,924 Running experiment0
2024-05-28 05:48:26,198 Starting the optimization...
2024-05-28 05:48:26,788 Using device: Lightning GPU PennyLane plugin
Short name: lightning.gpu
Package: pennylane_lightning
Plugin version: 0.36.0
Author: Xanadu Inc.
Wires: 21
Shots: None
2024-05-28 05:48:26,789 CUDA is not available. Using CPU.
2024-05-28 05:48:26,790 CUDA check details:
2024-05-28 05:48:26,790 CUDA available: False
2024-05-28 05:48:26,790 CUDA device count: 0
I built torch from source and now I get:
(cuquantum-24.03) cuquantum@7806514dd020:~$ python Gilad_Test.py
2024-05-28 07:11:14,614 Running experiment0
2024-05-28 07:11:18,819 Starting the optimization...
2024-05-28 07:11:19,331 Using device: Lightning GPU PennyLane plugin
Short name: lightning.gpu
Package: pennylane_lightning
Plugin version: 0.36.0
Author: Xanadu Inc.
Wires: 21
Shots: None
2024-05-28 07:11:19,332 CUDA is available. Using GPU.
But it doesn't seem that it's really working...
I also tried updating the code:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Specify the index of the GPU you want to use
import time
import logging
import torch
import pennylane as qml
from matplotlib import pyplot as plt
from pennylane import numpy as np

# Define the directory to save outputs
output_dir = "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# Set up logging
log_file_path = os.path.join(output_dir, "output_log.txt")
logging.basicConfig(filename=log_file_path, level=logging.INFO,
                    format='%(asctime)s %(message)s', filemode='w')
console = logging.StreamHandler()
console.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s %(message)s')
console.setFormatter(formatter)
logging.getLogger('').addHandler(console)

def circuit0_basic(params, wires):
    n_qubits = len(wires)
    n_rotations = len(params)
    if n_rotations > 1:
        n_layers = n_rotations // n_qubits
        n_extra_rots = n_rotations - n_layers * n_qubits
        for layer_idx in range(n_layers):
            layer_params = params[layer_idx * n_qubits: layer_idx * n_qubits + n_qubits, :]
            qml.broadcast(qml.Rot, wires, pattern="single", parameters=layer_params)
            qml.broadcast(qml.CNOT, wires, pattern="ring")
        extra_params = params[-n_extra_rots:, :]
        extra_wires = wires[: n_qubits - 1 - n_extra_rots: -1]
        qml.broadcast(qml.Rot, extra_wires, pattern="single", parameters=extra_params)
    else:
        qml.Rot(*params[0], wires=wires[0])

def circuit0(params, wires):
    n_qubits = len(wires)
    n_rotations = len(params)
    if n_rotations > 1:
        n_layers = n_rotations // n_qubits
        n_extra_rots = n_rotations - n_layers * n_qubits
        for layer_idx in range(n_layers):
            layer_params = params[layer_idx * n_qubits: layer_idx * n_qubits + n_qubits, :]
            qml.broadcast(qml.Rot, wires, pattern="single", parameters=layer_params)
            qml.broadcast(qml.CNOT, wires, pattern="ring")
        extra_params = params[-n_extra_rots:, :]
        extra_wires = wires[: n_qubits - 1 - n_extra_rots: -1]
        qml.broadcast(qml.Rot, extra_wires, pattern="single", parameters=extra_params)
    else:
        qml.Rot(*params[0], wires=wires[0])

def runtest(H, cfg, show=False):
    max_iterations = cfg.max_iterations
    num_qubits = len(H.wires)
    num_param_sets = (2 ** num_qubits) - 1
    # Initialize parameters directly on the GPU
    params = torch.tensor(np.random.uniform(low=-np.pi / 2, high=np.pi / 2, size=(num_param_sets, 3)), requires_grad=True, device='cuda', dtype=torch.float32)
    dev = qml.device("lightning.gpu", wires=num_qubits, batch_obs=True)
    logging.info(f"Using device: {dev}")

    @qml.qnode(dev, interface='torch')
    def cost_fn(params):
        circuit0_basic(params, wires=H.wires)
        return qml.expval(H)

    @qml.qnode(dev, interface='torch')
    def state_fn(params):
        circuit0_basic(params, wires=H.wires)
        return qml.state()

    # Check if GPU is available and log it after device is set
    if torch.cuda.is_available():
        logging.info("CUDA is available. Using GPU.")
    else:
        logging.info("CUDA is not available. Using CPU.")
    logging.info("CUDA check details:")
    logging.info(f"CUDA available: {torch.cuda.is_available()}")
    logging.info(f"CUDA device count: {torch.cuda.device_count()}")
    if torch.cuda.is_available():
        logging.info(f"CUDA device name: {torch.cuda.get_device_name(0)}")

    opt = torch.optim.Adam([params], lr=0.1)
    conv_tol = 1e-06
    energy_plot = []
    prev_energy = cost_fn(params).item()
    for n in range(max_iterations):
        opt.zero_grad()
        energy = cost_fn(params)
        energy.backward()
        opt.step()
        energy = energy.item()
        logging.info(f"Iteration {n + 1}/{max_iterations}: Energy = {energy:.6f}")
        energy_plot.append(energy)
        if np.abs(energy - prev_energy) <= conv_tol:
            logging.info("Convergence reached!")
            break
        prev_energy = energy
        # Print progress
        progress = (n + 1) / max_iterations * 100
        logging.info(f"Progress: {progress:.2f}%")
    logging.info("Optimization completed.")

    # Using the params, find the ground state vector
    best_params = params
    ground_state = state_fn(best_params)

    # Plot energies
    plt.clf()
    plt.plot(energy_plot)
    plt.xlabel("Iterations")
    plt.ylabel("Energy")
    plt.title("Energy at each iteration")
    energy_plot_path = os.path.join(output_dir, "energy_plot.png")
    plt.savefig(energy_plot_path)
    if show:
        plt.show()
    return energy, ground_state

class Params0:
    pass

class Hamiltonian:
    def __init__(self, N):
        self.ham = qml.Hamiltonian([], [])
        self.energies = None
        self.states = None
        self.gs_energy = None
        self.gs_state = None
        self.N = N # Number of qubits

class MyHamiltonian0(Hamiltonian):
    def __init__(self, N, A, b, P=1, flag0=False):
        n = N // P
        super().__init__(N)
        self.flag0 = flag0
        self.P = P
        self.n = n
        self.A = A.to(torch.float32).to('cuda') # Move to GPU and ensure float32 dtype
        self.b = b.to(torch.float32).to('cuda') # Move to GPU and ensure float32 dtype
        self.y = self.A @ self.b.reshape(-1, 1)
        self.y = self.y.clone().detach().requires_grad_(True) # Properly construct tensor from existing tensor
        self.set_hamiltonian()

    def get_evec(self):
        return self.noise_std * torch.normal(mean=0, std=1, size=(self.n,), device='cuda', requires_grad=False)

    def set_hamiltonian(self):
        def func1():
            param1, param2 = self.A.shape
            w1 = torch.zeros((param2, param2), device='cuda', dtype=torch.float32)
            w2 = torch.zeros(param2, device='cuda', dtype=torch.float32)
            for m in range(param1):
                for i in range(param2):
                    w2[i] += -2 * self.A[m, i] * self.y[m, 0]
                    for j in range(param2):
                        w1[i, j] += self.A[m, i] * self.A[m, j]
            return w1, w2

        def func2():
            w1, w2 = func1()
            min_val = -2 ** (self.P - 1) + 1
            param1, param2 = w1.shape
            v1 = torch.zeros((param2 * self.P, param2 * self.P), device='cuda', dtype=torch.float32)
            v2 = torch.zeros(param2 * self.P, device='cuda', dtype=torch.float32)
            for i in range(param1):
                for s in range(self.P):
                    v2[self.P * i + s] += (2 ** s) * w2[i]
                    for j in range(param2):
                        v2[self.P * i + s] += (2 ** s) * 2 * min_val * w1[i, j]
                        for p in range(self.P):
                            v1[self.P * i + s, self.P * j + p] += (2 ** (s + p)) * w1[i, j]
            return v1, v2

        v1, v2 = func2()
        H = qml.Hamiltonian([], [])
        for i in range(self.n):
            for s in range(self.P):
                xadded = False
                x = i * self.P + s
                fact = - (sum(v1[x, :]) + sum(v1[:, x]) + 2*v2[x])
                if fact != 0:
                    xadded = True
                    H += fact * qml.PauliZ(x)
                for j in range(self.n):
                    for p in range(self.P):
                        y = j*self.P + p
                        fact = v1[x, y]
                        if fact != 0:
                            xadded = True
                            H += fact * qml.PauliZ(x) @ qml.PauliZ(y)
                if not xadded:
                    H += 0.0 * qml.PauliZ(x)
            if self.flag0:
                x = i * self.P
                H += -(1/2)*qml.PauliZ(x) + (1/4)*qml.PauliZ(x) @ qml.PauliZ(x)
        self.ham = H

def main():
    tstart = time.time()
    logging.info('Running experiment0')
    cfg = Params0()
    cfg.ensemble_num = 10
    P = 1
    n = 21 # Increase number of qubits
    cfg.num_of_qubits = n * P
    cfg.max_iterations = 100 # Increase the number of iterations for better GPU utilization
    cfg.experiment_name = 'experiment0'
    cfg.hamiltonian_type = 'Hamiltonian0'
    A = torch.randn(n, n, device='cuda', dtype=torch.float32) # Move to GPU and ensure float32 dtype
    b = torch.randint(2 ** P, size=(n,), device='cuda', dtype=torch.float32, requires_grad=False) # Move to GPU and ensure float32 dtype
    flag0 = False
    res_vec = []
    H = MyHamiltonian0(cfg.num_of_qubits, A=A, b=b, P=P, flag0=flag0)
    logging.info("Starting the optimization...")
    gs_energy, __ = runtest(H.ham, cfg, show=False)
    res_vec.append(gs_energy)
    logging.info("Optimization finished.")
    logging.info("TIME= %d [sec]", int(np.round((time.time() - tstart))))
    logging.info("Ground state energy: %s", res_vec)
    return

# -------------------------------------
if __name__ == '__main__':
    main()
But the GPU Memory Usage is very low:
I think this is a MIG issue, trying to figure it out.
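Note that `torch.cuda.is_available()` reports only on PyTorch's own CUDA support, so it is an indirect signal for what the simulator is doing. GPU load can instead be sampled directly while the script runs. The sketch below shells out to `nvidia-smi` with its `--query-gpu` and `--format` options; the function name `gpu_usage` is hypothetical, and on MIG-partitioned GPUs some fields may come back as `[N/A]`, which is why the values are kept as strings.

```python
import shutil
import subprocess


def gpu_usage():
    """Return one (utilization %, memory used MiB) pair of strings per GPU.

    Returns None when nvidia-smi is not on PATH or the query fails.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    cmd = [
        "nvidia-smi",
        "--query-gpu=utilization.gpu,memory.used",
        "--format=csv,noheader,nounits",
    ]
    try:
        out = subprocess.run(
            cmd, capture_output=True, text=True, check=True, timeout=10
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return None
    # One CSV line per GPU, e.g. "37, 1024"; keep the fields as strings.
    return [
        tuple(field.strip() for field in line.split(","))
        for line in out.splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    print(gpu_usage())
```

Calling it from a second terminal (or a background thread) during the optimization loop gives a rough picture of whether the GPU is busy.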
If you try swapping the device for default.qubit and use PyTorch with CUDA-mapped tensors, does the GPU work? You can likely pick a smaller scale workload for this (e.g. something from the Torch GPU tests at https://github.com/PennyLaneAI/pennylane/blob/59a1e0586e707d057a0c92d4239036afa5312b73/tests/interfaces/test_torch.py#L399).
If this runs on the GPU without issue, it may be a runtime issue with lightning.gpu. If not, it is most likely a MIG or CUDA driver issue on the node.
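A small-scale version of that check could look like the sketch below. It is not taken from the linked test file: the two-qubit circuit, the function name `torch_default_qubit_check`, and the choice of `diff_method="backprop"` are illustrative assumptions. It falls back to the CPU when CUDA is unavailable and returns None when torch or pennylane is not installed.

```python
def torch_default_qubit_check():
    """Run a 2-qubit circuit on default.qubit with Torch tensors.

    Returns (torch device of the result, expectation value), or None when
    torch or pennylane is not installed. Illustrative sketch only.
    """
    try:
        import torch
        import pennylane as qml
    except ImportError:
        return None

    torch_device = "cuda" if torch.cuda.is_available() else "cpu"
    dev = qml.device("default.qubit", wires=2)

    @qml.qnode(dev, interface="torch", diff_method="backprop")
    def circuit(x):
        qml.RX(x[0], wires=0)
        qml.RY(x[1], wires=1)
        qml.CNOT(wires=[0, 1])
        return qml.expval(qml.PauliZ(1))

    # Parameters live on the GPU when one is visible to PyTorch.
    x = torch.tensor([0.1, 0.2], requires_grad=True, device=torch_device)
    result = circuit(x)
    result.backward()  # gradients are stored in x.grad
    return str(result.device), float(result.detach().cpu())


if __name__ == "__main__":
    print(torch_default_qubit_check())
```

If the reported device is a CUDA device and the call completes, PyTorch can reach the GPU, which points away from a driver or MIG problem.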
We fixed the MIG issue and now the following code works. I'm trying to find a way to accelerate the program, because I feel like I'm not utilizing the entire GH200. If you know more ways to accelerate it, that would help me a lot. Code:
from mpi4py import MPI
import pennylane as qml
from matplotlib import pyplot as plt
from pennylane import numpy as np
import time

def circuit0_basic(params, wires):
    n_qubits = len(wires)
    n_rotations = len(params)
    if n_rotations > 1:
        n_layers = n_rotations // n_qubits
        n_extra_rots = n_rotations - n_layers * n_qubits
        for layer_idx in range(n_layers):
            layer_params = params[layer_idx * n_qubits: layer_idx * n_qubits + n_qubits, :]
            qml.broadcast(qml.Rot, wires, pattern="single", parameters=layer_params)
            qml.broadcast(qml.CNOT, wires, pattern="ring")
        extra_params = params[-n_extra_rots:, :]
        extra_wires = wires[: n_qubits - 1 - n_extra_rots: -1]
        qml.broadcast(qml.Rot, extra_wires, pattern="single", parameters=extra_params)
    else:
        qml.Rot(*params[0], wires=wires[0])

def circuit0(params, wires):
    n_qubits = len(wires)
    n_rotations = len(params)
    if n_rotations > 1:
        n_layers = n_rotations // n_qubits
        n_extra_rots = n_rotations - n_layers * n_qubits
        for layer_idx in range(n_layers):
            layer_params = params[layer_idx * n_qubits: layer_idx * n_qubits + n_qubits, :]
            qml.broadcast(qml.Rot, wires, pattern="single", parameters=layer_params)
            qml.broadcast(qml.CNOT, wires, pattern="ring")
        extra_params = params[-n_extra_rots:, :]
        extra_wires = wires[: n_qubits - 1 - n_extra_rots: -1]
        qml.broadcast(qml.Rot, extra_wires, pattern="single", parameters=extra_params)
    else:
        qml.Rot(*params[0], wires=wires[0])

def runtest(H, cfg, show=False):
    max_iterations = cfg.max_iterations
    num_qubits = len(H.wires)
    num_param_sets = (2 ** num_qubits) - 1
    params = np.random.uniform(low=-np.pi / 2, high=np.pi / 2, size=(num_param_sets, 3))
    params = np.array(params, requires_grad=True)
    # Enable state access by setting shots=None
    dev = qml.device("lightning.gpu", wires=num_qubits, shots=None, batch_obs=True, mpi=True)

    @qml.qnode(dev, diff_method="adjoint")
    def cost_fn(params):
        circuit0_basic(params, wires=H.wires)
        return qml.expval(H)

    @qml.qnode(dev, diff_method="adjoint")
    def state_fn(params):
        circuit0_basic(params, wires=H.wires)
        return qml.state()

    opt = qml.AdamOptimizer(stepsize=0.1)
    conv_tol = 1e-06
    energy_plot = []
    prev_energy = cost_fn(params)
    for n in range(max_iterations):
        params, energy = opt.step_and_cost(cost_fn, params)
        print("Energy for iteration " + str(n) + " : " + str(energy))
        energy_plot.append(energy)
        if np.abs(energy - prev_energy) <= conv_tol:
            break
        prev_energy = energy

    # Using the params, find the ground state vector
    best_params = params
    ground_state = state_fn(best_params)

    # Plot energies
    plt.clf()
    plt.plot(energy_plot)
    plt.xlabel("Iterations")
    plt.ylabel("Energy")
    plt.title("Energy at each iteration")
    plt.savefig("energy_plot.png")
    if show:
        plt.show()
    return energy, ground_state

class Params0:
    pass

class Hamiltonian:
    def __init__(self, N):
        self.ham = qml.Hamiltonian([], [])
        self.energies = None
        self.states = None
        self.gs_energy = None
        self.gs_state = None
        self.N = N # Number of qubits

class MyHamiltonian0(Hamiltonian):
    def __init__(self, N, A, b, P=1, flag0=False):
        n = N // P
        super().__init__(N)
        self.flag0 = flag0
        self.P = P
        self.n = n
        self.A = A
        self.b = b
        self.y = self.A @ self.b.reshape(-1,1)
        self.y = np.matrix(self.y)
        self.set_hamiltonian()

    def get_evec(self):
        return self.noise_std * np.random.normal(size=self.n, requires_grad=False)

    def set_hamiltonian(self):
        def func1():
            param1, param2 = self.A.shape
            w1 = np.zeros((param2, param2))
            w2 = np.zeros(param2)
            for m in range(param1):
                for i in range(param2):
                    w2[i] += -2 * self.A[m, i] * self.y[m, 0]
                    for j in range(param2):
                        w1[i, j] += self.A[m, i] * self.A[m, j]
            return w1, w2

        def func2():
            w1, w2 = func1()
            min_val = -2 ** (self.P - 1) + 1
            param1, param2 = w1.shape
            v1 = np.zeros((param2 * self.P, param2 * self.P))
            v2 = np.zeros(param2 * self.P)
            for i in range(param1):
                for s in range(self.P):
                    v2[self.P * i + s] += (2 ** s) * w2[i]
                    for j in range(param2):
                        v2[self.P * i + s] += (2 ** s) * 2 * min_val * w1[i, j]
                        for p in range(self.P):
                            v1[self.P * i + s, self.P * j + p] += (2 ** (s + p)) * w1[i, j]
            return v1, v2

        v1, v2 = func2()
        H = qml.Hamiltonian([], [])
        for i in range(self.n):
            for s in range(self.P):
                xadded = False
                x = i * self.P + s
                fact = - (sum(v1[x, :]) + sum(v1[:, x]) + 2*v2[x])
                if fact != 0:
                    xadded = True
                    H += fact * qml.PauliZ(x)
                for j in range(self.n):
                    for p in range(self.P):
                        y = j*self.P + p
                        fact = v1[x, y]
                        if fact != 0:
                            xadded = True
                            H += fact * qml.PauliZ(x) @ qml.PauliZ(y)
                if not xadded:
                    H += 0.0 * qml.PauliZ(x)
            if self.flag0:
                x = i * self.P
                H += -(1/2)*qml.PauliZ(x) + (1/4)*qml.PauliZ(x) @ qml.PauliZ(x)
        self.ham = H

def main():
    tstart = time.time()
    print('Running experiment0')
    cfg = Params0()
    cfg.ensemble_num = 10
    P = 1
    n = 21
    cfg.num_of_qubits = n * P
    cfg.max_iterations = 10
    cfg.experiment_name = 'experiment0'
    cfg.hamiltonian_type = 'Hamiltonian0'
    A = np.random.randn(n, n)
    b = np.random.randint(2 ** P, size=n, requires_grad=False)
    flag0 = False
    res_vec = []
    H = MyHamiltonian0(cfg.num_of_qubits, A=A, b=b, P=P, flag0=flag0)
    gs_energy, __ = runtest(H.ham, cfg, show=False)
    res_vec.append(gs_energy)
    print("TIME=", int(np.round((time.time() - tstart))), " [sec]")
    print(res_vec)
    return

# -------------------------------------
if __name__ == '__main__':
    main()
Hi @giladqm,
Accelerating programs is an art. You're already doing the best you can by using lightning and adjoint. You may try other tricks like changing the default values in some of the keyword arguments in the QNode, but you probably will only get minor improvements if any. You could also try changing the optimizer to see if this helps. The main issue here is that you're using over 6 million parameters. This is a lot so it's natural for your program to be slow.
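The "over 6 million parameters" figure follows directly from the posted script, where `num_param_sets = 2 ** num_qubits - 1` and each `qml.Rot` takes three angles. A minimal sketch of that arithmetic, using a hypothetical helper name `ansatz_size` and only the formulas that appear in `runtest` and `circuit0_basic`:

```python
def ansatz_size(num_qubits: int) -> dict:
    """Parameter and layer counts implied by the script posted above."""
    num_param_sets = 2 ** num_qubits - 1       # as in runtest()
    n_layers = num_param_sets // num_qubits     # as in circuit0_basic()
    n_extra_rots = num_param_sets - n_layers * num_qubits
    return {
        "rot_gates": num_param_sets,
        "parameters": 3 * num_param_sets,       # each qml.Rot takes 3 angles
        "layers": n_layers,
        "extra_rots": n_extra_rots,
    }


print(ansatz_size(21))
# {'rot_gates': 2097151, 'parameters': 6291453, 'layers': 99864, 'extra_rots': 7}
```

So with n = 21 the circuit has roughly 100,000 layers, and the number of parameters grows exponentially with the number of qubits.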
If you're noticing that your GPU usage isn't 100% it's probably because your bottleneck is on the CPU side of things. I'm guessing this is also related to the number of parameters that you have.
In other cases something like circuit cutting might help but in your case you have so many CNOTs that it probably won't help.
Feel free to explore the PennyLane Discussion Forum to see what others have tried to accelerate their programs too.
Thanks @CatalinaAlbornoz. One thing I find really weird is that for the same code with n=7, default.qubit takes 4 seconds but lightning.gpu takes 100-140 seconds. What is the reason for this?
Hi @giladqm, lightning.gpu is optimized to work with over 20 qubits. There's a big overhead in spinning up all of the processes needed and in passing information from CPU to GPU and vice versa. So for smaller circuits default.qubit or lightning.qubit will work better.
If you go to pennylane.ai/performance you'll notice that at the end of the page we have a table to help you choose between the simulators depending on your circuit. Definitely take a look at this page, you may find some new insights. 😃
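To get a feel for why small circuits do not benefit from a GPU, it helps to look at how little data a small statevector holds. The sketch below assumes 16 bytes per amplitude (double-precision complex numbers); that figure and the helper name `statevector_bytes` are assumptions for illustration, not values reported in this thread.

```python
def statevector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for a full statevector: 2**n amplitudes of the given size."""
    return (2 ** num_qubits) * bytes_per_amplitude


for n in (7, 21, 30):
    print(f"{n} qubits: {statevector_bytes(n):,} bytes")
# 7 qubits: 2,048 bytes
# 21 qubits: 33,554,432 bytes
# 30 qubits: 17,179,869,184 bytes
```

At 7 qubits the whole state is about 2 KiB, so fixed costs such as kernel launches and host-device transfers can easily dominate; at 30 qubits the state is 16 GiB and the GPU's memory bandwidth matters much more.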
I appreciate the explanation, Thank you @CatalinaAlbornoz
@CatalinaAlbornoz is there somewhere I can read more about the types of devices?
Hello, I want to test my GH200 (Grace Hopper by Nvidia) by executing a simulation of a "heavy-weight" quantum algorithm with multiple qubits, and I thought PennyLane-Lightning could be a great tool for this. I want to run the simulation in a Docker container, but I saw that you have archived the [pennylane-lightning-gpu] repo. What do you recommend I do?