NVIDIA / cuQuantum

Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples
https://docs.nvidia.com/cuda/cuquantum/
BSD 3-Clause "New" or "Revised" License

noisy circuit simulation using cuquantum #51

Closed kaarthikvarma closed 10 months ago

kaarthikvarma commented 1 year ago

Hi, I am trying to simulate a 30-qubit noisy circuit using the NVIDIA cuQuantum Appliance (nvcr.io/nvidia/cuquantum-appliance:22.11).

I encounter the error: "CUDA error: an illegal memory access was encountered vector_mgpu.h 129"

Does cuQuantum support noisy circuit simulations?
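For a sense of scale, the state vector for a circuit of this size can be estimated with simple arithmetic. The sketch below is only illustrative: `statevector_bytes` is a hypothetical helper, and the default of 8 bytes per amplitude assumes single-precision complex numbers, which may not match the backend actually in use. It does not by itself explain the error above.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Bytes needed to hold 2**n_qubits complex amplitudes.

    bytes_per_amplitude=8 assumes single-precision complex values
    (an assumption for this sketch); use 16 for double precision.
    """
    if n_qubits < 0:
        raise ValueError("n_qubits must be non-negative")
    return (2 ** n_qubits) * bytes_per_amplitude


GIB = 2 ** 30
for n in (28, 30):
    single = statevector_bytes(n) / GIB
    double = statevector_bytes(n, 16) / GIB
    print(f"{n} qubits: {single:.0f} GiB (single), {double:.0f} GiB (double)")
```

Each additional qubit doubles the requirement, so the total must fit in the combined memory of the GPUs being used, before counting any workspace the simulator needs.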

tlubowe commented 1 year ago

Hi @kaarthikvarma, can you share more about which backend in the appliance you are using?

kaarthikvarma commented 1 year ago

Thank you for the response! I run the container with "docker run --gpus all -it --rm nvcr.io/nvidia/cuquantum-appliance:22.11" and I don't change anything within the container. I have NVIDIA driver version 525.85.12 and CUDA version 12.0.

leofang commented 1 year ago

Hi @kaarthikvarma, would it be possible to share a reproducer with us so we can investigate this issue further? Also, could you try running your code with the latest 23.03 container and see if you encounter any issue? We fixed a few issues on the container offering, which may or may not be relevant for you (hard to judge without a reproducer). Thanks 🙂

kaarthikvarma commented 1 year ago

Hi, sorry for the delay! I tried running on the 23.03 container but found the same error. This is the reproducer code that gives the same CUDA error:

import numpy as np
import cirq
import qsimcirq

qubits = cirq.GridQubit.rect(5, 6)

gpu_options = qsimcirq.QSimOptions(gpu_mode=8, max_fused_gate_size=4)

qsim_simulator = qsimcirq.QSimSimulator(qsim_options=gpu_options)

circuit = cirq.Circuit()
circuit.append([cirq.X(qubits[k]) for k in range(30)])
circuit.append([cirq.depolarize(p=0.1)(qubits[k]) for k in range(30)])
circuit.append([cirq.measure(qubits[k]) for k in range(30)])

result = qsim_simulator.run(circuit, repetitions=200)

Thanks!
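As a sanity check on what a working run of this reproducer should report, the expected statistics can be sketched without any simulator. The code below is a classical shortcut, not a general noisy simulation: it relies on each qubit being in the basis state |1> after the X gate, and on single-qubit depolarizing noise applying X, Y, or Z with probability p/3 each. Only X and Y flip a Z-basis measurement, so about 2p/3 of the measured bits should read 0. The function name `sample_reproducer_bits` and the seed are hypothetical choices for illustration.

```python
import random


def sample_reproducer_bits(n_qubits=30, p=0.1, repetitions=200, seed=0):
    """Sample the measured bits of an X + depolarize(p) + measure circuit.

    Classical shortcut: valid only because every qubit is in a
    computational-basis state when the noise acts.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(repetitions):
        bits = []
        for _ in range(n_qubits):
            bit = 1  # the X gate maps |0> to |1>
            if rng.random() < p:
                # One of X, Y, Z is applied, each with probability p/3.
                pauli = rng.choice("XYZ")
                if pauli in "XY":
                    bit ^= 1  # X and Y flip the bit; Z leaves it unchanged
            bits.append(bit)
        samples.append(bits)
    return samples


samples = sample_reproducer_bits()
zeros = sum(bit == 0 for row in samples for bit in row)
total = len(samples) * len(samples[0])
print(f"fraction of 0 outcomes: {zeros / total:.3f}")
print(f"expected about 2p/3 = {2 * 0.1 / 3:.3f}")
```

A simulator result whose fraction of 0 outcomes is far from 2p/3 would suggest a problem beyond the crash itself.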

leofang commented 1 year ago

Hi @kaarthikvarma thanks for your reproducer, it did make it easier for us to reason about the issue. We should be able to fix it in our next container release.

leofang commented 11 months ago

Just wanna keep everyone posted: We're still working on the 23.06 cuQuantum Appliance container release which will include the needed bug fix.

mtjrider commented 10 months ago

@kaarthikvarma we've published the 23.06 cuQuantum Appliance container on NGC, here.

You may pull it with:

docker pull nvcr.io/nvidia/cuquantum-appliance:23.06

bramathon commented 10 months ago

Hi, I've hit a similar issue using the 23.06 cuquantum appliance.

I'm using the ghz.py script found in the examples folder.

First, I test that I am able to run the example script:

(cuquantum-23.06) cuquantum@5c71780f490b:~/examples$ python ghz.py 
q(0),q(1),q(2)=110, 110, 111

Next, I added a single line to the script and saved it as noisy-ghz.py:

def main(nqubits=28, nrepetitions=10, ngpus=1):
    measure = True if nrepetitions > 0 else False
    circuit = make_ghz_circuit(nqubits, measure=measure)
    circuit = circuit.with_noise(cirq.depolarize(p=0.01))  # the added line

Running this script gives the error:

(cuquantum-23.06) cuquantum@5c71780f490b:~/examples$ python noisy-ghz.py 
custatevec error: internal error statespace_mgpu.h 685

mtjrider commented 10 months ago

@bramathon can you share the full modified example?

bramathon commented 10 months ago

Thanks for the quick response @mtjrider .

Here is the full script:

# Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
#
# SPDX-License-Identifier: BSD-3-Clause

import argparse
import cirq
import qsimcirq

parser = argparse.ArgumentParser(description='GHZ circuit')
parser.add_argument('--nqubits', type=int, default=3, help='the number of qubits in the circuit')
parser.add_argument('--nsamples', type=int, default=3, help='the number of samples to take')
parser.add_argument('--ngpus', type=int, default=1, help='the number of GPUs to use')

def create_qsim_options(
    max_fused_gate_size=2,
    disable_gpu=False,
    cpu_threads=1,
    gpu_mode=(0,),
    verbosity=0,
    n_subsvs=-1,
    use_sampler=None,
    debug=False
):
    return qsimcirq.QSimOptions(
        max_fused_gate_size=max_fused_gate_size,
        disable_gpu=disable_gpu,
        cpu_threads=cpu_threads,
        gpu_mode=gpu_mode,
        verbosity=verbosity,
        n_subsvs=n_subsvs,
        use_sampler=use_sampler,
        debug=debug
    )

def qsim_options_from_arguments(ngpus):
    if ngpus > 1:
        return create_qsim_options(gpu_mode=ngpus)
    elif ngpus == 1:
        return create_qsim_options()
    elif ngpus == 0:
        return create_qsim_options(disable_gpu=True, gpu_mode=0, use_sampler=False)

def make_ghz_circuit(nqubits, measure=False):
    qubits = cirq.LineQubit.range(nqubits)
    circuit = cirq.Circuit()
    circuit.append(cirq.H(qubits[0]))
    circuit.append(cirq.CNOT(qubits[idx], qubits[idx + 1]) for idx in range(nqubits - 1))
    if measure:
        circuit.append(cirq.measure(*qubits))
    return circuit

def main(nqubits=28, nrepetitions=10, ngpus=1):
    measure = True if nrepetitions > 0 else False
    circuit = make_ghz_circuit(nqubits, measure=measure)
    circuit = circuit.with_noise(cirq.depolarize(p=0.01))
    qsim_options = qsim_options_from_arguments(ngpus)
    simulator = qsimcirq.QSimSimulator(qsim_options=qsim_options)
    if nrepetitions > 0:
        results = simulator.run(circuit, repetitions=nrepetitions)
    else:
        results = simulator.simulate(circuit)
    print(results)

if __name__ == '__main__':
    args = parser.parse_args()
    main(nqubits=args.nqubits, nrepetitions=args.nsamples, ngpus=args.ngpus)

mtjrider commented 10 months ago

Thanks!

@bramathon can you tell me what system/GPUs you're using?

bramathon commented 10 months ago

I may be conflating multiple issues. On the first run, the ghz.py example works. However, if I try to run it again I get the following traceback:

Traceback (most recent call last):
  File "/home/cuquantum/examples/ghz.py", line 72, in <module>
    main(nqubits=args.nqubits, nrepetitions=args.nsamples, ngpus=args.ngpus)
  File "/home/cuquantum/examples/ghz.py", line 64, in main
    results = simulator.run(circuit, repetitions=nrepetitions)
  File "/home/cuquantum/conda/envs/cuquantum-23.06/lib/python3.9/site-packages/cirq/work/sampler.py", line 63, in run
    return self.run_sweep(program, param_resolver, repetitions)[0]
  File "/home/cuquantum/conda/envs/cuquantum-23.06/lib/python3.9/site-packages/cirq/sim/simulator.py", line 72, in run_sweep
    return list(self.run_sweep_iter(program, params, repetitions))
  File "/home/cuquantum/conda/envs/cuquantum-23.06/lib/python3.9/site-packages/cirq/sim/simulator.py", line 103, in run_sweep_iter
    records = self._run(
  File "/home/cuquantum/conda/envs/cuquantum-23.06/lib/python3.9/site-packages/qsimcirq/qsim_simulator.py", line 324, in _run
    return self._sample_measure_results(solved_circuit, repetitions)
  File "/home/cuquantum/conda/envs/cuquantum-23.06/lib/python3.9/site-packages/qsimcirq/qsim_simulator.py", line 445, in _sample_measure_results
    results[key][:, i, :] = full_results[:, meas_indices] ^ invert_mask
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

However, the custatevec error: internal error statespace_mgpu.h 685 occurs on the first run as well as subsequent runs.

> @bramathon can you tell me what system/GPUs you're using?

Quick dump of my system information:

Client: Docker Engine - Community
 Cloud integration: 1.0.17
 Version:           24.0.5
bevert@RM-LUBU-F2LPE2E:~$ nvidia-smi
Fri Sep  1 19:09:17 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P520                    Off | 00000000:2D:00.0 Off |                  N/A |
| N/A   39C    P8              N/A / ERR! |      4MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      3208      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+
bevert@RM-LUBU-F2LPE2E:~$ lspci
00:00.0 Host bridge: Intel Corporation Comet Lake-U v1 4c Host Bridge/DRAM Controller (rev 0c)
00:02.0 VGA compatible controller: Intel Corporation CometLake-U GT2 [UHD Graphics] (rev 02)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0c)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Comet Lake Thermal Subsytem
00:14.0 USB controller: Intel Corporation Comet Lake PCH-LP USB 3.1 xHCI Host Controller
00:14.2 RAM memory: Intel Corporation Comet Lake PCH-LP Shared SRAM
00:14.3 Network controller: Intel Corporation Comet Lake PCH-LP CNVi WiFi
00:16.0 Communication controller: Intel Corporation Comet Lake Management Engine Interface
00:1c.0 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #1 (rev f0)
00:1c.4 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #5 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #9 (rev f0)
00:1d.4 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #13 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Comet Lake PCH-LP LPC Premium Controller/eSPI Controller
00:1f.3 Audio device: Intel Corporation Comet Lake PCH-LP cAVS
00:1f.4 SMBus: Intel Corporation Comet Lake PCH-LP SMBus Host Controller
00:1f.5 Serial bus controller: Intel Corporation Comet Lake SPI (flash) Controller
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (10) I219-V
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader (rev 01)
03:00.0 PCI bridge: Intel Corporation JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016] (rev 01)
04:00.0 PCI bridge: Intel Corporation JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016] (rev 01)
04:01.0 PCI bridge: Intel Corporation JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016] (rev 01)
04:02.0 PCI bridge: Intel Corporation JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016] (rev 01)
05:00.0 System peripheral: Intel Corporation JHL6240 Thunderbolt 3 NHI (Low Power) [Alpine Ridge LP 2016] (rev 01)
2b:00.0 USB controller: Intel Corporation JHL6240 Thunderbolt 3 USB 3.1 Controller (Low Power) [Alpine Ridge LP 2016] (rev 01)
2d:00.0 3D controller: NVIDIA Corporation GP108GLM [Quadro P520] (rev a1)
2e:00.0 Non-Volatile memory controller: Sandisk Corp WD Black SN750 / PC SN730 NVMe SSD
bevert@RM-LUBU-F2LPE2E:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

mtjrider commented 10 months ago

Thanks for the detailed information. P520 is built with the Pascal architecture. We only support Volta and newer. We document this here.

I'm actually surprised the code runs at all. I did run your modified example on a DGX A100 (all 8 GPUs) and confirmed it works. In the test, I used --nsamples 10 and added a line to print the circuit. I've attached the output.

cuquantum-23.06-noisy-ghz-output.txt
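The architecture requirement can be checked before launching a simulation by comparing a GPU's CUDA compute capability against Volta's, which is 7.0 (Pascal parts such as the GP108-based Quadro P520 report 6.1). The sketch below is a minimal illustration: `parse_compute_capability` and `is_supported` are hypothetical helpers, and the capability strings are typed in by hand. On recent drivers they can presumably be obtained from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, but treat that as an assumption to verify on your system.

```python
VOLTA = (7, 0)  # compute capability of the Volta architecture


def parse_compute_capability(text):
    """Turn a string such as '6.1' into a (major, minor) tuple."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def is_supported(capability, minimum=VOLTA):
    """Return True if the capability is at least the minimum (Volta by default)."""
    return tuple(capability) >= tuple(minimum)


# Hand-entered examples; replace with values reported by your own system.
for name, cap_text in (("Quadro P520", "6.1"), ("A100", "8.0")):
    cap = parse_compute_capability(cap_text)
    status = "supported" if is_supported(cap) else "not supported (pre-Volta)"
    print(f"{name}: compute capability {cap[0]}.{cap[1]} -> {status}")
```

Tuple comparison handles the major and minor numbers together, so 7.0 and above pass while any 6.x value fails.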

bramathon commented 10 months ago

Ah, I will go about acquiring some better GPUs in that case.

Thank you very much for your help.

mtjrider commented 10 months ago

@bramathon closing this issue as the posted problem appears to be fully addressed. If you have other questions, issues, etc., please feel free to open a new issue/discussion referencing this issue.