amazon-braket / amazon-braket-default-simulator-python

Quantum program simulators that can run locally
https://amazon-braket-default-simulator-python.readthedocs.io/
Apache License 2.0
67 stars 21 forks source link

Using `rz` results in process being killed (OOM?) #243

Closed DanBlackwell closed 5 months ago

DanBlackwell commented 5 months ago

Describe the bug For a relatively low number of qubits, applying rz results in the process being killed. I can see from the source that rz has an exponential space requirement, but this seems too extreme; perhaps it's a symptom of something deeper.

To reproduce NB I am running this on a macbook air with 8GB RAM - a beefier machine / different OS settings might breeze through 14 qubits.

from braket.circuits import Circuit
from braket.devices import LocalSimulator

device = LocalSimulator()
quantum_circuit = Circuit.from_ir('''
OPENQASM 3.0;
include "stdgates.inc";
qubit[14] r1;
rz(1) r1;
''')
quantum_circuit.probability()
res = device.run(quantum_circuit, 0)

Expected behavior The program completes successfully.

Screenshots or logs Here is the full output that I get:

This program uses OpenQASM language features that may not be supported on QPUs or on-demand simulators.
Killed

System information A description of your system. Please provide:

Additional context N/A

ashlhans commented 5 months ago

Hi, thank you for raising this issue! We will take a closer look into why this process is being killed.

math411 commented 5 months ago

Just to confirm @DanBlackwell, when viewing the system logs, are you seeing OOM logs when running for these programs?

To access these logs on a Braket NBI, they would be present at:

/var/log/messages

or on a your local mac at:

/var/log/syslog

The log line would read something similar to kernel: Out of memory: Killed process 8654 (python).

Initial resource allocation is defined by the number of qubits (14 in this case) in this line with [np.zeros(2**qubit_count, dtype=complex].

As a sanity check, the initial allocation seems to be low enough and should not cause an issue:

import numpy as np
initial_mat = np.zeros(2**14, dtype=complex)
initial_mat.nbytes
>>> 262144

This information will help us better debug the issue. Thank you!

DanBlackwell commented 5 months ago

I am running this in a Docker container so didn't have syslogs available unfortunately - I was guessing that it's OOM but couldn't check (I can just see the killed message). I will take a look tomorrow morning as I am on european time.

As I increase the qubit numbers it gets progressively slower, until at 14 it was getting killed.

math411 commented 5 months ago

No problem! Based on what you're describing with increasing qubit count slowing down and eventually the process crashing I was able to reproduce a similar result. It did appear to be an OOM issue but not at initialization.

The team shall take a look into this issue and get back with findings.

math411 commented 5 months ago

Testing out your code, I was able to successfully execute it with:

from braket.circuits import Circuit
from braket.devices import LocalSimulator

device = LocalSimulator()
quantum_circuit = Circuit.from_ir('''
OPENQASM 3.0;
qubit[14] r1;
rz(1) r1;
''')
quantum_circuit.probability()
res = device.run(quantum_circuit, 0)

How are you importing stdgates.inc?

DanBlackwell commented 5 months ago

Ah, you're right I forgot that it's doing that. I am using the original stdgates.inc as defined here: https://arxiv.org/pdf/2104.14722.pdf. If I don't have the include statement then it's fine. Here's the full file listing that I was importing, perhaps something stands out to you:

// OpenQASM 3 standard gate library

// phase gate
gate p(λ) a { ctrl @ gphase(λ) a; }

// Pauli gate: bit-flip or NOT gate
gate x a { U(π, 0, π) a; }
// Pauli gate: bit and phase flip
gate y a { U(π, π/2, π/2) a; }
// Pauli gate: phase flip
gate z a { p(π) a; }

// Clifford gate: Hadamard
gate h a { U(π/2, 0, π) a; }
// Clifford gate: sqrt(Z) or S gate
gate s a { pow(1/2) @ z a; }
// Clifford gate: inverse of sqrt(Z)
gate sdg a { inv @ pow(1/2) @ z a; }

// sqrt(S) or T gate
gate t a { pow(1/2) @ s a; }
// inverse of sqrt(S)
gate tdg a { inv @ pow(1/2) @ s a; }

// sqrt(NOT) gate
gate sx a { pow(1/2) @ x a; }

// Rotation around X-axis
gate rx(θ) a { U(θ, -π/2, π/2) a; }
// rotation around Y-axis
gate ry(θ) a { U(θ, 0, 0) a; }
// rotation around Z axis
gate rz(λ) a { gphase(-λ/2); U(0, 0, λ) a; }

// controlled-NOT
gate cx c, t { ctrl @ x c, t; }
// controlled-Y
gate cy a, b { ctrl @ y a, b; }
// controlled-Z
gate cz a, b { ctrl @ z a, b; }
// controlled-phase
gate cp(λ) a, b { ctrl @ p(λ) a, b; }
// controlled-rx
gate crx(θ) a, b { ctrl @ rx(θ) a, b; }
// controlled-ry
gate cry(θ) a, b { ctrl @ ry(θ) a, b; }
// controlled-rz
gate crz(θ) a, b { ctrl @ rz(θ) a, b; }
// controlled-H
gate ch a, b { ctrl @ h a, b; }

// swap
gate swap a, b { cx a, b; cx b, a; cx a, b; }

// Toffoli
gate ccx a, b, c { ctrl @ ctrl @ x a, b, c; }
// controlled-swap
gate cswap a, b, c { ctrl @ swap a, b, c; }

// four parameter controlled-U gate with relative phase γ
gate cu(θ, ϕ, λ, γ) c, t { p(γ) c; ctrl @ U(θ, ϕ, λ) c, t; }

// Gates for OpenQASM 2 backwards compatibility
// CNOT
gate CX c, t { ctrl @ U(π, 0, π) c, t; }
// phase gate
gate phase(λ) q { U(0, 0, λ) q; }
// controlled-phase
gate cphase(λ) a, b { ctrl @ phase(λ) a, b; }
// identity or idle gate
gate id a { U(0, 0, 0) a; }
// IBM Quantum experience gates
gate u1(λ) q { U(0, 0, λ) q; }
gate u2(ϕ, λ) q { gphase(-(ϕ+λ)/2); U(π/2, ϕ, λ) q; }
gate u3(θ, ϕ, λ) q { gphase(-(ϕ+λ)/2); U(θ, ϕ, λ) q; }
DanBlackwell commented 5 months ago

Just a further note; if I bump it up to 23 qubits I get an allocation failure with a full stack trace:

This program uses OpenQASM language features that may not be supported on QPUs or on-demand simulators.
Traceback (most recent call last):
  File "/python_js_test/braket_harness.py", line 12, in <module>
    res = device.run(quantum_circuit, 0)
  File "/usr/local/lib/python3.10/dist-packages/braket/devices/local_simulator.py", line 117, in run
    result = self._run_internal(task_specification, shots, inputs=inputs, *args, **kwargs)
  File "/usr/lib/python3.10/functools.py", line 926, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/braket/devices/local_simulator.py", line 276, in _
    results = simulator.run(program, shots, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/simulator.py", line 149, in run
    return self.run_openqasm(circuit_ir, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/simulator.py", line 462, in run_openqasm
    simulation.evolve(operations)
  File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/state_vector_simulation.py", line 67, in evolve
    self._state_vector = StateVectorSimulation._apply_operations(
  File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/state_vector_simulation.py", line 101, in _apply_operations
    single_operation_strategy.apply_operations(state_tensor, qubit_count, operations)
  File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/simulation_strategies/single_operation_strategy.py", line 36, in apply_operations
    matrix = operation.matrix
  File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/operation.py", line 58, in matrix
    unitary = self._base_matrix
  File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/gate_operations.py", line 1153, in _base_matrix
    return cmath.exp(self._angle * 1j) * np.eye(2 ** len(self._targets))
  File "/usr/local/lib/python3.10/dist-packages/numpy/lib/twodim_base.py", line 211, in eye
    m = zeros((N, M), dtype=dtype, order=order)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 512. TiB for an array with shape (8388608, 8388608) and data type float64
math411 commented 5 months ago

That latter exception is expected as numpy is attempting to allocate (2*23) qubits 16 bytes (2 * 64 bit numbers).

Braket uses a different implementation for the rz gate. Does your program work removing the stdgates include statement with higher qubit counts than 14?

Editing the answer: @DanBlackwell I jumped straight to the error message without ready through carefully and you are correct! My math is for the initial space allocation. However the gphase space allocation is done with a float64 (the initial space allocation with np.zeros uses a complex128 across a 2^n * 2^n matrix, where n is the number of qubits.

DanBlackwell commented 5 months ago

That latter exception is expected as numpy is attempting to allocate 2*24qubits 16 bytes (2 * 64 bit numbers).

Ok, I see now (2^23)^2 * 2^4 = 2^(46 + 4) = 2^50 - but that would be 1024TB, so I guess that each element is only 2^3 (8) bytes then. My background is not in quantum sorry.

I'm trying to figure out why qiskit has no trouble with this though. Running the following (I created a distinct gate using gphase to make sure that there's not some hidden optimisation to recognise the rz gate and run some different path):

from qiskit import transpile
from qiskit.qasm3 import loads
from qiskit.quantum_info import Statevector
from qiskit_aer import Aer

qc = loads('''
OPENQASM 3.0;
gate testgate(λ) a { gphase(λ); U(λ, 0, 0) a; }
qubit[23] r1;
testgate(1) r1;
''')
qc = transpile(qc, Aer.get_backend('qasm_simulator'))
state_vec = Statevector.from_instruction(qc)
print(state_vec.probabilities())

Uses only ~635MB memory (as reported by /usr/bin/time -v), meanwhile braket is trying to allocate 512TiB for the same QASM program. For smaller numbers of qubits (I've tested upto 13) that complete successfully both simulators produce the same probabilities, so I don't think it's a bug in either approach.

shpface commented 5 months ago

The implementation of the global phase used in the decomposition of RZ is not efficient. Could you try pulling and using this branch where I have created a patch to see if it resolves your issue? https://github.com/amazon-braket/amazon-braket-default-simulator-python/tree/patch_gphase_exp

DanBlackwell commented 5 months ago

@shpface I just tested it; works perfectly thanks! (tested it upto 25 qubits without issue)