Closed DanBlackwell closed 5 months ago
Hi, thank you for raising this issue! We will take a closer look into why this process is being killed.
Just to confirm @DanBlackwell, when viewing the system logs, are you seeing OOM
logs when running for these programs?
To access these logs on a Braket NBI, they would be present at:
/var/log/messages
or on a your local mac at:
/var/log/syslog
The log line would read something similar to kernel: Out of memory: Killed process 8654 (python)
.
Initial resource allocation is defined by the number of qubits (14 in this case) in this line with [np.zeros(2**qubit_count, dtype=complex]
.
As a sanity check, the initial allocation seems to be low enough and should not cause an issue:
import numpy as np
initial_mat = np.zeros(2**14, dtype=complex)
initial_mat.nbytes
>>> 262144
This information will help us better debug the issue. Thank you!
I am running this in a Docker container so didn't have syslogs available unfortunately - I was guessing that it's OOM but couldn't check (I can just see the killed
message). I will take a look tomorrow morning as I am on european time.
As I increase the qubit numbers it gets progressively slower, until at 14 it was getting killed.
No problem! Based on what you're describing with increasing qubit count slowing down and eventually the process crashing I was able to reproduce a similar result. It did appear to be an OOM issue but not at initialization.
The team shall take a look into this issue and get back with findings.
Testing out your code, I was able to successfully execute it with:
from braket.circuits import Circuit
from braket.devices import LocalSimulator
device = LocalSimulator()
quantum_circuit = Circuit.from_ir('''
OPENQASM 3.0;
qubit[14] r1;
rz(1) r1;
''')
quantum_circuit.probability()
res = device.run(quantum_circuit, 0)
How are you importing stdgates.inc
?
Ah, you're right I forgot that it's doing that. I am using the original stdgates.inc
as defined here: https://arxiv.org/pdf/2104.14722.pdf. If I don't have the include statement then it's fine. Here's the full file listing that I was importing, perhaps something stands out to you:
// OpenQASM 3 standard gate library
// phase gate
gate p(λ) a { ctrl @ gphase(λ) a; }
// Pauli gate: bit-flip or NOT gate
gate x a { U(π, 0, π) a; }
// Pauli gate: bit and phase flip
gate y a { U(π, π/2, π/2) a; }
// Pauli gate: phase flip
gate z a { p(π) a; }
// Clifford gate: Hadamard
gate h a { U(π/2, 0, π) a; }
// Clifford gate: sqrt(Z) or S gate
gate s a { pow(1/2) @ z a; }
// Clifford gate: inverse of sqrt(Z)
gate sdg a { inv @ pow(1/2) @ z a; }
// sqrt(S) or T gate
gate t a { pow(1/2) @ s a; }
// inverse of sqrt(S)
gate tdg a { inv @ pow(1/2) @ s a; }
// sqrt(NOT) gate
gate sx a { pow(1/2) @ x a; }
// Rotation around X-axis
gate rx(θ) a { U(θ, -π/2, π/2) a; }
// rotation around Y-axis
gate ry(θ) a { U(θ, 0, 0) a; }
// rotation around Z axis
gate rz(λ) a { gphase(-λ/2); U(0, 0, λ) a; }
// controlled-NOT
gate cx c, t { ctrl @ x c, t; }
// controlled-Y
gate cy a, b { ctrl @ y a, b; }
// controlled-Z
gate cz a, b { ctrl @ z a, b; }
// controlled-phase
gate cp(λ) a, b { ctrl @ p(λ) a, b; }
// controlled-rx
gate crx(θ) a, b { ctrl @ rx(θ) a, b; }
// controlled-ry
gate cry(θ) a, b { ctrl @ ry(θ) a, b; }
// controlled-rz
gate crz(θ) a, b { ctrl @ rz(θ) a, b; }
// controlled-H
gate ch a, b { ctrl @ h a, b; }
// swap
gate swap a, b { cx a, b; cx b, a; cx a, b; }
// Toffoli
gate ccx a, b, c { ctrl @ ctrl @ x a, b, c; }
// controlled-swap
gate cswap a, b, c { ctrl @ swap a, b, c; }
// four parameter controlled-U gate with relative phase γ
gate cu(θ, ϕ, λ, γ) c, t { p(γ) c; ctrl @ U(θ, ϕ, λ) c, t; }
// Gates for OpenQASM 2 backwards compatibility
// CNOT
gate CX c, t { ctrl @ U(π, 0, π) c, t; }
// phase gate
gate phase(λ) q { U(0, 0, λ) q; }
// controlled-phase
gate cphase(λ) a, b { ctrl @ phase(λ) a, b; }
// identity or idle gate
gate id a { U(0, 0, 0) a; }
// IBM Quantum experience gates
gate u1(λ) q { U(0, 0, λ) q; }
gate u2(ϕ, λ) q { gphase(-(ϕ+λ)/2); U(π/2, ϕ, λ) q; }
gate u3(θ, ϕ, λ) q { gphase(-(ϕ+λ)/2); U(θ, ϕ, λ) q; }
Just a further note; if I bump it up to 23 qubits I get an allocation failure with a full stack trace:
This program uses OpenQASM language features that may not be supported on QPUs or on-demand simulators.
Traceback (most recent call last):
File "/python_js_test/braket_harness.py", line 12, in <module>
res = device.run(quantum_circuit, 0)
File "/usr/local/lib/python3.10/dist-packages/braket/devices/local_simulator.py", line 117, in run
result = self._run_internal(task_specification, shots, inputs=inputs, *args, **kwargs)
File "/usr/lib/python3.10/functools.py", line 926, in _method
return method.__get__(obj, cls)(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/braket/devices/local_simulator.py", line 276, in _
results = simulator.run(program, shots, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/simulator.py", line 149, in run
return self.run_openqasm(circuit_ir, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/simulator.py", line 462, in run_openqasm
simulation.evolve(operations)
File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/state_vector_simulation.py", line 67, in evolve
self._state_vector = StateVectorSimulation._apply_operations(
File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/state_vector_simulation.py", line 101, in _apply_operations
single_operation_strategy.apply_operations(state_tensor, qubit_count, operations)
File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/simulation_strategies/single_operation_strategy.py", line 36, in apply_operations
matrix = operation.matrix
File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/operation.py", line 58, in matrix
unitary = self._base_matrix
File "/usr/local/lib/python3.10/dist-packages/braket/default_simulator/gate_operations.py", line 1153, in _base_matrix
return cmath.exp(self._angle * 1j) * np.eye(2 ** len(self._targets))
File "/usr/local/lib/python3.10/dist-packages/numpy/lib/twodim_base.py", line 211, in eye
m = zeros((N, M), dtype=dtype, order=order)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 512. TiB for an array with shape (8388608, 8388608) and data type float64
That latter exception is expected as numpy is attempting to allocate (2*23) qubits 16 bytes (2 * 64 bit numbers).
Braket uses a different implementation for the rz gate. Does your program work removing the stdgates include statement with higher qubit counts than 14?
Editing the answer:
@DanBlackwell I jumped straight to the error message without ready through carefully and you are correct! My math is for the initial space allocation. However the gphase space allocation is done with a float64
(the initial space allocation with np.zeros
uses a complex128
across a 2^n * 2^n
matrix, where n is the number of qubits.
That latter exception is expected as numpy is attempting to allocate 2*24qubits 16 bytes (2 * 64 bit numbers).
Ok, I see now (2^23)^2 * 2^4 = 2^(46 + 4) = 2^50
- but that would be 1024TB, so I guess that each element is only 2^3 (8) bytes then. My background is not in quantum sorry.
I'm trying to figure out why qiskit has no trouble with this though. Running the following (I created a distinct gate using gphase to make sure that there's not some hidden optimisation to recognise the rz
gate and run some different path):
from qiskit import transpile
from qiskit.qasm3 import loads
from qiskit.quantum_info import Statevector
from qiskit_aer import Aer
qc = loads('''
OPENQASM 3.0;
gate testgate(λ) a { gphase(λ); U(λ, 0, 0) a; }
qubit[23] r1;
testgate(1) r1;
''')
qc = transpile(qc, Aer.get_backend('qasm_simulator'))
state_vec = Statevector.from_instruction(qc)
print(state_vec.probabilities())
Uses only ~635MB memory (as reported by /usr/bin/time -v
), meanwhile braket is trying to allocate 512TiB for the same QASM program. For smaller numbers of qubits (I've tested upto 13) that complete successfully both simulators produce the same probabilities, so I don't think it's a bug in either approach.
The implementation of the global phase used in the decomposition of RZ is not efficient. Could you try pulling and using this branch where I have created a patch to see if it resolves your issue? https://github.com/amazon-braket/amazon-braket-default-simulator-python/tree/patch_gphase_exp
@shpface I just tested it; works perfectly thanks! (tested it upto 25 qubits without issue)
Describe the bug For a relatively low number of qubits, applying
rz
results in the process being killed. I can see from the source thatrz
has an exponential space requirement, but this seems too extreme; perhaps it's a symptom of something deeper.To reproduce NB I am running this on a macbook air with 8GB RAM - a beefier machine / different OS settings might breeze through 14 qubits.
Expected behavior The program completes successfully.
Screenshots or logs Here is the full output that I get:
System information A description of your system. Please provide:
Additional context N/A