Open taalexander opened 4 months ago
Posting this one as well. It doesn't error but takes very long to evaluate to the point I stopped at 200 operations and gave up (20 worked)
n_qubits = self.backend.num_qubits
n_comparisons = 200
n_measures_per_round = n_qubits - 1
qr = QuantumRegister(n_qubits)
# Store each bit in a register to avoid packing in the compiler
# in an effort to make less LLVM optimization and more
# computations happen
crs = [ClassicalRegister(1) for _ in range(n_comparisons)]
final_cr = ClassicalRegister(1)
qc_expr_test = QuantumCircuit(qr, *crs, final_cr)
# Should measure all 1s except the final qubit
qc_expr_test.x(qr[:-1])
measurements_remaining = n_comparisons
while (measurements_remaining > 0):
total_measures_performed = n_comparisons - measurements_remaining
measures_this_round = min(measurements_remaining, n_measures_per_round)
qc_expr_test.barrier(qr)
qc_expr_test.measure(qr[:measures_this_round], [crs[cr][0] for cr in range(total_measures_performed, total_measures_performed+measures_this_round)])
qc_expr_test.barrier(qr)
measurements_remaining = measurements_remaining - measures_this_round
# Complicated expression which should evaluate to 1
# Or together all results
# for a complicated expression
or_bit_expr = None
for cr in crs:
cbit = cr[0]
if or_bit_expr is None:
or_bit_expr = cbit
else:
tmp_expr = expr.bit_xor(cbit, or_bit_expr)
or_bit_expr = expr.bit_or(tmp_expr, tmp_expr)
# Should evaluate to all 1's ideally
# Map final classical result onto qubit to be measured
with qc_expr_test.if_test(or_bit_expr):
qc_expr_test.x(qr[-1])
For the first comment:
Thanks for the report, it's really good to be doing stress testing of this, and I really hadn't thought about 1000+ element expressions when I made the expression tree system, or wrote everything in terms of visitors. It's not immediately clear to me what the strategy should be here to fixing this for all expression-handling logic; anything that uses the visitors is going to be subject to Python's recursion limiter. Fwiw, I'd naively expect the recursion failure to occur at a little under 1500 nodes deep on Linux or macOS, and a little under 500 on Windows (the visitor uses two function calls for each node - Expr.accept
and a corresponding Visitor.visit
- and the default sys.getrecursionlimit()
is 3000 on Linux/macOS, 1000 on Windows iirc).
That said, the actual failure point here is a context-free visit, so I can pretty trivially reimplement that in terms of an arbitrary-order iterative node walker. The problem is that it'll likely still fail somewhere else down the line, in something like trying to compare circuit equality (which again tbf, I suppose could be context free if the arbitrary-order walker is deterministic, which ofc I would do). Then Aer also uses this visitor pattern to walk expressions (though I argued a small amount for it to linearise to some RPN stack evaluator internally, I don't think we went that way), so it'll choke attempting the simulation, and I think the OQ3 exporter does too. Rewriting the general visitor machinery into something like a deque-based iterative method is a larger task, and I guess probably needs a completely alternate visitor helper base (not impossible, just fiddly).
Fwiw, Python's built-in ast
has the exact same pattern with its NodeVisitor
(though they implement the visitor slightly differently), so at least on the positive side we've got company haha:
import ast
class Visit(ast.NodeVisitor):
pass
expr = ast.parse(" + ".join(["x"]*1500))
Visit().visit(expr)
[ ... snip ...]
File ~/python/3.10.6/lib/python3.10/ast.py:254, in iter_fields(node)
252 for field in node._fields:
253 try:
--> 254 yield field, getattr(node, field)
255 except AttributeError:
256 pass
RecursionError: maximum recursion depth exceeded while calling a Python object
How critical is this for your stress testing? Would the need for humongous binary expressions be mitigated by Qiskit including manual stores to memory and/or bitshift operations in its expression tree?
For the second comment:
Just to point out, this code block:
or_bit_expr = None
for cr in crs:
cbit = cr[0]
if or_bit_expr is None:
or_bit_expr = cbit
else:
tmp_expr = expr.bit_xor(cbit, or_bit_expr)
or_bit_expr = expr.bit_or(tmp_expr, tmp_expr)
produces an expression with something like $2^n - 1$ binary terms in it (or something like that - I kind of ball-parked the algebra).
Creating the expression should be fine, because that's linear - the way you've done it has explicit Python-object re-use. The problem for us comes in walking the expression; the visitor doesn't attempt to do on-the-fly subexpression elimination or anything, so we're now attempting to walk a $2^{200}$ node expression (something like $10^{60}$) to verify that it's safe to add to the circuit.
Questions for this comment:
Fundamentally, I think this giant expression just isn't going to be supported without allowing stores to memory though (which I'm working on, I promise) - if nothing else, we'd have to output it to OQ3 at some point.
Environment
What is happening?
While running a stress test for classical expressions I am running into a runtime recursion depth issue generating a circuit.
How can we reproduce the issue?
What should happen?
Should build the circuit successfully
Any suggestions?
No response