Transpilation segmentation fault with large circuit

roberCO commented 3 years ago

Informations

Qiskit Aer version: 0.23.1
Qiskit version: 0.7.1
Python version: 3.7.4
Operating system: Linux Ubuntu 20.04 64 bits (x_86)

What is the current behavior?

The simulator gets an segmentation fault during the statevector calculation with the Aer simulator. The ram consumption is around 7 GB out of 32 GB available (it doesn't run out of RAM). The virtual memory usage is around 15GB.

Steps to reproduce the problem

It is necessary to execute the python code that I post below. This code has some configuration parameters. It is only necessary to modify the 'failure_parameter'. If I set the failure_parameter = 2 the code is ok (it prints statevector generated). However, if I set the failure_parameter = 3 (or greater), it gets a segmentation fault.

What is the expected behavior?

It should calculate the statevector (as it does with failure_parameter = 2). The failure_parameter is a variable to create a circuit (greater value, larger circuit). The problem is the length of the circuit (the qc.size() returns around 800.000 with failure_parameter = 3).

Suggested solutions

Important => With previous version of qiskit and aer there is no error.

Qiskit version: 0.19.6
Qiskit Aer version: 0.5.2

With these versions, the simulator DOESN'T get the error but it executes slower (like 3 times slower than current qiskit/Aer version). Monitored by htop, with 0.19.6/0.5.2 version consumes less RAM/virtual than the current version. So, it could be a memory management error.

Python code:

import numpy as np
from itertools import product
import math
import json

from qiskit import QuantumRegister, QuantumCircuit, execute, Aer
from qiskit.quantum_info import Statevector
from qiskit.aqua.components.oracles import Oracle, TruthTableOracle
from collections import OrderedDict

failure_parameter = 2

input_a = 2
input_b = 5
input_c = {}
input_d = 1000

with open('input_c.json', 'r') as json_file:
    input_c = json.load(json_file)

id_len = int(np.ceil(np.log2(input_a)))
n_ancilla_bits = 4
optimization = False
mct_mode = 'noancilla'

def preparation(circuit,value,id):

        circuit.h(value)
        if bin(input_a).count('1') == 1:
            circuit.h(id)
        else:
            vector = ([1]*input_a + [0]*(2**(id_len) - input_a))/np.sqrt(input_a)
            circuit.initialize(vector, [id[j] for j in range(id_len)])

def sum1(circuit,qubit_string,control,start,end):

        circuit.cx(control,end)
        circuit.x(start)
        circuit.cx(control,start) 

        for i in range(qubit_string.size,-1,-1):

            if i < qubit_string.size:
                circuit.mcx(control_qubits= [qubit_string[j] for j in range(i-1,-1,-1)]+[end], target_qubit = qubit_string[i])
            circuit.mcx(control_qubits = [qubit_string[j] for j in range(i-1,-1,-1)]+[end], target_qubit = start)

            if i == qubit_string.size:
                for j in range(i-1,-1,-1):
                    circuit.ccx(control,start,qubit_string[j])
                circuit.ccx(control,start,end)
            elif i == 0:
                circuit.mcx(control_qubits = [control,qubit_string[i],start], target_qubit = end)
            else:
                for j in range(i-1,-1,-1):            
                    circuit.mcx(control_qubits = [control,qubit_string[i],start], target_qubit = qubit_string[j])
                circuit.mcx(control_qubits = [control,qubit_string[i],start], target_qubit = end)
        circuit.x(start)

def sumsubtract1(circuit,qubit_string,control,start,end,move_value):

        circuit.cx(move_value,qubit_string)

        sum1(circuit,qubit_string,control,start,end)

        circuit.cx(move_value,qubit_string)

def conditional_move(circuit,ancilla,zero_one,move_value,move_id,value_vector):

        for i in range(input_a):
            value = value_vector[i] 
            value_index = np.binary_repr(i, width=id_len) 

            for j in range(len(value_index)):
                if value_index[j] == '0':
                    circuit.x(move_id[j])

            circuit.mcx(control_qubits= [zero_one[0]]+[move_id[j] for j in range(move_id.size)], target_qubit = ancilla[0])
            sumsubtract1(circuit,value,ancilla[0],ancilla[1],ancilla[2],move_value[0]) 
            circuit.mcx(control_qubits= [zero_one[0]]+[move_id[j] for j in range(move_id.size)], target_qubit = ancilla[0])   

            for j in range(len(value_index)):
                if value_index[j] == '0':
                    circuit.x(move_id[j])

def reflection(circuit,zero_one,move_value,move_id):

        circuit.x(move_id)
        circuit.x(move_value)
        circuit.x(zero_one)

        circuit.h(zero_one[0])
        circuit.mcx(control_qubits = [move_id[j] for j in range(id_len)]+ [move_value[0]], target_qubit = zero_one[0])
        circuit.h(zero_one[0])

        circuit.x(move_id)
        circuit.x(move_value)
        circuit.x(zero_one)

def prepare_initial_circuits_n():

        s_id = QuantumRegister(id_len) 
        s_value = QuantumRegister(1)

        sub_circ = QuantumCircuit(s_value,s_id)
        preparation(sub_circ,s_value,s_id)
        preparation_gate = sub_circ.to_instruction()

        s_value_vector = []
        for i in range(input_a):
            s_value_vector.append(QuantumRegister(input_b, name = 'value' + str(i)))
        s_id = QuantumRegister(id_len)
        s_value = QuantumRegister(1)
        s_zero_one = QuantumRegister(1)
        s_ancilla = QuantumRegister(n_ancilla_bits)

        sub_circ = QuantumCircuit(s_ancilla, s_zero_one, s_value, s_id)
        for i in range(input_a-1,-1,-1):
            sub_circ = sub_circ + QuantumCircuit(s_value_vector[i])

        conditional_move(sub_circ,s_ancilla, s_zero_one, s_value, s_id, s_value_vector)

        conditional_move_gate_n = sub_circ.to_instruction()

        s_id = QuantumRegister(id_len) 
        s_value = QuantumRegister(1)
        s_zero_one = QuantumRegister(1)

        sub_circ = QuantumCircuit(s_zero_one, s_value, s_id)
        reflection(sub_circ,s_zero_one, s_value, s_id)

        reflection_gate = sub_circ.to_instruction()

        return [preparation_gate, conditional_move_gate_n, reflection_gate]

[preparation_gate, conditional_move_gate_n, reflection_gate] = prepare_initial_circuits_n()

def zero_one(circuit,ancilla,zero_one):

        circuit.x(zero_one)
        for i in range(ancilla.size-1,-1,-1):
            circuit.cu(theta = -math.pi*2**(i-ancilla.size), phi  = 0, lam = 0, gamma = 0, control_qubit = ancilla[i], target_qubit = zero_one)

def zero_one_func_n(oracle):

        oracle.construct_circuit()
        oracle_circuit = oracle.circuit

        oracle_gate = oracle_circuit.to_instruction()

        cf_vector_value = []
        for i in range(input_a):
            cf_vector_value.append(QuantumRegister(input_b, name = 'value' + str(i)))
        cf_move_id = QuantumRegister(id_len) 
        cf_move_value = QuantumRegister(1)
        cf_zero_one = QuantumRegister(1)
        cf_ancilla = QuantumRegister(n_ancilla_bits)

        cf_circ = QuantumCircuit(cf_ancilla,cf_zero_one,cf_move_value,cf_move_id)
        for i in range(input_a-1,-1,-1):
            cf_circ = cf_circ + QuantumCircuit(cf_vector_value[i])

        cf_circ.append(oracle_gate, [cf_move_value[0]]+[cf_move_id[j] for j in range(cf_move_id.size)] + [cf_vector_value[k][j] for (k,j) in product(range(input_a-1,-1,-1), range(input_b))] + [cf_ancilla[j] for j in range(n_ancilla_bits)])
        zero_one(cf_circ,cf_ancilla,cf_zero_one)
        cf_circ.append(oracle_gate.inverse(), [cf_move_value[0]]+[cf_move_id[j] for j in range(cf_move_id.size)]+[cf_vector_value[k][j] for (k,j) in product(range(input_a-1,-1,-1), range(input_b))]+ [cf_ancilla[j] for j in range(n_ancilla_bits)])

        zero_one_gate = cf_circ.to_instruction()

        return zero_one_gate

def W_func_n(oracle):

    w_value_vector = []
    for i in range(input_a):
        w_value_vector.append(QuantumRegister(input_b, name = 'value' + str(i)))

    w_move_id = QuantumRegister(id_len, name = 'move_id')
    w_move_value = QuantumRegister(1, name = 'move_value')

    w_zero_one = QuantumRegister(1, name = 'zero_one')

    w_ancilla = QuantumRegister(n_ancilla_bits, name = 'ancilla')

    qc = QuantumCircuit(w_ancilla,w_zero_one,w_move_value,w_move_id)
    for i in range(input_a-1,-1,-1):
        qc = qc + QuantumCircuit(w_value_vector[i])

    zero_one_gate = zero_one_func_n(oracle)

    qc.append(preparation_gate, [w_move_value[0]]+[w_move_id[j] for j in range(id_len)])
    qc.append(zero_one_gate,  [w_ancilla[j] for j in range(n_ancilla_bits)]+[w_zero_one[0],w_move_value[0]]+ [w_move_id[j] for j in range(id_len)]+[w_value_vector[k][j] for (k,j) in product(range(input_a-1,-1,-1), range(input_b))])
    qc.append(conditional_move_gate_n, [w_ancilla[j] for j in range(n_ancilla_bits)]+[w_zero_one[0],w_move_value[0]]+ [w_move_id[j] for j in range(id_len)]+[w_value_vector[k][j] for (k,j) in product(range(input_a-1,-1,-1), range(input_b))])
    qc.append(zero_one_gate.inverse(),[w_ancilla[j] for j in range(n_ancilla_bits)]+[w_zero_one[0],w_move_value[0],]+ [w_move_id[j] for j in range(id_len)]+[w_value_vector[k][j] for (k,j) in product(range(input_a-1,-1,-1), range(input_b))])
    qc.append(preparation_gate.inverse(), [w_move_value[0]]+[w_move_id[j] for j in range(id_len)])
    qc.append(reflection_gate, [w_zero_one[0],w_move_value[0]]+[w_move_id[j] for j in range(id_len)])

    W_gate = qc.to_instruction()

    return W_gate

def calculate_bitmap(deltas_dictionary, out_bits):

    new_bitmap = []
    vector_value = {}

    for key in deltas_dictionary.keys():

        if deltas_dictionary[key] >= 0:
            probability = math.exp(-input_d * deltas_dictionary[key])
        else: 
            probability = 1

        value = 1 - 2/math.pi * math.asin(math.sqrt(probability))

        value = np.minimum(value, 1-2**(-out_bits-1))
        if value == 1.:
            print('probability = ',probability)
            print('value',value)
            raise ValueError('Warning: value seems to be pi/2, and that should not be possible')

        vector_value[key] = np.binary_repr(int(value*2**out_bits), width= out_bits)

    vector_value = OrderedDict(sorted(vector_value.items()))

    new_bitmap = []
    for o in range(out_bits-1,-1,-1):
        string = ''
        for key in vector_value.keys():
            string += str(vector_value[key])[o]
        new_bitmap += [string]

    return new_bitmap

class Beta_precalc_TruthTableOracle(TruthTableOracle):

    def __init__(self, input_c, n_ancilla_bits, optimization, mct_mode):

        super().__init__(calculate_bitmap(input_c, n_ancilla_bits), optimization, mct_mode)

g_vector_value = []
for i in range(input_a):
    g_vector_value.append(QuantumRegister(input_b, name = 'value' + str(i)))

g_move_id = QuantumRegister(id_len, name = 'move_id')
g_move_value = QuantumRegister(1, name = 'move_value') 

g_zero_one = QuantumRegister(1, name = 'zero_one')

g_ancilla = QuantumRegister(n_ancilla_bits, name = 'ancilla')

qc = QuantumCircuit(g_ancilla,g_zero_one,g_move_value,g_move_id)
for i in range(input_a-1,-1,-1):
    qc = qc + QuantumCircuit(g_vector_value[i])

for g_value in g_vector_value:
    qc.h(g_value)

oracle = Beta_precalc_TruthTableOracle(input_c, n_ancilla_bits, optimization, mct_mode)

for i in range(failure_parameter):

    W_gate = W_func_n(oracle)

    qc.append(W_gate,  [g_ancilla[j] for j in range(n_ancilla_bits)] + [g_zero_one[0],g_move_value[0]]+ [g_move_id[j] for j in range(id_len)] +[g_vector_value[k][j] for (k,j) in product(range(input_a-1,-1,-1), range(input_b))])

print('calculating statevector...')

backend = Aer.get_backend('statevector_simulator')
experiment = execute(qc, backend, method='statevector')
state_vector = Statevector(experiment.result().get_statevector(qc))

print('statevector for input:', input_b, '-', failure_parameter, 'generated')

roberCO commented 3 years ago

UPDATE

I investigated the bug by myself and I got some answers.

The segmentation fault appears in https://github.com/Qiskit/qiskit-terra/blob/master/qiskit/transpiler/passes/analysis/depth.py#L23

After generating a FencedDAGCircuit, it executes pass_.run(fenced_dag) in https://github.com/Qiskit/qiskit-terra/blob/master/qiskit/transpiler/runningpassmanager.py#L177. It goes to depth.py (the above link) and it tries to get the attribute depth of the fenced dag but depth is not in the list of attributes, so it tries to access to a memory position out of the list, then segmentation fault.

As I explained in my previous post, it only occurs when the circuit is very deep (if it is shorter, it finds the depth of the fenced dag without problem). I leave a trace that I generated with old school debugging (with prints because I can't access to the log file).

-i- parallel.py:parallel_map() calling task... -i- parallel.py:parallel_map() calling task... -i- parallel.py:parallel_map() calling task... -i- execute.py:execute() transpiling... -i- transpile.py:transpile() calling parallel_map... -i- parallel.py:parallel_map() calling task... -i- transpile.py:_transpile_circuit calling pass_manager.run() -i- passmanager.py:run() calling self.run_single_circuit() -i- passmanager.py:_run_single_circuit() calling running_passmanger.run() -i- runningpassmanager.py:run() new passset -i- runningpassmanager.py:run() calling self._do_pass() -i- runningpassmanager.py:_do_pass calling _run_this_pass -i- runningpassmanager.py:_run_this_pass() executing transformation pass... -i- runningpassmanager.py:_run_this_pass() transformation pass EXECUTED!! -i- runningpassmanager.py:_do_pass dag calculated! -i- runningpassmanager.py:_do_pass calling _update_vaid_passes -i- runningpassmanager.py:_do_pass updated! -i- runningpassmanager.py:run() end self._do_pass() -i- runningpassmanager.py:run() calling self._do_pass() -i- runningpassmanager.py:_do_pass calling _run_this_pass -i- runningpassmanager.py:_run_this_pass() executing transformation pass... -i- runningpassmanager.py:_run_this_pass() transformation pass EXECUTED!! -i- runningpassmanager.py:_do_pass dag calculated! -i- runningpassmanager.py:_do_pass calling _update_vaid_passes -i- runningpassmanager.py:_do_pass updated! -i- runningpassmanager.py:run() end self._do_pass() -i- runningpassmanager.py:run() new passset -i- runningpassmanager.py:run() calling self._do_pass() -i- runningpassmanager.py:_do_pass calling _run_this_pass -i- runningpassmanager.py:_run_this_pass() executing transformation pass... -i- runningpassmanager.py:_run_this_pass() transformation pass EXECUTED!! -i- runningpassmanager.py:_do_pass dag calculated! -i- runningpassmanager.py:_do_pass calling _update_vaid_passes -i- runningpassmanager.py:_do_pass updated! -i- runningpassmanager.py:run() end self._do_pass() -i- runningpassmanager.py:run() new passset -i- runningpassmanager.py:run() calling self._do_pass() -i- runningpassmanager.py:_do_pass calling _run_this_pass -i- runningpassmanager.py:_run_this_pass() executing analysis pass... -i- runningpassmanager.py:_run_this_pass() generating fenced DAG circuit... -i- runningpassmanager.py:_run_this_pass() fenced DAG generated with 964559 nodes -i- runningpassmanager.py:_run_this_pass() running fenced DAG -i- depth.py:run() calculating dag depth... Segmentation fault (core dumped)

chriseclectic commented 3 years ago

Thanks for the additional information @roberCO. Since its happening during transpilation I'll transfer this issue to the terra repo so the relevant people can see it.

mtreinish commented 3 years ago

The recreate script posted doesn't work because I don't have the input json. I was able to recreate the segfault locally with:

from qiskit.circuit.random import random_circuit
from qiskit import Aer
from qiskit import execute
backend = Aer.get_backend('statevector_simulator')
circuit = random_circuit(10, 1000000, seed=42)
job = execute(circuit, backend, method='statevector')
state_vector = Statevector(experiment.result().get_statevector(qc))
print(state_vector)

It looks like the segfault is coming from retworkx because on my system the kernel log:

[168108.844751] python[714147]: segfault at 7ffcb163efec ip 00007f98f7c7b549 sp 00007ffcb163efe0 error 6 in retworkx.cpython-37m-x86_64-linux-gnu.so[7f98f7c20000+184000]
[168108.844760] Code: 49 89 fa 49 89 ed 49 c1 ed 05 48 8b 02 bb 01 00 00 00 d3 e3 42 8b 3c a8 89 fe 0f ab ce 0f a3 cf 42 89 34 a8 0f 82 50 01 00 00 <89> 5c 24 0c 48 89 54 24 28 4c 89 4c 24 30 4c 89 44 24 18 4c 89 c7
[168108.844796] audit: type=1701 audit(1607605780.794:195): auid=1000 uid=1000 gid=1000 ses=4 pid=714147 comm="python" exe="/usr/bin/python3.7" sig=11 res=1

My guess is it's a bug in the python binding lib that retworkx uses somewhere. When you were able to make it work on 0.19.6 what version of retworkx were you using?

mtreinish commented 3 years ago

I spent some time tracking this down. The segfault is originating in the upstream dependency of retworkx, petgraph. Specifically looking at the the back trace under gdb it's when retworkx calls is_cyclic_directed: https://github.com/Qiskit/retworkx/blob/0.7.1/src/lib.rs#L234 specifically inside the traversal code petgraph::visit::dfsvisit::dfs_visitor which is recursive and calling itself >70k times in my local backtrace. What it looks like to me is that there is a recursion bug (oddly not a stack overflow) in that function and for really large graphs it's causing it to segfault. There are some unsafe (meaning the normal memory protections the rust compiler provides are disabled) portions of code in the petgraph library that it uses for tracking which nodes it's visited during the graph traversal and I assume that is the underlying source of the bug.

Regardless, in the short term I think there are 2 ways to fix this, one in terra and the other in retworkx, and we may want to do both. For the terra side fix we should remove the calls to retworkx.is_directed_acyclic_graph() from the DAGCircuit class, they shouldn't be necessary because the only way a cycle can be introduced is if something/someone is hand editing the inner private retworkx graph object outside of the DAGCircuit class's api. Without that call the petgraph bug that retworkx/terra are triggering should be avoided. On the retworkx side we should change the retworkx.is_directed_acyclic_graph() function to not use petgraph::algo::is_cyclic_directed anymore. In the short term we can use toposort which isn't recursive, or just do our own dfs traversal. Either should work fine and avoid the bug.

mtreinish commented 3 years ago

@roberCO I've pushed up both fixes at https://github.com/Qiskit/qiskit-terra/pull/5505 and https://github.com/Qiskit/retworkx/pull/217 and confirmed both fix the segfault for me locally if you want to give either a try.

roberCO commented 3 years ago

Thank you very much @mtreinish !!

I was testing with different configurations and it solves the problem. I will do other tests that take like a couple of days but it seems it solved.

Qiskit / qiskit