atexit.unregister function speeds up computation sometimes

I have two (almost) identical simulations of a circuit in two different IPython notebooks. In one notebook the simulation is much faster than in the other. The difference does not depend on the notebook itself: I have seen both the fast and the slow case in each notebook. I appended a snippet of the %prun logs at the end of this post. As far as I can tell, the C++ simulator is used in both cases (I assume this because _simulator.py:55(__init__) appears in both logs). I assume I initialized something differently, but I do not know what. In the fast case, much more time is spent in atexit.unregister (1.272 s versus 0.020 s), but I do not understand what effect this function has on the simulation.

I initialize the engine and the circuit and do the measurement inside the function that is passed to scipy.optimize.minimize. The broad code structure is:

import projectq
import projectq.backends
import projectq.ops
import scipy.optimize

def experiments(parameter):
    for _ in range(100):
        # A fresh engine with the simulator backend is created for every run.
        eng = projectq.MainEngine(backend=projectq.backends.Simulator(gate_fusion=True), engine_list=[])
        q = eng.allocate_qubit()
        circuit(parameter, eng, q)  # applies the gates; defined elsewhere
        projectq.ops.Measure | q
        eng.flush()
        # ... some non-ProjectQ postprocessing ...

scipy.optimize.minimize(experiments, init_parameter, ...)

I also tried putting all the code into one function and initializing the engine at different places, but this does not seem to change the result.
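To compare the two cases more systematically than by reading the %prun output, the profiles could be collected and diffed programmatically. The following is only a minimal sketch using the standard-library cProfile and pstats modules; the helper names tottime_by_function and biggest_differences are hypothetical and not part of ProjectQ. Note that it keys functions by (filename, line, name), so notebook cells such as <ipython-input-...> will not match between two sessions.

```python
import cProfile
import pstats


def tottime_by_function(func, *args, **kwargs):
    """Run func under cProfile and return {(file, line, name): tottime}."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    return {key: entry[2] for key, entry in stats.stats.items()}


def biggest_differences(fast, slow, top=5):
    """Return up to `top` (slow - fast, key) pairs with the largest gap."""
    keys = set(fast) | set(slow)
    diffs = [(slow.get(k, 0.0) - fast.get(k, 0.0), k) for k in keys]
    return sorted(diffs, key=lambda d: abs(d[0]), reverse=True)[:top]
```

Calling tottime_by_function once in the fast session and once in the slow session (for example on the function passed to the optimizer) and feeding both results to biggest_differences would list the functions in which the internal time differs most.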

Here is a snippet of the log of the fast computation:

         1066623 function calls (1066621 primitive calls) in 3.150 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      766    1.272    0.002    1.272    0.002 {built-in method atexit.unregister}
    13000    0.277    0.000    1.098    0.000 _simulator.py:350(_handle)
    14600    0.180    0.000    1.295    0.000 _simulator.py:422(receive)
    23421    0.135    0.000    0.135    0.000 {built-in method numpy.core.multiarray.array}
    23400    0.117    0.000    0.221    0.000 {built-in method __new__ of type object at 0x9e5d60}
     7200    0.115    0.000    0.130    0.000 {built-in method numpy.core.multiarray.dot}
    23401    0.090    0.000    0.090    0.000 {built-in method _warnings.warn}
    23400    0.085    0.000    0.544    0.000 defmatrix.py:112(__new__)
    14600    0.064    0.000    1.419    0.000 _command.py:86(__init__)
   217841    0.053    0.000    0.053    0.000 {built-in method builtins.isinstance}
    13600    0.045    0.000    0.057    0.000 _basics.py:123(make_tuple_of_qureg)
    37800    0.039    0.000    0.120    0.000 defmatrix.py:164(__array_finalize__)
    16800    0.037    0.000    0.038    0.000 {built-in method builtins.sorted}
     9800    0.034    0.000    0.243    0.000 _basics.py:166(generate_command)
      800    0.033    0.000    1.607    0.002 <ipython-input-7-1a377ac0e731>:1(h2_bk_circuit)
     7200    0.030    0.000    0.320    0.000 _gates.py:55(matrix)
     2200    0.026    0.000    0.453    0.000 _metagates.py:190(__or__)
    14600    0.024    0.000    0.059    0.000 _command.py:214(control_qubits)
      800    0.024    0.000    0.030    0.000 _simulator.py:55(__init__)
    14600    0.022    0.000    0.026    0.000 _command.py:173(_order_qubits)
    14600    0.021    0.000    0.026    0.000 _command.py:263(engine)
    14600    0.017    0.000    0.100    0.000 _command.py:109(<listcomp>)
    29200    0.017    0.000    0.772    0.000 _command.py:109(<genexpr>)
     4800    0.016    0.000    0.121    0.000 _gates.py:211(matrix)
    14600    0.014    0.000    0.040    0.000 _command.py:123(qubits)
     9000    0.013    0.000    0.258    0.000 _gates.py:68(matrix)
      800    0.013    0.000    3.056    0.004 <ipython-input-8-990688fbed52>:2(run_h2_bk_circuit)
    37600    0.013    0.000    0.021    0.000 _basics.py:202(__eq__)
     8200    0.012    0.000    1.341    0.000 _basics.py:184(__or__)
     9800    0.012    0.000    1.168    0.000 _command.py:47(apply_command)
     1600    0.012    0.000    0.068    0.000 _basics.py:134(deallocate_qubit)
    14600    0.011    0.000    1.354    0.000 _main.py:268(send)
      800    0.010    0.000    0.015    0.000 _main.py:57(__init__)
    21600    0.010    0.000    0.010    0.000 _qubit.py:44(__init__)

And here is the corresponding snippet from the slow computation:

         1048203 function calls (1048201 primitive calls) in 44.361 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    13000   41.300    0.003   43.490    0.003 _simulator.py:350(_handle)
     7200    1.401    0.000    1.420    0.000 {built-in method numpy.core.multiarray.dot}
    23421    0.164    0.000    0.164    0.000 {built-in method numpy.core.multiarray.array}
    23400    0.147    0.000    0.188    0.000 {built-in method __new__ of type object at 0x9e5d60}
    23401    0.115    0.000    0.115    0.000 {built-in method _warnings.warn}
    23400    0.103    0.000    0.589    0.000 defmatrix.py:112(__new__)
   217041    0.075    0.000    0.075    0.000 {built-in method builtins.isinstance}
    13800    0.074    0.000    0.288    0.000 _command.py:86(__init__)
    13800    0.065    0.000   43.576    0.003 _simulator.py:422(receive)
    13600    0.057    0.000    0.074    0.000 _basics.py:123(make_tuple_of_qureg)
    16000    0.046    0.000    0.047    0.000 {built-in method builtins.sorted}
    37800    0.045    0.000    0.060    0.000 defmatrix.py:164(__array_finalize__)
     9800    0.041    0.000    0.303    0.000 _basics.py:166(generate_command)
      800    0.040    0.000   27.540    0.034 <ipython-input-4-a0c9527638d5>:4(h2_bk_circuit)
     7200    0.036    0.000    1.648    0.000 _gates.py:55(matrix)
     2200    0.032    0.000    7.462    0.003 _metagates.py:190(__or__)
    13800    0.027    0.000    0.071    0.000 _command.py:214(control_qubits)
    13800    0.026    0.000    0.033    0.000 _command.py:173(_order_qubits)
    13800    0.025    0.000    0.031    0.000 _command.py:263(engine)
      779    0.020    0.000    0.020    0.000 {built-in method atexit.unregister}
     4800    0.019    0.000    0.145    0.000 _gates.py:211(matrix)
    13800    0.019    0.000    0.027    0.000 _command.py:109(<listcomp>)
     1600    0.018    0.000    3.011    0.002 _basics.py:85(allocate_qubit)
      800    0.016    0.000    0.024    0.000 _simulator.py:55(__init__)
    27600    0.016    0.000    0.056    0.000 _command.py:109(<genexpr>)
     9000    0.016    0.000    0.244    0.000 _gates.py:68(matrix)
     9800    0.016    0.000   34.245    0.003 _command.py:47(apply_command)
    13800    0.015    0.000    0.049    0.000 _command.py:123(qubits)
    36800    0.015    0.000    0.028    0.000 _basics.py:202(__eq__)
     8200    0.015    0.000   27.376    0.003 _basics.py:184(__or__)
      800    0.015    0.000   37.818    0.047 <ipython-input-5-0573d3543f43>:5(run_h2_bk_circuit)
    13800    0.014    0.000   43.620    0.003 _main.py:268(send)
     1600    0.013    0.000    6.508    0.004 _basics.py:134(deallocate_qubit)
    52633    0.012    0.000    0.012    0.000 {built-in method builtins.len}
    20000    0.012    0.000    0.012    0.000 _qubit.py:44(__init__)
      800    0.011    0.000    0.016    0.000 _main.py:57(__init__)
     2400    0.011    0.000    0.023    0.000 _basics.py:243(__init__)
     9800    0.010    0.000    0.017    0.000 {built-in method builtins.all}
     2400    0.010    0.000    0.010    0.000 {built-in method builtins.round}
     8200    0.010    0.000    0.024    0.000 defmatrix.py:261(tolist)
    17000    0.008    0.000    0.008    0.000 _basics.py:65(__init__)
     2400    0.008    0.000    0.073    0.000 _gates.py:231(matrix)
    10600    0.008    0.000   34.235    0.003 _main.py:258(receive)
     9800    0.008    0.000    0.008    0.000 _basics.py:179(<listcomp>)
     1600    0.007    0.000    6.534    0.004 _qubit.py:121(__del__)
