Closed yitchen-tim closed 1 year ago
Hi @yitchen-tim thanks for the info on this. If you'd be happy to share some nvprof data, or even a cProfile output from this we'd be happy to take a look. We are aware that setup and initialization of the CUDA device (at least for the first instantiation) can be heavy, but we would need to confirm first that the issue is on our side and not the CUDA driver and context setup as part of cuQuantum's initialization.
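A minimal sketch of how a cProfile capture like the one requested above could be produced and saved to a file for sharing. The helper name `profile_to_file`, the output path, and `stand_in_workload` are hypothetical; the stand-in should be swapped for the actual circuit execution under investigation.

```python
import cProfile
import io
import pstats


def profile_to_file(fn, path, top=15):
    """Profile fn(), save the raw stats to path, and return a text summary."""
    profiler = cProfile.Profile()
    profiler.runcall(fn)
    # Binary stats file; this is the artifact that could be attached to an issue.
    profiler.dump_stats(path)
    buffer = io.StringIO()
    stats = pstats.Stats(path, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.strip_dirs().sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def stand_in_workload():
    # Hypothetical placeholder for the circuit execution being profiled.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(profile_to_file(stand_in_workload, "run.prof"))
```

cProfile only sees the Python side of the run, so time spent inside the CUDA driver or compiled extensions shows up as a single opaque call; a GPU profiler such as nvprof would be needed to break that part down.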
Setting `diff_method=None` does not change the run time. It seems that hooking up to an ML library may not be the bottleneck.
Trying deeper circuits, below are the results (both simulators use cuQuantum). There are about 9 seconds of fixed overhead. The scaling for both simulators is the same when increasing the circuit depth.
N=28, L=1
N=28, L=10
N=28, L=100
Note: N is qubit count, L is the number of GHZ layers.
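One way to separate the fixed overhead from the depth-dependent cost is to fit a straight line to run time against L. The sketch below assumes the model `seconds ≈ overhead + per_layer * L`; the function name `fit_overhead` is hypothetical, and the numbers in the usage lines are synthetic placeholders, not measurements from this thread.

```python
def fit_overhead(layers, seconds):
    """Least-squares fit of seconds ≈ overhead + per_layer * layers."""
    n = len(layers)
    if n < 2 or n != len(seconds):
        raise ValueError("need at least two (layers, seconds) pairs of equal length")
    mean_x = sum(layers) / n
    mean_y = sum(seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in layers)
    if sxx == 0:
        raise ValueError("layer counts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(layers, seconds))
    per_layer = sxy / sxx                   # slope: extra seconds per added layer
    overhead = mean_y - per_layer * mean_x  # intercept: depth-independent cost
    return overhead, per_layer


if __name__ == "__main__":
    # Synthetic placeholder timings, chosen only to exercise the function.
    overhead, per_layer = fit_overhead([1, 10, 100], [2.5, 7.0, 52.0])
    print(f"overhead = {overhead:.2f} s, per layer = {per_layer:.3f} s")
```

If the intercept dominates the fitted line across the tested depths, the run time is mostly setup rather than gate application.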
Hi @yitchen-tim with the merge of https://github.com/PennyLaneAI/pennylane-lightning-gpu/pull/70 this issue should now be resolved. Feel free to try out the current master, or wait 3 weeks and try release v0.28.0.
Closing for now as this is resolved, but feel free to reopen if you see this is not the case.
Issue description
There is a subroutine called `apply_cq` in lightning_gpu that evolves states based on the quantum circuit. Is `apply_cq` the only place that evolves the quantum state? For example, if I profile the run time of this routine, is it a good representation of how long the GPU computation takes?
In my profiling of a 28-qubit GHZ circuit that took about 10 seconds, only 0.01% of the run time was spent on `apply_cq`. Initialization, data movement and post-processing took the majority of the run time. Is there a roadmap for reducing these overheads? (For example, maybe some post-processing could be moved to the GPU to avoid moving big chunks of data around.)
Expected behavior: (What you expect to happen) Most run time is spent on simulation
Actual behavior: (What actually happens) Most run time is spent on initialization, data moving and post-processing
Reproduces how often: (What percentage of the time does it reproduce?) Every time
System information: (post the output of `import pennylane as qml; qml.about()`)
Source code and tracebacks
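The following is not the reporter's script. It is a minimal sketch of how the share of run time attributed to a single routine, as described in the issue above, could be read from a cProfile capture. The names `cumulative_share`, `evolve_state` and `run_everything` are hypothetical stand-ins, and the sleeps only imitate work.

```python
import cProfile
import pstats
import time


def cumulative_share(profiler, func_name):
    """Fraction of profiled time spent in func_name, including its callees."""
    stats = pstats.Stats(profiler)
    total = stats.total_tt
    if total == 0:
        return 0.0
    # Keys are (file, line, name); values are (cc, nc, tottime, cumtime, callers).
    matched = sum(
        cumtime
        for (_, _, name), (_, _, _, cumtime, _) in stats.stats.items()
        if name == func_name
    )
    return matched / total


def evolve_state():
    # Hypothetical stand-in for the state-evolution routine.
    time.sleep(0.02)


def run_everything():
    # Hypothetical stand-in for setup, data movement and post-processing.
    time.sleep(0.08)
    evolve_state()


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.runcall(run_everything)
    print(f"evolve_state share: {cumulative_share(profiler, 'evolve_state'):.1%}")
```

A figure obtained this way is host-side time only. CUDA kernel launches are asynchronous with respect to the host, so unless the routine synchronizes before returning, the GPU work it starts may be accounted to whichever later call waits for the device.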