Qiskit / qiskit-aer

Aer is a high performance simulator for quantum circuits that includes noise models
https://qiskit.github.io/qiskit-aer/
Apache License 2.0
480 stars 354 forks source link

OpenMP usage is null? Cannot parallelize jobs using qiskit-aer when using cpus #1843

Closed dup0nt closed 1 year ago

dup0nt commented 1 year ago

Informations

What is the current behavior?

Report from code, ran in a remote cluster, is showing an average low efficiency of CPU usage. For one of my experiments, when giving a total of 40 available threads, my job only uses 1, resulting in efficiency of 2.50%. The same happens for different number of threads. When analyzing the live CPU-efficiency chart, in the first seconds there is a peak of usage, showing more than one core processing data (for 40 threads, about 3-4 cores processing in parallel), but after that converges rapidly to one core. Since OpenMP is integrated in qiskit-aer, I was expecting some kind of parallelization.

Steps to reproduce the problem

The piece responsible for backend settings:

def choose_backend (simulator_variable):

    simulator = Aer.get_backend(simulator_variable)
    simulator.set_options(max_parallel_threads=40)

    return simulator 

I tried the combination of the options max_parallel_shots and max_parallel_experiments but none outputed the expected result.

To execute on the remote cluster, I am using the workflow management framework slurm and the srun command to execute my job in parallel.

What is the expected behavior?

I would expect higher CPU efficiency for job parallelization.

Suggested solutions

hhorii commented 1 year ago

Aer determines a number of threads based on a number of qubits of a circuit. If you want to simulate multiple circuits in parallel, configuration of max_parallel_experiments is necessary. Could you tell me what kind of simulation you are running?

dup0nt commented 1 year ago

Simulation is for a Quantum Walk. Your wording (more explicit that the one on the documentation) just prompted a problem I might have done. I am not creating all circuits and then executing then in only one prompt (.execute()). I am have a loop to (create circuit -> execute) x repeat. Will fix my code and then give some feedback. Thank you for that.

In the other way, OpenMP is not effective for single circuit executions?

hhorii commented 1 year ago

In the other way, OpenMP is not effective for single circuit executions?

Depending on a number of qubits. For 20 or more qubits, OpenMP is always effective, in my experiences. For 10 or less qubits, OpenMP degrades performance.

hhorii commented 1 year ago

Let me close this issue because OpenMP configuration works expectedly in @dup0nt's environment. Please use Slack Channel to discuss appropriate configuration of OpenMP for your environment.

dup0nt commented 1 year ago

After some testing, I am getting better CPU efficiency, but still under 30%. I am also submitting roughly 200 to 500 circuits in one execution. If this a correct practice for simulation? Or should I divide then in different executions to reduce memory allocation when the code is running? Is there any tool that does that? Thank you