Qiskit / qiskit-aer

Aer is a high performance simulator for quantum circuits that includes noise models
https://qiskit.github.io/qiskit-aer/
Apache License 2.0
470 stars 353 forks source link

Simulation with Threadpool or dask spend more time than with one CPUs #1629

Closed hexin95 closed 1 year ago

hexin95 commented 1 year ago

Informations

What is the current behavior?

We are using vqc to classify a physics problem with 12 quantum bits and a training data volume of 4000. the simulation time for one circuit is about 0.035 seconds, and the simulation time for 4000 circuits is about 140 seconds.

The simulation of circuits will consume expensive time during the training process. To speed up the simulation, we use ThreadPool and Dask to run these circuits in parallel. Unfortunately, in our service with 96 cpu cores, running with ThreaPool or Dask in parallel consumes more time than using one cpu core.

The use of Threadpool and Dask is referenced in https://qiskit.org/documentation/apidoc/parallel.html?highlight=parallel. In our experiments,backend = AerSimulator(method='density_ statevector) and we tried different parameters such as max_job_size, but despite this, it still took more time to use Threadpool and Dask than a single cpu. Why does a single cpu perform better, and how can we speed up the training of VQC?

Steps to reproduce the problem

the circuits number is 4000 and the number of qubits is 12.

time consuption without parallel

from qiskit.providers.aer import AerSimulator backend = AerSimulator() backend.set_options(shots=1024)

without dask with aer

import time start = time.process_time()

Run your circuits

result = backend.run(circuits).result() aer_time = time.process_time() - start

time consumption with Threadpool

import datetime from concurrent.futures import ThreadPoolExecutor exc = ThreadPoolExecutor(max_workers=20) backend.set_options(executor=exc) backend.set_options(max_job_size=20) start = datetime.datetime.now()

Run your circuits

result = backend.run(circuits).result() tp_time = datetime.datetime.now() - start print(aer_time ) print(tp_time.seconds)

print('the time without Threadpool is {} seconds'.format(aer_time)) print('the time with Threadpool is {} seconds'.format(tp_time.seconds)) the time without Threadpool is 147.602199 seconds the time with Threadpool is 139.6 seconds

What is the expected behavior?

the time of Thredpool should smaller than without parallel.

Suggested solutions

How about the efficient use of multi-core or GPU to greatly reduce the simulation time and thus increase the training speed of VQC?

hhorii commented 1 year ago

I think your environment is not distributed, then ThreadPoolExecutor is not efficient because OpenMP parallelization is much better. Please set max_parallel_experiments with a number of CPUs to parallelize your simulation.

hhorii commented 1 year ago

Please reopen this if the above my comment does not make sense to you.