Closed hexin95 closed 1 year ago
I think your environment is not distributed, then ThreadPoolExecutor
is not efficient because OpenMP parallelization is much better. Please set max_parallel_experiments
with a number of CPUs to parallelize your simulation.
Please reopen this if the above my comment does not make sense to you.
Informations
What is the current behavior?
We are using vqc to classify a physics problem with 12 quantum bits and a training data volume of 4000. the simulation time for one circuit is about 0.035 seconds, and the simulation time for 4000 circuits is about 140 seconds.
The simulation of circuits will consume expensive time during the training process. To speed up the simulation, we use ThreadPool and Dask to run these circuits in parallel. Unfortunately, in our service with 96 cpu cores, running with ThreaPool or Dask in parallel consumes more time than using one cpu core.
The use of Threadpool and Dask is referenced in https://qiskit.org/documentation/apidoc/parallel.html?highlight=parallel. In our experiments,backend = AerSimulator(method='density_ statevector) and we tried different parameters such as max_job_size, but despite this, it still took more time to use Threadpool and Dask than a single cpu. Why does a single cpu perform better, and how can we speed up the training of VQC?
Steps to reproduce the problem
the circuits number is 4000 and the number of qubits is 12.
time consuption without parallel
from qiskit.providers.aer import AerSimulator backend = AerSimulator() backend.set_options(shots=1024)
without dask with aer
import time start = time.process_time()
Run your circuits
result = backend.run(circuits).result() aer_time = time.process_time() - start
time consumption with Threadpool
import datetime from concurrent.futures import ThreadPoolExecutor exc = ThreadPoolExecutor(max_workers=20) backend.set_options(executor=exc) backend.set_options(max_job_size=20) start = datetime.datetime.now()
Run your circuits
result = backend.run(circuits).result() tp_time = datetime.datetime.now() - start print(aer_time ) print(tp_time.seconds)
print('the time without Threadpool is {} seconds'.format(aer_time)) print('the time with Threadpool is {} seconds'.format(tp_time.seconds)) the time without Threadpool is 147.602199 seconds the time with Threadpool is 139.6 seconds
What is the expected behavior?
the time of Thredpool should smaller than without parallel.
Suggested solutions
How about the efficient use of multi-core or GPU to greatly reduce the simulation time and thus increase the training speed of VQC?