Support distributed environment for shot-level parallelization

hhorii commented 5 years ago

What is the expected behavior?

Simulation with noise is an embarrassingly parallel workload, because each shot is simulated independently. When a single shot takes a long time, the shots can be spread across machines. Multiple servers can reduce simulation time.

We are considering two kinds of distributed environment:

1. Personal workstations: a Qiskit application on a laptop runs the simulation on personal workstations
2. HPC cluster or Kubernetes cluster: a Qiskit application on a laptop submits a job to a scheduler and receives its result

This issue focuses on the first case. The discussion points are listed below.

API

The API for using a distributed environment needs to stay simple. The following code is an example that uses server1 and server2 (URLs):

from qiskit import execute, Aer 
bkend = Aer.get_backend("qasm_simulator", hosts=['server1', 'server2'])

Installation

Installation needs to stay simple as well: pip install qiskit-aer should install everything required.

RPC

We are considering two ways to start a simulation:

1. REST
2. shell (ssh, mpirun, etc.)

With REST, a daemon process handles an HTTP request and returns the results by calling the local standalone simulator. This approach is similar to the way simulators are called in IBM QX Experience. With the shell approach, a client (the Qiskit application) calls shell commands to start the simulation on remote (or local) servers.
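
To make the REST option more concrete, here is a minimal sketch using only the Python standard library. It is not taken from the PR: run_local_simulation is a hypothetical stand-in for the local standalone simulator that returns fake counts, and the JSON fields (shots, seed, counts) are assumptions made for illustration.

```python
import http.client
import json
import random
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_local_simulation(shots, seed):
    """Hypothetical stand-in for the local standalone simulator (fake 1-qubit counts)."""
    rng = random.Random(seed)
    return dict(Counter(rng.choice(["0", "1"]) for _ in range(shots)))


class SimulationHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body sent by the client.
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        counts = run_local_simulation(request["shots"], request.get("seed", 0))
        body = json.dumps({"counts": counts}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_daemon(host="127.0.0.1", port=0):
    """Start the daemon in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), SimulationHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def submit(host, port, shots, seed=0):
    """Client side: POST a job to one daemon and return its counts."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    conn.request("POST", "/", body=json.dumps({"shots": shots, "seed": seed}),
                 headers={"Content-Type": "application/json"})
    counts = json.loads(conn.getresponse().read())["counts"]
    conn.close()
    return counts
```

In this sketch one daemon would run on each workstation, and the client would call submit once per host with that host's share of the shots.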

atilag commented 5 years ago

... Multiple servers can reduce simulation time.

So for the first case: Personal workstations. How can multiple servers be faster than spawning tasks in parallel using the CPU cores? ... I may not have understood the problem correctly. We already have parallel shot-level simulation in place, right?

hitomitak commented 5 years ago

The current parallel shot-level simulation (OpenMP parallelization) is effective only within the local node. This issue is not about OpenMP parallelization but about a distributed environment with multiple nodes. For example, when the simulation is 10 shots with noise and there are 2 worker nodes, we can send 5 shots to each worker and then collect and merge the results from them, in the style of map/reduce. The execution time is then about half of that on a single node.
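
The split-and-merge idea can be sketched as follows. This is only an illustration, not the code in the PR: simulate_on_worker is a hypothetical stand-in that returns fake counts, and local threads take the place of remote nodes.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_shots(total_shots, num_workers):
    """Divide shots as evenly as possible; the first workers absorb any remainder."""
    base, extra = divmod(total_shots, num_workers)
    return [base + (1 if i < extra else 0) for i in range(num_workers)]


def simulate_on_worker(shots, seed):
    """Hypothetical stand-in for a noisy simulation of `shots` shots on one node."""
    rng = random.Random(seed)
    return Counter(rng.choice(["00", "11"]) for _ in range(shots))


def run_distributed(total_shots, num_workers):
    chunks = split_shots(total_shots, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map: each worker simulates its own chunk with a distinct seed
        partial_counts = list(pool.map(simulate_on_worker, chunks, range(num_workers)))
    # reduce: add up the per-outcome counts from all workers
    return sum(partial_counts, Counter())


if __name__ == "__main__":
    print(split_shots(10, 2))
    print(run_distributed(10, 2))
```

Giving each worker a distinct seed matters here: with the same seed, every node would reproduce the same noise realizations instead of contributing independent shots.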

I measured the execution time in a distributed environment. The results are as follows (QV, 20 qubits, 1024 shots, with device noise):

MBP: 1416 s
P9 x 1 node: 314.3 s
P9 x 2 nodes: 153 s
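
As a quick check of the scaling between the two P9 runs, the speedup and parallel efficiency can be computed directly from the timings above. Only the two P9 measurements are compared, since the MBP is different hardware.

```python
# Timings in seconds, copied from the measurements above.
p9_one_node = 314.3
p9_two_nodes = 153.0

speedup = p9_one_node / p9_two_nodes  # how many times faster 2 nodes are
efficiency = speedup / 2              # speedup per node (1.0 is ideal linear scaling)

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.2f}")
```

This gives a speedup of about 2.05x, which matches the expectation that the time is roughly halved with two nodes.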

atilag commented 5 years ago

So the workers are executed in different computers, right?

hitomitak commented 5 years ago

Yes. That's right.

atilag commented 5 years ago

So I guess that the servers need to be launched manually on each of the nodes before the master (or client) distributes the jobs among them. And will this server code just execute the standalone simulator, or will it run the Terra addon?

atilag commented 5 years ago

Ok, I just saw it on your PR.

atilag commented 4 years ago

The MPI simulator will handle this functionality, and it's already in progress, so I'm closing this issue.

Qiskit / qiskit-aer

Support distributed environment for shot-level parallelization #220
