N3PDF / vegasflow

VegasFlow: accelerating Monte Carlo simulation across multiple hardware platforms
https://vegasflow.readthedocs.io
Apache License 2.0

Multi-GPU parallelism #12

Closed: scarrazza closed this issue 4 years ago

scarrazza commented 4 years ago

After a more careful reading of https://www.tensorflow.org/guide/gpu, and after testing some operators like matmul with large matrices, I realized that TF doesn't use all available GPUs automatically.

For this project we therefore have to consider splitting n_iter or n_calls manually across the available tf.device targets, so that we can gain roughly a factor of nGPUs in speed. We may also consider adding CPU:0 alongside the GPUs.

So we need to:

  1. confirm my observation (I may be forgetting something trivial)
  2. implement an algorithm for job balancing, which divides the workload across the devices (see the sketch after this list)
  3. apply the job distribution inside vegas
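
Roughly, the kind of splitting and balancing meant in points 2 and 3 could look like the sketch below; the helper names, the integrand signature (a callable taking a number of events and returning a tensor) and the device weights are just placeholders, not anything from the code base:

import tensorflow as tf

def split_calls(n_calls, weights):
    # Illustrative job balancing: each device gets a fraction of the total
    # number of events, e.g. a smaller share for the CPU.
    return {device: int(n_calls * w) for device, w in weights.items()}

def run_distributed(integrand, n_calls, weights):
    # Dispatch one chunk of events per device and sum the partial results.
    partials = []
    for device, calls in split_calls(n_calls, weights).items():
        with tf.device(device):
            partials.append(integrand(calls))
    return tf.add_n(partials)

# Hypothetical split: most of the work on the two GPUs, a little on the CPU
weights = {"/GPU:0": 0.45, "/GPU:1": 0.45, "/CPU:0": 0.10}
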
scarrazza commented 4 years ago

I can confirm that point 1 is true, by adding the following call to the lepage example

tf.debugging.set_log_device_placement(True)

and observing that the log never places an operator on GPU:1 but always on GPU:0 (even though nvidia-smi says that the program is using memory from GPU:1).
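
For reference, a minimal standalone check along these lines (assuming at least two visible GPUs; the matrix size and the device string are arbitrary):

import tensorflow as tf

tf.debugging.set_log_device_placement(True)
print(tf.config.list_physical_devices("GPU"))

# By default the matmul below is logged on GPU:0; only an explicit
# tf.device scope makes the log show GPU:1 being used.
a = tf.random.uniform((4000, 4000))
with tf.device("/GPU:1"):
    b = tf.matmul(a, a)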

Concerning points 2 and 3, I think the best tf-like approach is to do something like this:

strategy = tf.distribute.MirroredStrategy()

@tf.function
def run():
    with strategy.scope():
        # the callable and its arguments are passed separately
        strategy.experimental_run_v2(vegas, args=(lepage, dim, n_iter, ncalls))

i.e. using the tf.distribute API.

scarlehoff commented 4 years ago

One possibility, using MirroredStrategy (https://www.tensorflow.org/api_docs/python/tf/distribute/MirroredStrategy?version=stable), just breaks the integration into equal chunks, which is not very useful for distributing the work efficiently. If we want to do it correctly we need to implement something not very far from one of the scheduling types described here: http://jakascorner.com/blog/2016/06/omp-for-scheduling.html#the-scheduling-types, which means creating our own strategy.
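
For the record, a rough sketch of what an OpenMP-style "dynamic" schedule could look like outside of tf.distribute, assuming one worker thread per device pulling fixed-size chunks from a shared queue (all the names and the chunk size are illustrative):

import queue
import threading
import tensorflow as tf

def dynamic_schedule(integrand, devices, n_calls, chunk_size=100_000):
    # "dynamic" scheduling: each device pulls the next chunk of events as
    # soon as it finishes the previous one, so faster devices do more work.
    chunks = queue.Queue()
    for start in range(0, n_calls, chunk_size):
        chunks.put(min(chunk_size, n_calls - start))

    partials = []
    lock = threading.Lock()

    def worker(device):
        while True:
            try:
                n = chunks.get_nowait()
            except queue.Empty:
                return
            with tf.device(device):
                result = integrand(n)
            with lock:
                partials.append(result)

    threads = [threading.Thread(target=worker, args=(dev,)) for dev in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tf.add_n(partials)
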

scarrazza commented 4 years ago

There are some projects, like https://github.com/horovod/horovod, which may help.
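
If we went that route, the mapping onto our problem would presumably be something like the sketch below (one process per GPU, launched with horovodrun; integrand and the call numbers are placeholders, and I have not tested this):

import horovod.tensorflow as hvd
import tensorflow as tf

hvd.init()

# pin each process to one GPU, e.g. launched as
#   horovodrun -np 2 python this_script.py
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")

n_calls = 1_000_000
calls_per_rank = n_calls // hvd.size()

# integrand: placeholder callable evaluating calls_per_rank events
partial = integrand(calls_per_rank)

# sum (rather than average) the partial results over all ranks
total = hvd.allreduce(partial, average=False)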

scarlehoff commented 4 years ago

Let's have a look. I've been reading more into the TensorFlow distribution strategies and it seems only the Keras distribution is implemented, and in order to use it we would have to tie our hands way too much imho.

I think it is better if we deal with it on our own terms for now (and actually not take it into consideration in the rest of the code), because we can always fall back on the parallel/joblib strategy.
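
A minimal sketch of that joblib fallback, assuming the threading backend so that all workers share the same TensorFlow context; integrand, the device list and the number of calls are placeholders:

import tensorflow as tf
from joblib import Parallel, delayed

def run_on_device(device, integrand, n_calls):
    # each worker pins its chunk of events to one device
    with tf.device(device):
        return integrand(n_calls)

devices = ["/GPU:0", "/GPU:1"]
calls_per_device = 500_000

partials = Parallel(n_jobs=len(devices), backend="threading")(
    delayed(run_on_device)(dev, integrand, calls_per_device) for dev in devices
)
total = tf.add_n(partials)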

scarrazza commented 4 years ago

In view of the great shape of #17, I think we should consider the possibility of deriving from VegasFlow some extra classes which implement specific distribution techniques, such as:

  1. VegasFlow, default single GPU.
  2. TPEVegasFlow, using ThreadPoolExecutor from concurrent.futures or similar (see the sketch after this list).
  3. SparkVegasFlow, using Apache Spark.
  4. MPIVegasFlow, using Open MPI.
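
Just to fix ideas, a rough sketch of how case 2 could look, assuming the base class exposes a per-iteration entry point; run_event and its signature are placeholder names, not necessarily the actual VegasFlow interface:

from concurrent.futures import ThreadPoolExecutor
import tensorflow as tf

class TPEVegasFlow(VegasFlow):  # VegasFlow: the base integrator class from #17
    """Hypothetical subclass: spread the events of each iteration over
    several devices using a ThreadPoolExecutor."""

    def __init__(self, devices, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.devices = devices

    def run_event(self, n_events):
        # Placeholder method name: split the calls of one iteration evenly
        # across the devices and evaluate each chunk in its own thread.
        n_per_device = n_events // len(self.devices)

        def chunk(device):
            with tf.device(device):
                return super(TPEVegasFlow, self).run_event(n_per_device)

        with ThreadPoolExecutor(max_workers=len(self.devices)) as pool:
            partials = list(pool.map(chunk, self.devices))
        return tf.add_n(partials)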