UCL-RITS / rse-classwork-2020


Improving performance using MPI #194

Open ageorgou opened 3 years ago

ageorgou commented 3 years ago

Approximating π using parallelisation

Introduction

This exercise builds on #185. It is part of a series that looks at the execution time of different ways to calculate π using the same Monte Carlo approach. In this approach, π is approximated by sampling n random points inside a square with side 1, computing the proportion of those points that fall inside the unit circle, and multiplying that proportion by 4.
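As a concrete baseline, the serial estimator looks roughly like this (a minimal sketch; the actual calc_pi.py in the repository may differ in its details):

```python
import random

def estimate_pi(n):
    """Monte Carlo estimate of pi from n uniform samples in the unit square."""
    inside = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        # A point falls inside the unit circle if x^2 + y^2 <= 1
        if x * x + y * y <= 1.0:
            inside += 1
    # The proportion inside approximates pi/4, so multiply by 4
    return 4 * inside / n

print(estimate_pi(1_000_000))  # should be close to 3.14
```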


This exercise uses the Message Passing Interface (MPI) to accomplish this approximation of π. The code is already written, and you can find it in calc_pi_mpi.py on the week10 branch of this repository. Your job is to install MPI, and measure how much time it takes to complete in comparison to #185.

MPI

MPI allows parallelisation of computation. An MPI program consists of multiple processes, existing within a group called a communicator. The default communicator contains all available processes and is called MPI_COMM_WORLD.

Each process has its own rank and can execute different code. A typical way of using MPI is to divide the computation into smaller chunks, have each process deal with one chunk, and have one "main" process coordinate this and gather all the results. The processes can communicate with each other in pre-determined ways specified by the MPI standard -- for example, sending data to and receiving data from a particular process, or broadcasting a message to all processes.
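The "divide the computation into chunks" idea can be illustrated without any MPI calls. Here `chunk_sizes` is a hypothetical helper showing how n samples might be shared among a communicator's processes, with the remainder spread over the first few ranks:

```python
def chunk_sizes(n, size):
    """Split n work items as evenly as possible across `size` ranks."""
    base, extra = divmod(n, size)
    # The first `extra` ranks each take one additional item
    return [base + (1 if rank < extra else 0) for rank in range(size)]

print(chunk_sizes(10, 4))  # [3, 3, 2, 2]
```

Every rank can compute its own share from its rank number alone, which is why MPI programs often need very little explicit coordination for this step.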

Preparation

We are going to run the original (non-numpy) version in parallel, and compare it to the non-parallel version.

We will be using mpi4py, a Python library that gives us access to MPI functionality.

Install mpi4py using conda:

conda install mpi4py -c conda-forge

or pip:

pip install mpi4py

On Windows, you will also need to install MS MPI.

The MPI version of the code is available at calc_pi_mpi.py. Look at the file and try to identify what it is doing -- it's fine if you don't understand all the details! Can you see how the concepts in the brief description of MPI above are reflected in the code?
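To connect the concepts above to code, here is a minimal sketch of how such a program might be structured with mpi4py. This is hypothetical, not the contents of calc_pi_mpi.py, and it needs mpi4py installed and launching via mpiexec to run with more than one process:

```python
from mpi4py import MPI
import random

comm = MPI.COMM_WORLD          # the default communicator, containing all processes
rank = comm.Get_rank()         # this process's id within the communicator
size = comm.Get_size()         # total number of processes

n_total = 1_000_000
n_local = n_total // size      # each rank handles a chunk (remainder ignored for simplicity)

# Every rank samples its own chunk of points independently
inside = sum(
    1 for _ in range(n_local)
    if random.random() ** 2 + random.random() ** 2 <= 1.0
)

# Sum the per-rank counts onto rank 0, which acts as the "main" process
total_inside = comm.reduce(inside, op=MPI.SUM, root=0)
if rank == 0:
    print(4 * total_inside / (n_local * size))
```

Note how the communicator, ranks, chunked work, and gathering of results from the description above all appear explicitly in the code.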

Execution

  1. Run the MPI version as:
    mpiexec -n 4 python calc_pi_mpi.py

    The -n argument controls how many processes you start.

  2. Increase the number of points and processes, and compare the time it takes against the normal version. Note that arguments meant for the Python file (like -np below) must be given after the file name.
    mpiexec -n 4 python calc_pi_mpi.py -np 10_000_000
    python calc_pi.py -np 10_000_000 -n 1 -r 1

    Tip: to avoid waiting for a long time, reduce the number of repetitions and iterations passed to timeit (both 1 in this example).

  3. Think of these questions:
    • Is the MPI-based implementation faster than the basic one?
    • Is it faster than the numpy-based implementation?
    • When (for what programs or what settings) might it be faster/slower?
    • How different is this version from the original? How easy was it to adapt the code to use MPI?
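The timeit tip from step 2 can be sketched as follows; `calc_pi` here is a hypothetical stand-in for the function timed by calc_pi.py:

```python
import random
import timeit

def calc_pi(n):
    """Stand-in Monte Carlo estimator, mirroring the serial version."""
    inside = sum(
        1 for _ in range(n)
        if random.random() ** 2 + random.random() ** 2 <= 1.0
    )
    return 4 * inside / n

# repeat=1, number=1 mirrors the -r 1 -n 1 settings suggested above:
# the function is timed exactly once, so the measurement finishes quickly
# but is noisier than averaging over many runs.
times = timeit.repeat(lambda: calc_pi(100_000), repeat=1, number=1)
print(f"{min(times):.3f} s")
```

With only one repetition the reported time can fluctuate noticeably between runs, which is worth keeping in mind when comparing the MPI and serial versions.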