caseresearch / code-review

⛔️ [DEPRECATED] A repo for code review sessions at CAS. ⛔️ [DEPRECATED] See
https://github.com/swincas/code-review

Specific MPI4py Example #4

Closed · jacobseiler closed this issue 7 years ago

jacobseiler commented 7 years ago

I know that an example of using MPI4py is on the agenda, so I thought I'd upload some code that I would really like to parallelize.

code_review_Jacob.zip

I have a bunch of galaxies in a number of files (I've uploaded 2 files but my actual data set contains 125 files). Each data file is of the form:

- Number of Trees within File (Ntrees)
- Number of Galaxies within File (NtotGals)
- Number of Galaxies within Tree 0
- Number of Galaxies within Tree 1
- ...
- Number of Galaxies within Tree Ntrees-1
- Galaxy 0
- Galaxy 1
- ...
- Galaxy NtotGals-1

My script (which is heavily based on Darren's original script written for SAGE) goes through each file and first reads the number of trees and the number of galaxies within the file. It then creates an empty array, with a data structure defined by how the galaxy data is laid out, large enough to hold all the galaxies. Finally it goes through all the files again and fills the array with all the galaxy data.

The first pass through the files is very quick as the script only needs to read 2 integers. However, the subsequent pass, where it actually reads the data, slows down quite a bit as each file contains ~10,000 galaxies.

So basically I know that I should generate a list of files for each processor to read and distribute them, let each processor read a galaxy file, and then pass the resulting array back to the master node, which slices it into the correct position within the master array. I'm just looking for some tips on how exactly to do this!
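As a rough sketch, that distribute-and-gather idea could look something like the following with mpi4py; the read_galaxy_file() helper and the file names are hypothetical placeholders for the actual reading code:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

all_files = ["galaxies_{0}.dat".format(i) for i in range(125)]  # placeholder names

# Each processor reads every size-th file, starting at its own rank,
# and keeps the file index alongside the galaxy array it read.
my_chunks = [(i, read_galaxy_file(all_files[i]))  # hypothetical reader
             for i in range(rank, len(all_files), size)]

# Gather the (file index, galaxy array) pairs on the master node.
gathered = comm.gather(my_chunks, root=0)

if rank == 0:
    # Flatten, restore the original file order, and splice everything
    # into a single master array.
    pairs = sorted((p for chunk in gathered for p in chunk), key=lambda p: p[0])
    master_gals = np.concatenate([gals for _, gals in pairs])
```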

Hopefully you're all still awake after my wall of text.

Jacob

manodeep commented 7 years ago

Have you tried to parallelize this script already? What errors/issues are you running into?

jacobseiler commented 7 years ago

So as we discussed, the trick was to loop over the files properly with for fnr in xrange(First_File + rank, Last_File + 1, size): and then have each processor do its calculations itself before passing only the RESULTS back to the main task to be plotted.
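In sketch form (using Python 3's range in place of xrange; load_gals() and process_gals() are hypothetical stand-ins for the actual reading and analysis code):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

First_File, Last_File = 0, 124  # placeholder file range

# Round-robin over the files: rank 0 handles files 0, size, 2*size, ...,
# rank 1 handles 1, size + 1, 2*size + 1, ... and so on.
for fnr in range(First_File + rank, Last_File + 1, size):
    gals = load_gals(fnr)      # hypothetical per-file reader
    process_gals(gals)         # each rank analyses its own files

# Only the (small) results are then sent back to the root task for plotting,
# e.g. with comm.Reduce or comm.gather, rather than the full galaxy arrays.
```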

One thing I had to be careful about when doing the binning for the Stellar Mass Function was to ensure that the bins were identical on each processor, otherwise things would get messy. To do this, each processor calculates its local minimum and maximum value of the data, and then the global binning minimum and maximum are found using binning_minimum = comm.allreduce(minimum_mass, op = MPI.MIN), which, given a binwidth (the same for all processors), ensures the bins are all aligned.

Each processor can then calculate its own histogram, and the results are passed back to the root processor with comm.Reduce([counts_local, MPI.DOUBLE], [counts_total, MPI.DOUBLE], op = MPI.SUM, root = 0).
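Putting those two pieces together, a self-contained sketch (with random placeholder masses instead of the real galaxy data) might look like:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local_mass = np.random.uniform(8.0, 12.0, 10000)  # placeholder log10(M*) values
binwidth = 0.1                                    # identical on every rank

# Agree on global bin edges so the histograms line up across ranks.
binning_minimum = comm.allreduce(np.min(local_mass), op=MPI.MIN)
binning_maximum = comm.allreduce(np.max(local_mass), op=MPI.MAX)
bins = np.arange(binning_minimum, binning_maximum + binwidth, binwidth)

# Each rank histograms its own galaxies ...
counts_local = np.histogram(local_mass, bins=bins)[0].astype(np.float64)

# ... and the per-rank counts are summed onto the root rank.
# counts_total is only meaningful on rank 0 after the Reduce.
counts_total = np.zeros_like(counts_local)
comm.Reduce([counts_local, MPI.DOUBLE], [counts_total, MPI.DOUBLE],
            op=MPI.SUM, root=0)

if rank == 0:
    print(counts_total.sum())  # total number of galaxies across all ranks
```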