IntelLabs / HPAT.jl

High Performance Analytics Toolkit (HPAT) is a Julia-based framework for big data analytics on clusters.
BSD 2-Clause "Simplified" License

Question: MPI use in future versions #11

[Open] abieler opened this issue 8 years ago

abieler commented 8 years ago

Just curious: is it planned to move towards Julia's built-in parallelism functionality in the future, or to continue down the MPI/C++ route?

If so, why/why not? :)

Thanks, Andre

ehsantn commented 8 years ago

If you are referring to the remotecall() and spawn mechanisms, they are not particularly useful for our target domains (data analytics, scientific computing, etc.). For most use cases we especially need collective message-passing calls such as allreduce(). We also need parallel I/O.
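To illustrate what a collective like allreduce() provides that one-off remote calls do not: every process contributes a partial value and every process ends up holding the combined result. The sketch below simulates this in a single Julia process using recursive doubling, which is one common allreduce algorithm (not necessarily what any given MPI library or HPAT uses). The function name `recursive_doubling_allreduce` is hypothetical, each vector element stands in for one rank's local value, and the rank count is assumed to be a power of two.

```julia
# Hypothetical single-process simulation of a recursive-doubling allreduce.
# vals[r+1] is the local value held by "rank" r; op must be associative.
# Assumption: the number of ranks is a power of two.
function recursive_doubling_allreduce(vals::Vector{T}, op) where {T}
    p = length(vals)
    @assert ispow2(p) "this sketch assumes a power-of-two rank count"
    buf = copy(vals)
    dist = 1
    # log2(p) rounds: in each round, rank r exchanges with rank r XOR dist
    while dist < p
        next = similar(buf)
        for r in 0:p-1
            partner = xor(r, dist)
            # combine in rank order so every rank computes the same value
            lo, hi = minmax(r, partner)
            next[r+1] = op(buf[lo+1], buf[hi+1])
        end
        buf = next
        dist <<= 1
    end
    return buf  # every entry now holds the global reduction
end
```

For example, four "ranks" each summing a quarter of `1:100` and then reducing with `recursive_doubling_allreduce([sum(c) for c in Iterators.partition(1:100, 25)], +)` all end up with 5050. With remotecall-style primitives, the same result would typically be gathered on one process and then sent back out, which is the pattern a dedicated collective avoids.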

We could avoid generating C++ and instead use MPI.jl, or call MPI directly through the ccall mechanism. However, running Julia on large-scale supercomputers has infrastructure challenges (e.g. thousands of nodes loading packages at startup). Creating a standalone, lightweight binary from C++ code has been very useful.

Do you have a specific use case in mind?

abieler commented 8 years ago

Thanks for the explanation. I was mainly just wondering because Julia emphasizes its built-in parallel capabilities, so why not use them? (I might have a data analysis problem worth parallelizing: a few tens of GB of data that need peak fitting, peak finding, smoothing, etc., which currently runs as serial plain Julia code. It is still feasible to run it overnight at the moment, though. :) )

ehsantn commented 8 years ago

You'd probably need some new features in HPAT. We will be happy to help though.

abieler commented 8 years ago

I would be happy to make this a test case. I don't have access to anything with more than 2 cores at the moment, though. :)