IntelLabs / HPAT.jl

High Performance Analytics Toolkit (HPAT) is a Julia-based framework for big data analytics on clusters.
BSD 2-Clause "Simplified" License

Distributed Execution #19

Open rafaelcarv opened 7 years ago

rafaelcarv commented 7 years ago

Hello,

How do I run HPAT in distributed mode? I can run the framework in parallel, but I haven't figured out how to run it in a distributed environment.

Is there a file I have to edit? If so, which one? I was reading the code and could not find the file where I should put the IP addresses of the workers.

Thanks

Wajihulhassan commented 7 years ago

Do you mean that you can run in parallel on a single machine but not in a cluster environment (on multiple machines)? If so, you can use the mpirun command with -hosts to specify the IP addresses of the worker machines. Make sure the final binary you want to run is also present on the worker machines.

http://mpitutorial.com/tutorials/running-an-mpi-cluster-within-a-lan/
http://stackoverflow.com/questions/15903408/specify-the-machines-running-program-using-mpi
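
A minimal sketch of such a launch, assuming MPICH's Hydra launcher, which accepts -hosts (Open MPI's mpirun spells the option --host instead). The IP addresses, process count, and binary path are placeholders, not values taken from this thread. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Placeholder values (assumptions, not from this thread).
HOSTS="192.168.0.11,192.168.0.12"   # comma-separated worker IP addresses
NPROCS=4                            # total number of MPI processes
BINARY="/home/mpiuser/hpat/main1"   # must exist at this path on every host

# Build the launch command as a string.
# -hosts is the MPICH (Hydra) spelling; with Open MPI use --host instead.
mpirun_cmd() {
    printf 'mpirun -np %s -hosts %s %s' "$1" "$2" "$3"
}

CMD=$(mpirun_cmd "$NPROCS" "$HOSTS" "$BINARY")
echo "$CMD"

# Execute only when explicitly requested, e.g. RUN=1 sh launch.sh
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```

Passwordless SSH between the machines is usually needed for the launcher to start processes on the remote hosts; the first link above covers that setup.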

rafaelcarv commented 7 years ago

That's it, thanks!

I can run in parallel on a single machine with no problem. However, when I run it on 2 distributed machines, I get this error:

Distributed-memory MPI mode.
OpenMP is not used.
Main will be generated in file main1.cc
Data for main will be in file main1.data
Script to compile is in main1.sh
ERROR: LoadError: error compiling ##_ppcalcPip271_j2c_proxy#289: could not load library "/home/mpiuser/.julia/v0.5/ParallelAccelerator/src/../deps/generated/libcgen_output0.so.1.0"
/home/mpiuser/.julia/v0.5/ParallelAccelerator/src/../deps/generated/libcgen_output0.so.1.0.so: cannot open shared object file: No such file or directory
 in calcPi(::Int64) at /home/mpiuser/.julia/v0.5/CompilerTools/src/OptFramework.jl:598
 in main() at /home/mpiuser/.julia/v0.5/HPAT/examples/pi.jl:56
 in include_from_node1(::String) at ./loading.jl:488
 in process_options(::Base.JLOptions) at ./client.jl:262
 in _start() at ./client.jl:318
while loading /home/mpiuser/.julia/v0.5/HPAT/examples/pi.jl, in expression starting on line 61

I followed the instructions in the wiki (for Ubuntu 16.04) and in the README. Instead of running export each time, I created a persistent environment variable for this line: export CPLUS_INCLUDE_PATH=/usr/include/hdf5/openmpi/

ehsantn commented 7 years ago

Seems like this isn't running on a shared file system in your cluster. Please check your cluster setup.

Alternatively, you can compile the standalone file (using the main1.sh script) and copy the binary to the same path on the local file system of each node. Then you can use that same binary path for mpirun.
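
A rough sketch of the copy step, assuming the binary has already been built with main1.sh. The node addresses and the binary path are placeholders; the name of the file that main1.sh produces is not stated in this thread, so main1 is only a guess. The script prints one scp command per node rather than running it.

```shell
#!/bin/sh
# Placeholder values (assumptions, not from this thread).
NODES="192.168.0.11 192.168.0.12"   # space-separated worker addresses
BINARY="/home/mpiuser/hpat/main1"   # assumed name of the binary built by main1.sh

# Print one scp command per node, copying the binary
# to the same absolute path it has on the local machine.
copy_cmds() {
    for node in $2; do
        printf 'scp %s %s:%s\n' "$1" "$node" "$1"
    done
}

PLAN=$(copy_cmds "$BINARY" "$NODES")
echo "$PLAN"

# To actually copy, pipe the plan to a shell:
#   copy_cmds "$BINARY" "$NODES" | sh
```

The target directory has to exist on each node before copying. The log above also mentions main1.data; whether the binary needs that file next to it at run time is not stated here, so it may need the same treatment.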