RenderKit / ospray

An Open, Scalable, Portable, Ray Tracing Based Rendering Engine for High-Fidelity Visualization
http://ospray.org
Apache License 2.0

MPI execution: How to? #518

Closed: nyue closed this 2 years ago

nyue commented 2 years ago

Hi,

Is there some documentation on how I should supply command line options to run ospray_studio in MPI mode?

I wish to run some benchmarks of MPI rendering.

Cheers

BruceCherniak commented 2 years ago

OSPRay Studio passes command line arguments directly on to OSPRay for this. Take a look at https://www.ospray.org/documentation.html#mpi-offload-rendering.

The syntax will vary a little depending on your setup and the network fabric you connect nodes with.
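Roughly, it should look something like this (just a sketch; the exact launcher name and flags depend on your MPI implementation and fabric):

mpirun -n 2 ./ospStudio <args> --osp:load-modules=mpi --osp:device=mpiOffload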

nyue commented 2 years ago

Right. I will give that a go. Thank you.

nyue commented 2 years ago

Most OSes come with either the MPICH or OpenMPI implementation of MPI.

Which would be closer to the syntax in the documentation?

Which MPI implementation were the OSPRay Studio binaries built with?

Cheers

BruceCherniak commented 2 years ago

I'm not as familiar with the OSPRay pre-built binaries. Please look here: https://www.ospray.org/downloads.html. However, you may need to build your own, depending on your config.

As I recall, OSPRay is compatible with most MPI implementations (OpenMPI, MPICH, Intel® MPI, MVAPICH) but needs to be compiled against the target version. I'm not the expert here, but will try to get you a better answer.

johguenther commented 2 years ago

Most MPI implementations are binary compatible, thus OSPRay binaries work with Intel MPI, MPICH, or OpenMPI; no re-compilation needed.

Re command line: those should also be very similar; the only potential difference I'm aware of is mpiexec vs. mpirun (some special options are unique to each MPI implementation, but common arguments like -n should work identically). Did you run into issues?
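For example, with most implementations these two should behave the same for a basic offload launch (binary paths are placeholders):

mpiexec -n 2 ./ospExamples --osp:load-modules=mpi --osp:device=mpiOffload
mpirun -n 2 ./ospExamples --osp:load-modules=mpi --osp:device=mpiOffload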

Twinklebear commented 2 years ago

I think OpenMPI is the one that sticks out as not binary compatible; we build the OSPRay MPI releases against MPICH, which means they'll be binary compatible with Intel MPI, MPICH, MVAPICH, and many others (any that are MPICH ABI compatible).

To run Studio with the MPI offload device I think this should work:

mpirun -n 3 ./ospStudio <args> --osp:load-modules=mpi --osp:device=mpiOffload

Or you can use the separate app/worker launch mode:

mpirun -n 1 ./ospStudio <args> --osp:load-modules=mpi --osp:device=mpiOffload : -n 2 <path to ospray binaries>/ospray_mpi_worker

nyue commented 2 years ago

Ah! Noted regarding OpenMPI.

With the second option:

mpirun -n 1 ./ospStudio <args> --osp:load-modules=mpi --osp:device=mpiOffload : -n 2 <path to ospray binaries>/ospray_mpi_worker

I am going to try this out on an AWS parallel cluster that I will spin up.

Will I be able to specify host names too?

e.g.

mpirun -n 1 --hosts headnode ./ospStudio <args> --osp:load-modules=mpi --osp:device=mpiOffload : -n 2 --hosts compute0, compute1 <path to ospray binaries>/ospray_mpi_worker

Twinklebear commented 2 years ago

Yeah, with both single launch and split launch you can pass a list of host names. The host names are a global option, so you'd pass just one list of host names and the processes pick the names in order. So, for example, it'd be like:

mpirun -hosts headnode,compute0,compute1 \
    -n 1 ./ospStudio <args> --osp:load-modules=mpi --osp:device=mpiOffload : \
    -n 2 <path to ospray binaries>/ospray_mpi_worker

Then ospStudio will run on the headnode, and ospray_mpi_worker will run on the compute0 and compute1 nodes. You may also want to explicitly pass the global option -ppn 1 to specify that only one process should be launched per node:

mpirun -hosts headnode,compute0,compute1 -ppn 1 \
    -n 1 ./ospStudio <args> --osp:load-modules=mpi --osp:device=mpiOffload : \
    -n 2 <path to ospray binaries>/ospray_mpi_worker

We do also support OpenMPI, but you'd have to build from source (the easiest route there is via the superbuild).
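If you go the source route, a rough sketch of a superbuild with the MPI module enabled (the option name here is from memory and may differ slightly between versions; check the OSPRay README):

git clone https://github.com/ospray/ospray.git
cd ospray && mkdir build && cd build
cmake ../scripts/superbuild -DBUILD_OSPRAY_MODULE_MPI=ON
cmake --build .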

nyue commented 2 years ago

With OSPRay, it worked:

/opt/intel/mpi/2021.4.0/bin/mpirun -hosts ip-172-31-25-178,queue0-dy-queue0-t2medium-1,queue0-dy-queue0-t2medium-2,queue0-dy-queue0-t2medium-3,queue0-dy-queue0-t2medium-4 -ppn 1 -n 1 /shared/ospray-2.9.0.x86_64.linux/bin/ospExamples --osp:load-modules=mpi --osp:device=mpiOffload : -n 4 /shared/ospray-2.9.0.x86_64.linux/bin/ospray_mpi_worker

With OSPRay Studio, I get an "OSPRay Studio not responding" pop-up dialog from the Ubuntu 20.04 desktop:

/opt/intel/mpi/2021.4.0/bin/mpirun -hosts ip-172-31-25-178,queue0-dy-queue0-t2medium-1,queue0-dy-queue0-t2medium-2,queue0-dy-queue0-t2medium-3,queue0-dy-queue0-t2medium-4 -ppn 1 -n 1 /shared/ospray_studio-0.10.0-Linux/bin/ospStudio --osp:load-modules=mpi --osp:device=mpiOffload : -n 4 /shared/ospray-2.9.0.x86_64.linux/bin/ospray_mpi_worker

BruceCherniak commented 2 years ago

Thanks. I'll take a look.

BruceCherniak commented 2 years ago

We found the issue here. Resolution will be a commit on the OSPRay side. Stay tuned.

Twinklebear commented 2 years ago

Hi @nyue, could you try out the latest devel branch of OSPRay with Studio? The fix for this issue is in 3d425603c7b36ad92107d8ff3eaff7760227007b, and I'm able to run OSPRay Studio with MPI offload now.
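If rebuilding Studio itself is inconvenient, you can probably point your existing ospStudio at the rebuilt devel OSPRay libraries at runtime via LD_LIBRARY_PATH (paths below are placeholders for your install):

export LD_LIBRARY_PATH=/path/to/ospray-devel/install/lib:$LD_LIBRARY_PATH
mpirun -n 1 ./ospStudio --osp:load-modules=mpi --osp:device=mpiOffload : -n 1 /path/to/ospray-devel/install/bin/ospray_mpi_worker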

nyue commented 2 years ago

I don't have access to my AWS cluster (which provides Intel MPI) at the moment, so I cannot test/validate in the environment from which I submitted the original report. However, I have a local workstation (Ubuntu 18.04, on which I built the devel branch of ospray); here is my result.

Note that I am using OpenMPI, which may add some wrinkles.

nyue@head0:~$ mpirun -host head0,head0 -n 1 /home/nyue/systems/ospray_studio/head/bin/ospStudio  --osp:load-modules=mpi --osp:device=mpiOffload : -n 1 /home/nyue/systems/ospray/devel/bin/ospray_mpi_worker
[head0:26102] *** Process received signal ***
[head0:26102] Signal: Segmentation fault (11)
[head0:26102] Signal code: Address not mapped (1)
[head0:26102] Failing at address: 0x3
[head0:26102] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef10)[0x7f21f1beef10]
[head0:26102] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0xb1306)[0x7f21f1c61306]
[head0:26102] [ 2] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_argv_join+0x39)[0x7f21ed96b9f9]
[head0:26102] [ 3] /usr/lib/x86_64-linux-gnu/libmpi.so.20(ompi_mpi_init+0x69b)[0x7f21ede9e60b]
[head0:26102] [ 4] /usr/lib/x86_64-linux-gnu/libmpi.so.20(PMPI_Init_thread+0x45)[0x7f21edebf405]
[head0:26102] [ 5] /home/nyue/systems/rkcommon/1.9.0/lib/libospray_module_mpi.so(_ZN9mpicommon4initEPiPPKcb+0xc7)[0x7f21f0b00c37]
[head0:26102] [ 6] /home/nyue/systems/rkcommon/1.9.0/lib/libospray_module_mpi.so(_ZN6ospray3mpi28createMPI_RanksBecomeWorkersEPiPPKcPNS0_16MPIOffloadDeviceE+0x32)[0x7f21f0aa13f2]
[head0:26102] [ 7] /home/nyue/systems/rkcommon/1.9.0/lib/libospray_module_mpi.so(_ZN6ospray3mpi16MPIOffloadDevice16initializeDeviceEv+0x1a8)[0x7f21f0aa7558]
[head0:26102] [ 8] /home/nyue/systems/ospray/devel/bin/../lib/libospray.so.2(ospInit+0xc1a)[0x7f21f2562a9a]
[head0:26102] [ 9] /home/nyue/systems/ospray/devel/bin/ospray_mpi_worker(main+0xe6)[0x558a49e5dcf6]
[head0:26102] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f21f1bd1c87]
[head0:26102] [11] /home/nyue/systems/ospray/devel/bin/ospray_mpi_worker(_start+0x2a)[0x558a49e5ddaa]
[head0:26102] *** End of error message ***
OSPRay Studio
[warn] Epoll ADD(4) on fd 33 failed.  Old events were 0; read change was 0 (none); write change was 1 (add): Bad file descriptor
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node head0 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

I am happy to redo the tests once I have my AWS cluster access back.

Cheers

BruceCherniak commented 2 years ago

Does ospExamples run on this local workstation? Is this failure unique to ospStudio? Might help rule out OpenMPI as a "wrinkle".

Twinklebear commented 2 years ago

It's interesting to see it crashing with OpenMPI just when trying to do MPI_Init_thread; testing this locally, I'm also able to reproduce it. It seems to crash on init only when using OpenMPI + the separate app/worker split launch command. I can reproduce the crash with ospExamples using a split launch with OpenMPI:

mpirun -n 1 ./ospExamples --osp:load-modules=mpi --osp:device=mpiOffload  : -n 1 ./ospray_mpi_worker

But not when using a single launch:

mpirun -n 2 ./ospExamples --osp:load-modules=mpi --osp:device=mpiOffload  

Or when using MPICH. I'll have to dig into this further as a separate issue.

nyue commented 2 years ago

@BruceCherniak I repeated it with ospExamples and it crashed as well, so this is more about OpenMPI or my Ubuntu 18.04 installation of OpenMPI.

Twinklebear commented 2 years ago

Hi @nyue, I think 587de6cf84dec4f51a594e4daf2ea751d9f5b03e should fix the OpenMPI issue, so you should also be able to run locally with OpenMPI now. I'm now able to run Studio/ospExamples with OpenMPI using the split launch.

nyue commented 2 years ago

I have successfully built and run the devel branch of ospray on Ubuntu, using the OpenMPI libraries from that distribution.

I have tried both invocations, single launch and app/worker split.

Thank you.

BruceCherniak commented 2 years ago

Does ospStudio work then also?

Twinklebear commented 2 years ago

Awesome! I've found that on Ubuntu locally I typically get best perf with MPICH, which should also be in apt, but it's good to know about issues that come up with OpenMPI since it is a supported target.

nyue commented 2 years ago

@BruceCherniak ospStudio ran fine with a similar call; I loaded tutorial_scene as a test:

$ mpirun --version
mpirun (Open MPI) 4.0.3
$ mpirun -n 1 /home/nyue/systems/ospray_studio/bruce/bin/ospStudio --osp:load-modules=mpi --osp:device=mpiOffload  : -n 1 /home/nyue/systems/ospray/devel/bin/ospray_mpi_worker
OSPRay Studio
OpenImageDenoise is available
GUI mode
$ mpirun -n 2 /home/nyue/systems/ospray_studio/bruce/bin/ospStudio --osp:load-modules=mpi --osp:device=mpiOffload
OSPRay Studio
OSPRay Studio
OpenImageDenoise is available
GUI mode