Closed nyue closed 2 years ago
Yes, via mpirun; for details see https://www.ospray.org/documentation.html#mpi-offload-rendering (and maybe also https://www.ospray.org/tutorials.html#mpi-distributed-tutorials).
I am still encountering some problems, so I am narrowing them down step by step.
I am using ospMPIDistribTutorialVolume as a test case.
ospMPIDistribTutorialVolume works on the head node pc0 by itself (no mpirun).
pc0$ mpirun --host pc1,pc2,pc3,pc4 --mca btl_tcp_if_include 192.168.0.0/24 -x LD_LIBRARY_PATH /piconfs/systems/OSPray/head/bin/ospMPIDistribTutorialVolume --osp:load-modules=mpi --osp:device=mpiOffload
I get the following errors and want to confirm whether the volume example is expected to work.
Do I need to enable remote GL/EGL display?
OSPRay rank 1/4
OSPRay rank 0/4
OSPRay rank 3/4
OSPRay rank 2/4
terminate called after throwing an instance of 'std::runtime_error'
what(): Failed to initialize GLFW!
[pc1:09272] *** Process received signal ***
[pc1:09272] Signal: Aborted (6)
[pc1:09272] Signal code: (-6)
[pc1:09272] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 9272 on node pc1 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
Hi @nyue , I think the issue here is that you're running rank 0 (which will try to open the window) on pc1, not pc0. Could you try:
mpirun --host pc0,pc1,pc2,pc3,pc4 --mca btl_tcp_if_include 192.168.0.0/24 -x LD_LIBRARY_PATH /piconfs/systems/OSPray/head/bin/ospMPIDistribTutorialVolume
Also, for the MPI distributed applications you don't need to pass --osp:load-modules=mpi --osp:device=mpiOffload, as they explicitly load the MPI module (https://github.com/ospray/ospray/blob/master/modules/mpi/tutorials/ospMPIDistribTutorialSpheres.cpp#L54) and use the mpiDistributed device for data-parallel rendering (https://github.com/ospray/ospray/blob/master/modules/mpi/tutorials/ospMPIDistribTutorialSpheres.cpp#L57).
The MPIDistributed examples show distributed data rendering, where the data is too large to fit on one node and is distributed over multiple nodes. The distributed applications are assumed to be MPI-aware, and take care of not opening a window on the worker ranks for example (as the tutorials do).
The mpiOffload device is for the opposite case, where the data can fit on each node and we just want to scale up compute. MPI offload applications don't actually need to know anything about MPI; you can scale up an application written for local rendering by just swapping out some command line parameters passed to ospInit. To try out offload rendering you can run:
mpirun --host pc0,pc1,pc2,pc3,pc4 --mca btl_tcp_if_include 192.168.0.0/24 -x LD_LIBRARY_PATH /piconfs/systems/OSPray/head/bin/ospExamples --osp:load-modules=mpi --osp:device=mpiOffload
Offload works transparently with a local rendering application (like ospExamples) by swapping out the device used in ospInit and turning ranks 1+ into workers.
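To make the contrast concrete, here are the two ways of running the same binary side by side; only the arguments change. This is a sketch using the paths and hostnames already given in this thread:

```shell
# Local rendering: no MPI involved, ospInit picks the default local device.
/piconfs/systems/OSPray/head/bin/ospExamples

# Offload rendering: identical binary. The extra --osp: arguments, which are
# parsed by ospInit, load the MPI module and swap in the mpiOffload device,
# so rank 0 runs the application and ranks 1+ become render workers.
mpirun --host pc0,pc1,pc2,pc3,pc4 \
       --mca btl_tcp_if_include 192.168.0.0/24 \
       -x LD_LIBRARY_PATH \
       /piconfs/systems/OSPray/head/bin/ospExamples \
       --osp:load-modules=mpi --osp:device=mpiOffload
```

The application source is unchanged between the two runs; that is the whole point of the offload device.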
My MPI cluster is made up of ARM64 Jetson Nanos, and OIDN is not supported on them. Is there a way to tell OSPRay (e.g. ospExamples) not to look for the denoiser, just so that I can test out mpiOffload to verify it works?
picocluster@pc0:~$ /piconfs/systems/OSPray/head/bin/ospExamples
OSPRay error: could not open module lib ospray_module_denoiser: /piconfs/systems/rkcommon/1.7.0/lib/libospray_module_denoiser.so: cannot open shared object file: No such file or directory
FYI, I tried out the call to the volume rendering example, and it no longer errors out with the GLFW error.
However, the screen does not draw (it stays blank); I waited for about a minute and then killed the process.
FYI, I have built and run the NPB code from NAS, so I know the MPI cluster does work.
Does the app exit after failing to find the denoiser module? That should be configured to not exit, it should just disable the denoiser option in the app GUI.
Actually, in testing the ospMPIDistribTutorialVolume app myself, it seems to be stuck when running on 2+ ranks. I'll take a look at what's going on there; it looks like a bug somewhere.
Yes, the application still runs even when it cannot find the denoiser, but I am not sure whether the return value affects how MPI interprets it.
I had success with ospMPIDistribTutorialReplicated:
mpirun --host pc0,pc1,pc2,pc3,pc4 --mca btl_tcp_if_include 192.168.0.0/24 -x LD_LIBRARY_PATH /piconfs/systems/OSPray/head/bin/ospMPIDistribTutorialReplicated
I can see a reasonable performance increase.
At least I know OSPRay+MPI does work on my Jetson Nano cluster.
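For what it's worth, a rough scaling check is just rerunning the replicated tutorial with different host counts and comparing how responsive the rendering is; the commands below reuse the hosts and paths from this thread:

```shell
# 2 ranks: 1 application rank + 1 worker
mpirun --host pc0,pc1 --mca btl_tcp_if_include 192.168.0.0/24 \
       -x LD_LIBRARY_PATH \
       /piconfs/systems/OSPray/head/bin/ospMPIDistribTutorialReplicated

# 5 ranks: 1 application rank + 4 workers
mpirun --host pc0,pc1,pc2,pc3,pc4 --mca btl_tcp_if_include 192.168.0.0/24 \
       -x LD_LIBRARY_PATH \
       /piconfs/systems/OSPray/head/bin/ospMPIDistribTutorialReplicated
```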
I think ospExamples should still exit with error code 0 when it doesn't find the denoiser, so it should be ok w/ MPI. That's great to hear ospMPIDistribTutorialReplicated works and scales!
I'll take a look at what's going on with the distributed rendering side of things, which is probably also related to #496.
This should be resolved in our 2.7.1 release: https://github.com/ospray/ospray/releases/tag/v2.7.1 . Please let us know if you run into any issues.
Hi,
I have built the MPI examples and was wondering how to run them.
I have a cluster of 4 machines: pc1, pc2, pc3, pc4. The head node where I am launching from is pc0.
Do I use mpirun, or are the examples designed to parse -host or -hostfile?
Cheers