Eyescale / Equalizer

Equalizer is the standard middleware to create and deploy parallel OpenGL-based applications. It enables applications to benefit from multiple graphics cards, processors and computers to scale the rendering performance, visual quality and display size. An Equalizer application runs unmodified on any visualization system, from a simple workstation to large scale graphics clusters, multi-GPU workstations and Virtual Reality installations.
http://eyescale.github.io/equalizergraphics.com/

HWLOC: Segmentation Fault with empty auto thread affinity mask #167

Open bohara opened 12 years ago

bohara commented 12 years ago

On a machine with two CPU processors (nodes) and three GPUs, where two of the GPUs are connected to one processor and the third GPU is connected to the other, allocating all three GPUs but only one processor seems to leave some GPU-CPU connections unresolvable, and this causes a segmentation fault.
You can reproduce the error by allocating the cluster resources (using Slurm) with the command below:

salloc -N1 -n1 --gres=gpu:3 -p interactive -t 2:00:00

and then run eqPly from the examples. As discussed in the meeting, this issue is directed towards Marwan.

eile commented 12 years ago

I think this is the bug Marwan should fix soonish anyways, where the intersection of 'affinity GPUs' and 'allocated GPUs' is 0.

eile commented 11 years ago

Marwan - please verify this bug and take appropriate action.

marwan-abdellah commented 11 years ago

Ok,


marwan-abdellah commented 11 years ago

@eile: Do you know where I can find Bidur's Equalizer configs?

eile commented 11 years ago

Autoconf should take care of this. You simply need an allocation where the intersection of the allocated CPUs and the GPU affinity is empty.

marwan-abdellah commented 11 years ago

Well, can't reproduce the bug.

I used the same allocation command:

salloc -N1 -n1 --gres=gpu:3 -p interactive

Then I started the X server for the 3 GPUs:

srun -n1 --gres=gpu:3 -w bbplxviz07 --startx --pty /bin/bash

Then I connected to the allocated node with vglconnect:

vglconnect bbplxvizXX

Then I ran eqPly. It doesn't give any segmentation faults.

26163 PipeDraw c/Equalizer/eq/client/pipe.cpp:199 45 Entered pipe thread
26163 PipeDraw c/Equalizer/eq/client/pipe.cpp:310 45 Set up pipe message pump for GLX
26163 PipeDraw c/Equalizer/eq/client/pipe.cpp:338 45 Get Automatic Affinity for 0
26163 PipeDraw c/Equalizer/eq/client/pipe.cpp:344 45 port, device = 4294967295,4294967295
26163 PipeDraw c/Equalizer/eq/client/pipe.cpp:349 45 port, device = 4294967295,4294967295 No Affinity
26163 PipeDraw eq/client/glx/windowSystem.cpp:54 45 Using glx::Pipe
26163 PipeDraw1 c/Equalizer/eq/client/pipe.cpp:199 46 Entered pipe thread
26163 PipeDraw1 c/Equalizer/eq/client/pipe.cpp:310 46 Set up pipe message pump for GLX
26163 PipeDraw1 c/Equalizer/eq/client/pipe.cpp:338 46 Get Automatic Affinity for 1
26163 PipeDraw1 c/Equalizer/eq/client/pipe.cpp:344 46 port, device = 0,1
26163 PipeDraw2 c/Equalizer/eq/client/pipe.cpp:199 57 Entered pipe thread
26163 PipeDraw2 c/Equalizer/eq/client/pipe.cpp:310 57 Set up pipe message pump for GLX
26163 PipeDraw2 c/Equalizer/eq/client/pipe.cpp:338 57 Get Automatic Affinity for 2
26163 PipeDraw2 c/Equalizer/eq/client/pipe.cpp:344 57 port, device = 0,2
26163 PipeDraw1 c/Equalizer/eq/client/pipe.cpp:402 92 For [port, device] = 0,1 : GPU is found.
26163 PipeDraw2 c/Equalizer/eq/client/pipe.cpp:402 112 For [port, device] = 0,2 : GPU is found.

marwan-abdellah commented 11 years ago

Connecting to the node and running vglrun eqPly for the first time works fine. However, rerunning it afterwards ends with "Aborted". I get the following output:

26855 PipeDraw c/Equalizer/eq/client/pipe.cpp:338 23 Get Automatic Affinity for 0
26855 PipeDraw c/Equalizer/eq/client/pipe.cpp:344 23 port, device = 4294967295,4294967295
26855 PipeDraw c/Equalizer/eq/client/pipe.cpp:349 23 port, device = 4294967295,4294967295 No Affinity
26855 PipeDraw eq/client/glx/windowSystem.cpp:54 23 Using glx::Pipe
26855 PipeDraw1 c/Equalizer/eq/client/pipe.cpp:199 24 Entered pipe thread
26855 PipeDraw1 c/Equalizer/eq/client/pipe.cpp:310 24 Set up pipe message pump for GLX
26855 PipeDraw1 c/Equalizer/eq/client/pipe.cpp:338 24 Get Automatic Affinity for 1
26855 PipeDraw1 c/Equalizer/eq/client/pipe.cpp:344 24 port, device = 0,1
26855 PipeDraw2 c/Equalizer/eq/client/pipe.cpp:199 31 Entered pipe thread
26855 PipeDraw2 c/Equalizer/eq/client/pipe.cpp:310 31 Set up pipe message pump for GLX
26855 PipeDraw2 c/Equalizer/eq/client/pipe.cpp:338 31 Get Automatic Affinity for 2
26855 PipeDraw2 c/Equalizer/eq/client/pipe.cpp:344 31 port, device = 0,2
26855 PipeDraw1 c/Equalizer/eq/client/pipe.cpp:402 65 For [port, device] = 0,1 : GPU is found.
26855 PipeDraw2 c/Equalizer/eq/client/pipe.cpp:402 79 For [port, device] = 0,2 : GPU is found.
26855 PipeDraw1 eq/client/glx/windowSystem.cpp:54 95 Using glx::Pipe
26855 PipeDraw2 eq/client/glx/windowSystem.cpp:54 99 Using glx::Pipe
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 41 requests (41 known processed) with 0 events remaining.
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 41 requests (41 known processed) with 0 events remaining.
26855 PipeDraw2 ox/lunchbox/pluginRegistry.cpp:99 111 Assert: plugins.empty() [Plugin registry not de-initialized] , in:
    lunchbox::abort()
    lunchbox::detail::PluginRegistry::~PluginRegistry()
    lunchbox::PluginRegistry::~PluginRegistry()
    /lib64/libc.so.6(exit+0xe2) [0x2acce4401da2]
    _XDefaultIOError
    _XIOError
    _XReply
    /usr/lib64/nvidia/libGL.so.1(+0xb7f39) [0x2accaf299f39]
26855 PipeDraw2 rc/Lunchbox/lunchbox/debug.cpp:44 111
Aborted (core dumped)

The bug does not reproduce consistently.

eile commented 11 years ago

This is the actual error:

XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
after 41 requests (41 known processed) with 0 events remaining.

To debug, put Xlib into synchronous mode and run in gdb to see which X call is causing this. It might be caused by the affinity stuff.

marwan-abdellah commented 11 years ago

Disabling the automatic affinity by forcing Pipe::_getAutoAffinity() to return lunchbox::Thread::NONE gives the same inconsistent bug.

11257  Equalizer/eq/server/server.cpp:196 3073 
11257        Main alizer/eq/client/cvTracker.cpp:43 16 Did not find OpenCV camera 0
11257    PipeDraw c/Equalizer/eq/client/pipe.cpp:200 20 Entered pipe thread
11257    PipeDraw c/Equalizer/eq/client/pipe.cpp:311 20 Set up pipe message pump for GLX
11257    PipeDraw eq/client/glx/windowSystem.cpp:54 20 Using glx::Pipe
11257   PipeDraw1 c/Equalizer/eq/client/pipe.cpp:200 21 Entered pipe thread
11257   PipeDraw1 c/Equalizer/eq/client/pipe.cpp:311 21 Set up pipe message pump for GLX
11257   PipeDraw1 eq/client/glx/windowSystem.cpp:54 21 Using glx::Pipe
11257   PipeDraw2 c/Equalizer/eq/client/pipe.cpp:200 22 Entered pipe thread
11257   PipeDraw2 c/Equalizer/eq/client/pipe.cpp:311 22 Set up pipe message pump for GLX
11257   PipeDraw2 eq/client/glx/windowSystem.cpp:54 22 Using glx::Pipe
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 27 requests (27 known processed) with 0 events remaining.
11257   PipeDraw2 ox/lunchbox/pluginRegistry.cpp:99 38 Assert: plugins.empty() [Plugin registry not de-initialized] , in: 
    lunchbox::abort()
    lunchbox::detail::PluginRegistry::~PluginRegistry()
    lunchbox::PluginRegistry::~PluginRegistry()
    /lib64/libc.so.6(exit+0xe2) [0x2b7a2df34da2]
    _XDefaultIOError
    _XIOError
    _XReply
    /usr/lib64/nvidia/libGL.so.1(+0xb7f39) [0x2b79f8dccf39]
11257   PipeDraw2 rc/Lunchbox/lunchbox/debug.cpp:44 38 
Aborted (core dumped)

To reproduce the bug, use the allocation described before:

salloc -N1 -n1 --gres=gpu:3 -p interactive

Then use srun to start the X server:

srun -n1 --gres=gpu:3 -w NODE --startx --pty /bin/bash

Then vglconnect to the node:

vglconnect NODE

Then run eqPly.