TACC / pvOSPRay

Client loses connection when switching to OSPRay rendering view #28

Open benjha opened 8 years ago

benjha commented 8 years ago

Hello,

I am running ParaView in client/server mode on a powerwall (ORNL's EVEREST). It has 9 nodes; each node has two GPUs, and each GPU is connected to a display. The system uses the Intel compiler 14.0.4 and Open MPI 1.8.4 with multithread support. I am aware that you recommend MPICH; nevertheless, I can run the ospTestMPIAsyncMessaging example with Open MPI, so I wonder whether this is a different issue.

The job for the pvserver allocates 18 tasks and assigns 2 tasks per node to "steer" each GPU.

I can get paraview and pvserver up and running, and the pvOSPRay plugin is loaded on both sides.

...
Waiting for client...
Connection URL: cs://everest1:22222
Accepting connection(s): everest1:22222
Client connected.
...

However, when I change the rendering view to OSPRay, I see a kind of switching on the powerwall, and then the client gets disconnected from pvserver. For simplicity, I am including only the errors I got on the head node:

ERROR: In /ccs/home/benjha/Programs/rayTracing/Paraview/VTK/Parallel/Core/vtkSocketCommunicator.cxx, line 878
vtkSocketCommunicator (0x23ea400): Tag mismatch: got 22222, expecting 1.
[everest1:28234] * Process received signal *
[everest1:28234] Signal: Segmentation fault (11)
[everest1:28234] Signal code: Address not mapped (1)
[everest1:28234] Failing at address: 0x20
[everest1:28234] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7fd366dae710]
[everest1:28234] [ 1] /ccs/home/benjha/Programs/rayTracing/Paraview/build/bin/../lib//libpvOSPRay.so(_ZN15vtkPVOSPRayViewD2Ev+0x68)[0x7fd33a683aa2]
[everest1:28234] [ 2] /ccs/home/benjha/Programs/rayTracing/Paraview/build/bin/../lib//libpvOSPRay.so(_ZN15vtkPVOSPRayViewD0Ev+0x18)[0x7fd33a683b18]
[everest1:28234] [ 3] /ccs/home/benjha/Programs/rayTracing/Paraview/build/lib/libvtkCommonCore-pv4.3.so.1(_ZN13vtkObjectBase18UnRegisterInternalEPS_i+0xe6)[0x7fd367849526]
[everest1:28234] [ 4] /ccs/home/benjha/Programs/rayTracing/Paraview/build/lib/libvtkCommonCore-pv4.3.so.1(_ZN9vtkObject18UnRegisterInternalEP13vtkObjectBasei+0x365)[0x7fd36784bb1f]
[everest1:28234] [ 5] /ccs/home/benjha/Programs/rayTracing/Paraview/build/lib/libvtkCommonCore-pv4.3.so.1(ZN13vtkObjectBase10UnRegisterEPS+0x34)[0x7fd3678493e8]
[everest1:28234] [ 6] /ccs/home/benjha/Programs/rayTracing/Paraview/build/lib/libvtkCommonCore-pv4.3.so.1(_ZN19vtkSmartPointerBaseD1Ev+0x45)[0x7fd367882863]
[everest1:28234] [ 7]
[everest1:28231] * Process received signal *
[everest1:28231] Signal: Segmentation fault (11)
[everest1:28231] Signal code: Invalid permissions (2)
[everest1:28231] Failing at address: 0x36efa40
/ccs/home/benjha/Programs/rayTracing/Paraview/build/lib/libvtkCommonCore-pv4.3.so.1(_ZN19vtkSmartPointerBaseaSEP13vtkObjectBase+0x66)[0x7fd3678828cc]
[everest1:28234] [ 8] /ccs/home/benjha/Programs/rayTracing/Paraview/build/lib/libvtkPVServerImplementationCore-pv4.3.so.1(+0xcdc97)[0x7fd36d5cac97]
[everest1:28234] [ 9] /ccs/home/benjha/Programs/rayTracing/Paraview/build/lib/libvtkPVServerImplementationCore-pv4.3.so.1(_ZN10vtkSIProxy16DeleteVTKObjectsEv+0x21)[0x7fd36d5c972f]
[everest1:28234] [10]
[everest1:28231] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7f39c0c21710]
[everest1:28231] [ 1] [0x36efa40]
[everest1:28231] * End of error message *
/ccs/home/benjha/Programs/rayTracing/Paraview/build/lib/libvtkPVServerImplementationCore-pv4.3.so.1(_ZN10vtkSIProxyD2Ev+0x2d)[0x7fd36d5c768d]
[everest1:28234] [11] /ccs/home/benjha/Programs/rayTracing/Paraview/build/lib/libvtkPVServerImplementationCore-pv4.3.so.1(_ZN10vtkSIProxyD0Ev+0x18)[0x7fd36d5c77b4]
[everest1:28234] [12] /ccs/home/benjha/Programs/rayTracing/Paraview/build/lib/libvtkCommonCore-pv4.3.so.1(_ZN13vtkObjectBase18UnRegisterInternalEPS_i+0xe6)[0x7fd367849526]
[everest1:28234] [13] /ccs/home/benjha/Programs/rayTracing/Paraview/build/lib/libvtkCommonCore-pv4.3.so.1(_ZN9vtkObject18UnRegisterInternalEP13vtkObjectBasei+0x365)[0x7fd36784bb1f]
[everest1:28234] [14]
[everest6:23019] * Process received signal *

I am wondering what the problem could be. Any help will be appreciated.

Let me know if you need additional information.

Thanks,

Benjamin Hernandez Advanced Data and Workflows Group ORNL

carsonbrownlee commented 8 years ago

Hi Benjamin, sorry for my belated reply. Unfortunately we are currently very busy with SuperComputing preparations, so I will be slow to respond, but I will look into this as soon as I can after SC! Is it crashing for you just when opening the view for the first time in a cold session with no data to display?

benjha commented 8 years ago

Thanks Carson, it is not too urgent.

Yes, it crashes before loading any data. By default, pvserver and the client use the OGL renderer. I then close that view and open a pvOSPRay view, and that is when it crashes. The same happens if I try to keep both views open.

I can switch between other views (e.g. parallel coordinate view, bar chart view, etc.) and return to OGL render view with no problems.

BTW, we have already seen a much better user experience on one node with a 13.5-million-triangle mesh using pvOSPRay.

Regards,

Benjamin

GregAbram commented 8 years ago

Hi, Benjamin - I'm going to look into this. You are using a power wall - so you are starting the server with the -tdx and -tdy parameters?

benjha commented 8 years ago

Hi Greg,

Yes, right after the pvserver command. It looks like:

/sw/everest/paraview/${version}/bin/pvserver -sp=${port} ${tileopts} ${config}

where the tileopts variable is defined as tileopts="-tdx=6 -tdy=3"

Each node of the cluster has two GPUs, so this is done for displays :0.0 and :0.1 :

Regards,

Benjamin
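
As a sanity check on the numbers in this thread, the tile options -tdx=6 -tdy=3 give 6 x 3 = 18 tiles, which matches the 18 pvserver tasks (9 nodes x 2 GPUs). The following is a minimal Python sketch of that arithmetic, not something taken from the ParaView or pvOSPRay sources. It assumes that ranks map to tiles in row-major order, that consecutive ranks share a node, and that the rank's parity selects the display; the function name tile_layout and the returned fields are hypothetical.

```python
def tile_layout(tdx, tdy, gpus_per_node):
    """Return one entry per tile for a tdx x tdy tiled display.

    Assumptions (not confirmed by the thread): ranks are assigned to
    tiles in row-major order, consecutive ranks run on the same node,
    and each rank drives display :0.<rank % gpus_per_node>.
    """
    tiles = []
    for rank in range(tdx * tdy):
        row, col = divmod(rank, tdx)            # row-major tile position
        node = rank // gpus_per_node            # hypothetical node index
        display = f":0.{rank % gpus_per_node}"  # hypothetical display name
        tiles.append(
            {"rank": rank, "col": col, "row": row, "node": node, "display": display}
        )
    return tiles


# Values from the thread: -tdx=6 -tdy=3, two GPUs per node.
layout = tile_layout(6, 3, 2)
for tile in layout:
    print(tile)
```

With these values the sketch produces 18 entries spread over 9 node indices, so every task owns exactly one tile. If the product of -tdx and -tdy did not equal the task count, that mismatch would be the first thing to check.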

GregAbram commented 8 years ago

Tracked it down on Friday - we'll get a fix checked in and tested and get back to you in a day or so.

carsonbrownlee commented 8 years ago

Hi Benjamin, with Greg's help I committed a fix on the "dev" branch of pvOSPRay. Can you try that branch and see if it fixes the issues you were running into? Carson