CDAT / vcs

Visualization Control System

VCS 2.8 vs 2.10 speed issues #210

Closed danlipsa closed 7 years ago

danlipsa commented 7 years ago

From @zshaheen: In the ACME Workflow telco on Monday, June 26, 2017, we discussed performance issues with VCS 2.8 vs 2.10. In short, on 2.10 isofills and polar projections are fairly slow, and sadly, 1D plots are also slower on 2.10 than on 2.8.

Code: Here's the code that we use to create the plots: https://github.com/zshaheen/acme_diags/tree/master/acme_diags/plot/vcs. set3.py contains the 1D plots, set4.py and set5.py the isofills, and set7.py the polar projections. Look for the plot() function in each.

Envs: Also, here are the envs used.

2.8 env:

vcs-nox        2.8.2017.04.04.16.48.c22e1d924a0145ef526b56a8ca123484adfd660a  0  uvcdat/label/nightly
vcsaddons-nox  2.8  py27_2  uvcdat
vtk-cdat-nox   7.1.0.2.8.2017.03.23.16.48.73fb094d17ccdd594a6abf9ff8732149c88dc2e6  py27_0  uvcdat/label/nightly
mesalib        7.6.1  0  conda-forge
dv3d-nox       2.8  py27_2  uvcdat

2.10 env:

vcs       2.10  2  uvcdat
vtk-cdat  7.1.0.2.10  py27_mesalib_0  [mesalib]  uvcdat
mesalib   17.0.3  1  conda-forge
dv3d      2.10  py27_0  uvcdat

Results: We ran 10 of each plot and timed them. Below are the results:

- 1D plots (set3.py) are 2.256 times slower on 2.10 than 2.8, even with the fix. This isn't so great.
- Isofills (set4.py and set5.py) are 1.247 and 4.803 times slower, respectively, on 2.10 than 2.8. This also isn't so great.
- Polar projections (set7.py) are 3.236 times slower on 2.10 than 2.8. This isn't so great either.
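
The slowdown factors quoted above are just ratios of averaged wall-clock times over the repeated runs; a minimal sketch of that bookkeeping (the `time.sleep` calls are made-up stand-ins for the real plot() calls, not the actual measurements):

```python
import time

def time_plot(plot_fn, runs=10):
    """Average wall-clock seconds over `runs` calls of plot_fn."""
    total = 0.0
    for _ in range(runs):
        start = time.time()
        plot_fn()
        total += time.time() - start
    return total / runs

# Illustrative stand-ins for the real set5.py plot() calls.
avg_2_8 = time_plot(lambda: time.sleep(0.001))   # pretend 2.8 run
avg_2_10 = time_plot(lambda: time.sleep(0.005))  # pretend 2.10 run

slowdown = avg_2_10 / avg_2_8
print("2.10 is %.3f times slower than 2.8" % slowdown)
```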

Any ideas? We really appreciate your help on this.

From @doutriaux1

Guys,

I got something.

Using the attached test and the yml env at: https://github.com/UV-CDAT/uvcdat/wiki/install

NOTE: this is done w/o mesalib, and ALL 2.10 vtk-cdat builds do NOT render at the correct geometry (bug for you, Aashish and Dan).

isofill with 2.8: 0.18433380127
    Max percentage 119.696913801
    Fastest time: 0.180957794189
    Avg time: 0.194927251339
    Fit coeff: -0.00128569424834 1.12106129005

isofill 2.8 and vcs nightly:
    Max percentage 136.229411851
    Fastest time: 0.212167024612
    Avg time: 0.241551113129
    Fit coeff: -0.000293949286773 1.13474387474

isofill with 2.10:
    Max percentage 162.54471857
    Fastest time: 0.217200994492
    Avg time: 0.243273427486
    Fit coeff: 0.000875383643704 1.06942886163

isofill 2.10 but vcs nightly and vtk-cdat from 2.10:
    Max percentage 109.646274636
    Fastest time: 0.244299173355
    Avg time: 0.255787761211
    Fit coeff: -0.00021131210507 1.0437535676

isofill 2.10 vcs and vtk-cdat from nightly:
    Max percentage 155.385235308
    Fastest time: 0.256790161133
    Avg time: 0.26979429245
    Fit coeff: -9.85009650669e-05 1.04153697489

speed_test.py.txt

danlipsa commented 7 years ago

I tried vcs 2.8, vcs nightly + VTK OpenGL backend (default), and vcs nightly + VTK OpenGL2 backend. This is what I got:

vcs nightly + VTK OpenGL2 backend

vtk-cdat 7.1.0.2.10.2017.07.04.08.42.1c03d1d03b1c12d99dd090ba4c5ca078e2d9612f py27_mesalib_0 [mesalib] local
mesalib 17.0.3 1 conda-forge

Avg time: 0.813000631332

vcs nightly + VTK OpenGL backend

vtk-cdat 7.1.0.2.10.2017.06.21.11.56.1c03d1d03b1c12d99dd090ba4c5ca078e2d9612f py27_mesalib_0 [mesalib] uvcdat/label/nightly
mesalib 17.0.3 1 conda-forge

Avg time: 1.46560789108

2.8 + VTK OpenGL backend

vtk-cdat-nox 7.1.0.2.8 mesalib 7.6.1 2 uvcdat/label/2.8

Avg time: 0.275944304466

danlipsa commented 7 years ago

@doutriaux1 @aashish24 One problem is that we started to compile VTK in debug mode (2.8 was compiled in Release mode). We have to change this. However, there still seems to be a significant slowdown remaining, at least for isofill.

vcs nightly + VTK Release OpenGL backend

vtk-cdat 7.1.0.2.10.2017.07.04.09.37.1c03d1d03b1c12d99dd090ba4c5ca078e2d9612f py27_mesalib_0 [mesalib] local
mesalib 17.0.3 1 conda-forge

Avg time: 1.39770002365

danlipsa commented 7 years ago

@doutriaux1 @aashish24 The slowdown we see seems to be caused by two things:

  1. We compile VTK in Debug mode in the nightly (and maybe 2.10 as well), while 2.8 was compiled in Release mode.
  2. The bigger slowdown seems to be caused by mesa. Note that we upgraded from mesalib 7.6.1 to mesalib 17.0.3. It may also be how we compile it.
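
For reference, the Debug/Release difference comes down to how VTK's CMake cache is configured; a hedged sketch of what a Release configuration could look like (paths are illustrative, not our actual build scripts):

```shell
# Configure VTK with optimizations on; Debug builds disable optimization
# and keep assertions, which easily accounts for a large rendering slowdown.
cmake -S /path/to/VTK -B /path/to/VTK-build \
    -DCMAKE_BUILD_TYPE=Release \
    -DVTK_RENDERING_BACKEND=OpenGL2
cmake --build /path/to/VTK-build -j8
```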

Here are the 4 runs crossing VCS 2.8 / VCS master with VTK 2.8 / VTK master. They all have similar times. To identify the VCS version, look for "Start plot"; this is printed from VTKPlots.py. To identify the VTK version, look for "XXX Render"; this is printed from vtkRenderWindow.cxx.

VCS 2.8, VTK Master

Master Render: 0.0259284
Master Render: 0.0218634
Start plot 2.8
gtype:  isofill
Start plot 2.8
gtype:  text
Render time: 0.000003
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000002
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000002
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000000
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  line
Render time: 0.000001
Start plot 2.8
gtype:  line
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  line
Render time: 0.000001
Start plot 2.8
gtype:  line
Render time: 0.000001
Start plot 2.8
gtype:  line
Render time: 0.000001
Start plot 2.8
gtype:  fillarea
Render time: 0.000000
Start plot 2.8
gtype:  line
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000001
Create pipeline: 0.215682
Master Render: 0.998379
Render time: 0.998429
1.21853399277
Master Render: 0.992069
Master Render: 0.917881
Master Render: 0.0189741

VCS 2.8, VTK 2.8

2.8 Render: 0.0247734
2.8 Render: 0.0229599
Start plot 2.8
gtype:  isofill
Start plot 2.8
gtype:  text
Render time: 0.000007
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  text
Render time: 0.000001
Start plot 2.8
gtype:  line
Render time: 0.000001
Start plot 2.8
gtype:  line
Render time: 0.000000
Start plot 2.8
gtype:  text
Render time: 0.000000
Start plot 2.8
gtype:  line
Render time: 0.000001
Start plot 2.8
gtype:  line
Render time: 0.000001
Start plot 2.8
gtype:  line
Render time: 0.000001
Start plot 2.8
gtype:  fillarea
Render time: 0.000001
Start plot 2.8
gtype:  line
Render time: 0.000000
Start plot 2.8
gtype:  text
Render time: 0.000000
Create pipeline: 0.206618
2.8 Render: 0.959998
Render time: 0.960048
1.17007303238
2.8 Render: 0.921597
2.8 Render: 0.903757
2.8 Render: 0.0189879

VCS Master, VTK 2.8

2.8 Render: 0.0263435
2.8 Render: 0.0260806
Start plot master
gtype:  isofill
Start plot master
gtype:  text
Render time: 0.000003
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000000
Start plot master
gtype:  text
Render time: 0.000000
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  fillarea
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000000
Create pipeline: 0.256525
2.8 Render: 0.973659
Render time: 0.973718
1.23442101479
2.8 Render: 0.961294
2.8 Render: 0.946593
2.8 Render: 0.0201555

VCS Master, VTK Master

Master Render: 0.0249178
Master Render: 0.0248235
Start plot master
gtype:  isofill
Start plot master
gtype:  text
Render time: 0.000003
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000002
Start plot master
gtype:  text
Render time: 0.000003
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000000
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  fillarea
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000000
Start plot master
gtype:  text
Render time: 0.000001
Create pipeline: 0.254846
Master Render: 0.93995
Render time: 0.939997
1.19984102249
Master Render: 0.917993
Master Render: 0.919241
Master Render: 0.0201933
danlipsa commented 7 years ago

This is the renderer used by VTK: "VMware, Inc. 3.0 Mesa 17.0.3 Gallium 0.4 on softpipe". I think we want llvmpipe instead of softpipe: https://www.mesa3d.org/osmesa.html
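
That renderer line comes from OpenGL's GL_RENDERER string; a minimal, hedged sketch of checking whether a Mesa build is on the slow softpipe rasterizer rather than llvmpipe (parsing example strings from this thread rather than querying a live GL context):

```python
def mesa_driver(renderer_string):
    """Return the Gallium driver name embedded in a GL_RENDERER string,
    or None if it is not a Gallium/Mesa renderer."""
    marker = "Gallium 0.4 on "
    if marker not in renderer_string:
        return None
    return renderer_string.split(marker, 1)[1].split()[0]

# Example renderer strings, as reported in this thread.
slow = mesa_driver("Gallium 0.4 on softpipe")
fast = mesa_driver("Gallium 0.4 on llvmpipe (LLVM 3.3, 256 bits)")
print(slow, fast)
```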

danlipsa commented 7 years ago

Here are the results with mesa compiled with llvmpipe:

VMware, Inc.
3.0 Mesa 17.0.3
Gallium 0.4 on llvmpipe (LLVM 3.3, 256 bits)
Master Render: 0.0087492
VMware, Inc.
3.0 Mesa 17.0.3
Gallium 0.4 on llvmpipe (LLVM 3.3, 256 bits)
Master Render: 0.00377673
Start plot master
gtype:  isofill
Start plot master
gtype:  text
Render time: 0.000005
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000002
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000000
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000000
Start plot master
gtype:  line
Render time: 0.000000
Start plot master
gtype:  fillarea
Render time: 0.000001
Start plot master
gtype:  line
Render time: 0.000001
Start plot master
gtype:  text
Render time: 0.000001
Create pipeline: 0.259305
Master Render: 0.271823
Render time: 0.271868
0.535116910934
Master Render: 0.0628871
Master Render: 0.0485962
Master Render: 0.00186366
aashish24 commented 7 years ago

Master Render: 0.0628871
Master Render: 0.0485962
Master Render: 0.00186366

Looking great to me!

danlipsa commented 7 years ago

@aashish24 Note that we are not yet back to where the old mesa was: 0.275944304466 vs 0.535116910934.

danlipsa commented 7 years ago

@aashish24 @doutriaux1 Removing the printouts and running the test that computes the average, I get: Avg time: 0.287473192215, which is approximately what we got for 2.8.

aashish24 commented 7 years ago

This is great. So a difference of 2%, which I would think is within range. @doutriaux1 what do you think?

doutriaux1 commented 7 years ago

@danlipsa I will try to update the official mesa build on conda-forge and see the difference it makes.

doutriaux1 commented 7 years ago

copy/pasting script test for easier re-use:

import time

import numpy
import cdms2
import vcs

# Open the sample dataset and create a large canvas to plot into.
f = cdms2.open(vcs.sample_data + "/clt.nc")
s = f("clt")
x = vcs.init()
x.open()
x.geometry(1600, 800)

def isSlowingDown(gmtype):
    n = 100
    print "Testing slow down for:", gmtype
    maxpct = 100.
    fastest = 100000.
    avg = 0.
    times = []
    for i in range(n):
        start = time.time()
        gm = vcs.creategraphicsmethod(gmtype)
        # gm.datawc_x1 = 0.
        # gm.datawc_x2 = 2.
        # gm.datawc_y1 = 0.
        # gm.datawc_y2 = 2.
        x.plot(gm, s)
        elapsed = time.time() - start
        if elapsed < fastest:
            fastest = elapsed
        pct = elapsed / fastest * 100.
        times.append(elapsed)
        if i > 5 and pct > maxpct:  # skip the first 5 times to make sure system gets in groovy mode
            maxpct = pct
        avg += elapsed
        x.clear()
        x.removeobject(gm)
        print elapsed
    # Linear fit of the normalized times; a non-negligible slope means the
    # plots get progressively slower (or faster) over repeated runs.
    a, b = numpy.polyfit(numpy.arange(n - 5), numpy.array(times[5:]) / fastest, 1)
    print "\tMax percentage", maxpct
    print "\tFastest time:", fastest
    print "\tAvg time:", avg / n
    print "\tFit coeff:", a, b
    return abs(a) > 2.E-3

isSlowingDown("isofill")

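
The "Fit coeff" pair printed above is the slope and intercept of a degree-1 numpy.polyfit over the normalized times; a small self-contained illustration of how a steady run versus a progressively slowing run shows up in the slope (the timing lists are synthetic, not real measurements):

```python
import numpy

def fit_slope(times):
    """Slope of a linear fit of times normalized by the fastest run,
    mirroring the slowdown check in isSlowingDown()."""
    times = numpy.array(times, dtype=float)
    normalized = times / times.min()
    a, b = numpy.polyfit(numpy.arange(len(times)), normalized, 1)
    return a

flat = [0.20, 0.21, 0.20, 0.20, 0.21, 0.20]     # steady runs
leaking = [0.20, 0.25, 0.30, 0.35, 0.40, 0.45]  # getting slower

print(fit_slope(flat))     # close to zero
print(fit_slope(leaking))  # clearly positive
```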
jypeter commented 7 years ago

@doutriaux1 what do you mean by groovy mode? You wait till you think that there is a good chance that things are cached?

doutriaux1 commented 7 years ago

@jypeter yes, so that the file system has cached everything, memory is allocated by the process, etc.

doutriaux1 commented 7 years ago

@danlipsa I uploaded the nightly vtk built in Release mode rather than Debug.

with a pure 2.10 env:

0.242112874985
0.23365187645
    Max percentage 141.900919912
    Fastest time: 0.23365187645
    Avg time: 0.252162251472
    Fit coeff: -0.000105017047031 1.07336193973

with the exact same env but using the Release vtk:

0.210036039352
    Max percentage 110.823273177
    Fastest time: 0.19441318512
    Avg time: 0.209133522511
    Fit coeff: -0.000138462622074 1.07028311157

with pure 2.8 I get:

0.204612970352
    Max percentage 167.436051597
    Fastest time: 0.184792995453
    Avg time: 0.227097587585
    Fit coeff: -0.00171937779642 1.29636997825

So the 2.10 avg time is actually faster!!!!

doutriaux1 commented 7 years ago

Interestingly, with mesalib 17.1.4 (or 17.0.3) and vtk-cdat in Release I get:

1.19244384766
    Max percentage 115.738468471
    Fastest time: 1.15388703346
    Avg time: 1.18699487686
    Fit coeff: -4.48654672987e-05 1.0289543097

Are we expecting mesa to be much slower?

doutriaux1 commented 7 years ago

@williams13 maybe the mesa slowness is what makes vcdat so slow.

danlipsa commented 7 years ago

Mesa (rendering in software) is much slower than rendering in hardware.

doutriaux1 commented 7 years ago

ok so it makes sense. Thanks @danlipsa

doutriaux1 commented 7 years ago

@zshaheen for the amwg_diags then do not use mesa

zshaheen commented 7 years ago

@doutriaux1 If we don't use mesa, will it still work on a headless machine? The non-nox build used to give an OpenGL error.
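
For context, "headless" here means no X display is available, which is what the -nox/mesa builds work around; a hedged, minimal sketch of how a script might detect that situation before choosing a rendering path (this is an illustration, not how vcs itself decides):

```python
import os

def has_x_display():
    """True if an X11 display appears to be available.

    Hardware OpenGL contexts normally need one; without it, software
    rendering (mesa/OSMesa) or EGL is required.
    """
    return bool(os.environ.get("DISPLAY"))

if has_x_display():
    print("X display found: hardware OpenGL is an option")
else:
    print("headless: fall back to OSMesa (mesalib) or EGL rendering")
```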

durack1 commented 7 years ago

Just to throw a cat amongst the pigeons... is Mesa the only way?

https://www.phoronix.com/scan.php?page=news_item&px=Intel-OpenSWR-Mesa

@danlipsa have you heard of or tried this, or whatever it may have morphed into by now?


danlipsa commented 7 years ago

@durack1 @doutriaux1 @aashish24 I think OpenSWR works only with on-screen mesa. We are already using that for ParaView. https://blog.kitware.com/messing-with-mesa-for-paraview-5-0vtk-7-0/

A (much faster) option for off-screen mesa is EGL. This does not need X, but it does need a recent (no more than about three years old) NVidia graphics card and a recent Linux. It works fine on my Ubuntu 16.04. https://blog.kitware.com/off-screen-rendering-through-the-native-platform-interface-egl/

doutriaux1 commented 7 years ago

thanks @danlipsa.

doutriaux1 commented 7 years ago

@danlipsa also it appears that it has already been ported back into mesa: https://github.com/OpenSWR/openswr-mesa and then: https://cgit.freedesktop.org/mesa/mesa/ 17.1.4, which we are using (with swrast), has been out for 10 days. @danlipsa should I revive my efforts to use llvmpipe? Looking at it, it appears that using --disable-llvm-shared-libs might help us.

doutriaux1 commented 7 years ago

@danlipsa I tried a local build and regular mesa gives us:

    Max percentage 117.257370576
    Fastest time: 1.19739794731
    Avg time: 1.2479244256
    Fit coeff: -0.000892191236294 1.08191431583

but llvm enabled gives us:

    Max percentage 127.831469762
    Fastest time: 0.22727894783
    Avg time: 0.240919532776
    Fit coeff: 0.000148631887712 1.03445848746

Which is very close to a non-mesa build! I'll try again a bit this morning; I'm hoping disabling shared llvm libs will help.

danlipsa commented 7 years ago

@doutriaux1 @sankhesh Yes, we should use llvmpipe. What is the status here? Where are you blocked?

doutriaux1 commented 7 years ago

Builds pass and conda packages are uploaded, but so far the mac build segfaults when installed back on my mac. Trying disable-shared-llvm now to see if it helps.

doutriaux1 commented 7 years ago

Good news: these numbers are from a fresh new mesa build produced by conda-forge:

Also, I had a webex running at the same time, so clean numbers might be faster.

    Max percentage 139.571263624
    Fastest time: 0.280382156372
    Avg time: 0.312298326492
    Fit coeff: -0.000609585616416 1.12265862954

doutriaux1 commented 7 years ago

@danlipsa thx for the mesa-llvm this is HUGE!!

danlipsa commented 7 years ago

Nice! Thanks also to @zshaheen, who filed the bug report; @doutriaux1, who created the performance test and made everything work on all platforms, including circleci and travis; and @sankhesh, who created the original mesa conda recipe and made mesa work with VTK on mac.

durack1 commented 7 years ago

@danlipsa @doutriaux1 @zshaheen @sankhesh this is great, I wonder if it might also miraculously sort out our ongoing VNC issues?? It'll be great to give this a try..

doutriaux1 commented 7 years ago

@durack1 there was never an issue with mesa, since it is what we recommend over X11 forwarding.