Closed totaam closed 10 years ago
add-csc-opencl.patch
(13.7 KiB)stub opencl csc module
add-csc-opencl-v3.patch
(19.7 KiB)minor tweaks
More kernels we may be able to use:
- image_formats.cl from socles (GPL v3)
Testing with plain x264 command line (running a couple of times to ensure the values are consistent - they are..):
- with `OpenCL` enabled:
    $ time ./x264 --opencl -o opencl.x264 video.mp4
    lavf [info]: 720x404p 0:1 @ 24000/1001 fps (vfr)
    x264 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT
    x264 [info]: OpenCL acceleration enabled with NVIDIA Corporation GeForce GTS 450
    x264 [info]: profile High, level 3.0
    x264 [info]: frame I:364 Avg QP:15.09 size: 37254
    x264 [info]: frame P:10936 Avg QP:20.31 size: 5108
    x264 [info]: frame B:19868 Avg QP:23.11 size: 772
    x264 [info]: consecutive B-frames: 10.2% 11.5% 8.4% 69.9%
    x264 [info]: mb I I16..4: 29.4% 17.4% 53.2%
    x264 [info]: mb P I16..4: 2.0% 2.6% 3.3% P16..4: 11.9% 6.5% 4.6% 0.0% 0.0% skip:69.2%
    x264 [info]: mb B I16..4: 0.1% 0.1% 0.2% B16..8: 8.5% 2.2% 0.8% direct: 0.7% skip:87.4% L0:48.4% L1:45.2% BI: 6.5%
    x264 [info]: 8x8 transform intra:28.0% inter:27.9%
    x264 [info]: coded y,uvDC,uvAC intra: 37.6% 57.8% 45.3% inter: 3.7% 4.7% 2.0%
    x264 [info]: i16 v,h,dc,p: 64% 27% 8% 2%
    x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 19% 15% 58% 1% 1% 1% 1% 1% 2%
    x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 30% 22% 23% 4% 4% 4% 5% 4% 4%
    x264 [info]: i8c dc,h,v,p: 51% 24% 22% 4%
    x264 [info]: Weighted P-Frames: Y:1.1% UV:1.0%
    x264 [info]: ref P L0: 64.6% 7.0% 17.6% 10.7% 0.1%
    x264 [info]: ref B L0: 79.6% 17.1% 3.3%
    x264 [info]: ref B L1: 95.0% 5.0%
    x264 [info]: kb/s:521.59
    encoded 31168 frames, 175.77 fps, 521.59 kb/s
    real 2m57.650s
    user 10m12.278s
    sys 0m36.051s
- without `OpenCL`:
    $ time ./x264 -o no-opencl.x264 video.mp4
    lavf [info]: 720x404p 0:1 @ 24000/1001 fps (vfr)
    x264 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT
    x264 [info]: profile High, level 3.0
    x264 [info]: frame I:373 Avg QP:16.18 size: 36484
    x264 [info]: frame P:12582 Avg QP:20.97 size: 4720
    x264 [info]: frame B:18213 Avg QP:23.12 size: 681
    x264 [info]: consecutive B-frames: 17.9% 10.8% 5.7% 65.7%
    x264 [info]: mb I I16..4: 23.1% 24.5% 52.5%
    x264 [info]: mb P I16..4: 1.6% 2.4% 2.8% P16..4: 11.8% 6.5% 4.5% 0.0% 0.0% skip:70.5%
    x264 [info]: mb B I16..4: 0.1% 0.1% 0.2% B16..8: 7.6% 1.9% 0.7% direct: 0.6% skip:88.8% L0:47.1% L1:46.3% BI: 6.6%
    x264 [info]: 8x8 transform intra:31.5% inter:27.4%
    x264 [info]: coded y,uvDC,uvAC intra: 36.9% 56.1% 43.1% inter: 3.9% 4.9% 2.1%
    x264 [info]: i16 v,h,dc,p: 61% 29% 8% 2%
    x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20% 15% 58% 1% 1% 1% 1% 1% 2%
    x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 30% 22% 22% 4% 4% 4% 5% 4% 4%
    x264 [info]: i8c dc,h,v,p: 51% 23% 22% 4%
    x264 [info]: Weighted P-Frames: Y:0.8% UV:0.7%
    x264 [info]: ref P L0: 64.9% 6.8% 17.7% 10.5% 0.0%
    x264 [info]: ref B L0: 78.6% 18.1% 3.3%
    x264 [info]: ref B L1: 95.4% 4.6%
    x264 [info]: kb/s:525.55
    encoded 31168 frames, 186.50 fps, 525.55 kb/s
    real 2m47.235s
    user 10m10.138s
    sys 0m6.067s
Resulting files:
    $ du -sk *opencl.x264
    83404 no-opencl.x264
    82776 opencl.x264
So this doesn't look like it makes much of a difference unfortunately (at least on my `GTS 450`), if anything it is a tad slower. The one thing where this may still be useful is for motion detection, where we could increase the search diameter without incurring too much more CPU usage. Enabling it looks simple enough, in `x264.h`:
    int b_opencl; /* use OpenCL when available */
(assuming that x264 is built with opencl support)
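To put the two x264 runs above into numbers, here is a small worked calculation. The figures are copied from the logs in this ticket; the helper name `relative_change` is only illustrative.

```python
def relative_change(baseline, value):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


# Figures copied from the two x264 runs above (GTS 450, 31168 frames).
fps_plain, fps_opencl = 186.50, 175.77
sys_plain, sys_opencl = 6.067, 36.051       # "sys" seconds reported by time
size_plain, size_opencl = 83404, 82776      # KiB reported by du -sk

print("fps change with OpenCL:  %+.1f%%" % relative_change(fps_plain, fps_opencl))
print("size change with OpenCL: %+.1f%%" % relative_change(size_plain, size_opencl))
print("sys time ratio:          %.1fx" % (sys_opencl / sys_plain))
```

The encoder is a few percent slower with OpenCL and the output is under one percent smaller, while the time spent in the kernel ("sys") goes up several times over.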
For the record, this is what I had to do to get `pyopencl` to build on Fedora 19 with the nvidia SDK to avoid this error at import time:
    ImportError: /usr/lib/python2.7/dist-packages/pyopencl/_cl.so: \
    symbol clRetainDevice, version OPENCL_1.2 not defined in file libOpenCL.so.1 with link time reference
The existing headers look like this:
    $ ls -la /usr/include/CL
    lrwxrwxrwx. 1 root root 32 Aug 28 12:39 /usr/include/CL -> /etc/alternatives/opencl-headers
Edit: Just downgrading the version of `opencl-headers` to 1.1 is enough.
Alternatively, we can move the headers to a version specific directory and add the `OpenCL` 1.1 headers:
    cd /etc/alternatives/
    mv opencl-headers opencl-headers-1.2
    mkdir opencl-headers-1.1
    ln -sf opencl-headers-1.1 opencl-headers
    cd opencl-headers-1.1
    wget http://www.khronos.org/registry/cl/api/1.1/cl_gl_ext.h
    wget http://www.khronos.org/registry/cl/api/1.1/cl_ext.h
    wget http://www.khronos.org/registry/cl/api/1.1/cl_gl.h
    wget http://www.khronos.org/registry/cl/api/1.1/cl.h
    wget http://www.khronos.org/registry/cl/api/1.1/cl_platform.h
    wget http://www.khronos.org/registry/cl/api/1.1/opencl.h
Then we need to ensure `pyopencl` will be built against 1.1, so `siteconf.py` contains:
    CL_PRETEND_VERSION = '1.1'
Having installed freeocl, I now have 3 providers available:
    $ LD_LIBRARY_PATH=/opt/cuda/lib64/ XPRA_SWSCALE_DEBUG=0 PYTHONPATH=. python ./tests/xpra/codecs/test_csc_opencl.py
    PyOpenCL OpenGL support: True
    found 3 OpenCL platforms:
    * FreeOCL (FreeOCL developers) - 1 devices:
      + CPU: AMD Phenom(tm) II X4 945 Processor (OpenCL 1.2 FreeOCL-0.3.6 / OpenCL C 1.2)
    * NVIDIA CUDA (NVIDIA Corporation) - 1 devices:
      + GPU: GeForce GTS 450 (OpenCL 1.1 CUDA / OpenCL C 1.1 )
    * Intel(R) OpenCL (Intel(R) Corporation) - 1 devices:
      + CPU: AMD Phenom(tm) II X4 945 Processor (OpenCL 1.2 (Build 67279) / OpenCL C 1.2 )
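A listing like the one above can be produced with a few PyOpenCL calls. This is a minimal sketch and not the code from the test script: `format_platforms` is a hypothetical helper, and it only relies on `pyopencl.get_platforms()`, `Platform.get_devices()` and the `name`, `vendor`, `version` and `opencl_c_version` attributes. It degrades to a message when PyOpenCL cannot be imported.

```python
def format_platforms(platforms):
    """Return report lines for objects that look like pyopencl platforms."""
    lines = ["found %d OpenCL platforms:" % len(platforms)]
    for platform in platforms:
        devices = platform.get_devices()
        lines.append("* %s (%s) - %d devices:" % (
            platform.name, platform.vendor, len(devices)))
        for device in devices:
            lines.append("  + %s (%s / %s)" % (
                device.name, device.version, device.opencl_c_version))
    return lines


if __name__ == "__main__":
    try:
        import pyopencl
        platforms = pyopencl.get_platforms()
    except ImportError:
        print("pyopencl is not installed")
    except Exception as e:
        # e.g. no usable ICD is installed on this machine
        print("could not query OpenCL platforms: %s" % e)
    else:
        print("\n".join(format_platforms(platforms)))
```

Note that importing `pyopencl` and querying the platforms already makes the ICD loader open the vendor libraries, so this is only suitable for a standalone diagnostic script.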
add-csc-opencl-v6.patch
(22.7 KiB)works ok but only one format so far: YUV420P to RGB
Please try the patch above and report on performance. You may need to adjust some env vars for finding the libraries in the cuda paths and for selecting the opencl platform/device:
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/cuda/lib64/
    export PYTHONPATH=.
    XPRA_OPENCL_DEVICE_TYPE=GPU python ./tests/xpra/codecs/test_csc_opencl.py
    XPRA_OPENCL_DEVICE_TYPE=CPU python ./tests/xpra/codecs/test_csc_opencl.py
Note: careful with `LD_LIBRARY_PATH`, putting cuda ahead of regular libraries can cause some serious problems (conflicts with libopencl versions for example).
Results deleted (those figures were wrong because of a bug)
The results aren't as bad as they look for nvidia:
- cpu csc is already very fast since it is such a simple operation
- hopefully the difference will be more noticeable when we add scaling
- the gfx card is quite slow by modern standards (we'll see if faster ones help - not guaranteed it will make a huge difference here since the cost is mostly memory bandwidth)
- most of the cpu time is spent copying buffers to and from the gfx card and on modern cpus that is slightly better than doing fpu or more general instruction decoding
Even then, I think there is room for improvement since we copy the pixels in and out and we may not need to (we just need a buffer interface).
Interestingly, the performance varies widely depending on the picture size.. will need to look into the worksize/localsize settings.
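One likely reason the picture size matters: in OpenCL 1.x the global work size must be an exact multiple of the local work size, so frame dimensions that are not multiples of the work-group size have to be padded. A minimal sketch of that rounding, assuming a 16x16 local size (an illustrative value, not necessarily what the patch uses):

```python
def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return ((value + multiple - 1) // multiple) * multiple


def global_work_size(width, height, local_size=(16, 16)):
    """Pad the image dimensions so each is divisible by the local work size."""
    local_w, local_h = local_size
    return round_up(width, local_w), round_up(height, local_h)


for size in ((720, 404), (1920, 1080)):
    print("%sx%s -> %sx%s" % (size + global_work_size(*size)))
```

With padding, the kernel has to bounds-check its coordinates against the real width and height so that the extra work items do not read or write outside the image.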
add-csc-opencl-v7.patch
(23.0 KiB)updated patch - fix crash with swscale
Here are the results on Nvidia K1 (Nvidia) OpenCL
At 1920x1080 191 MPixels/s 223 MPixels/s 161 MPixels/s 184 MPixels/s 172 MPixels/s
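To relate these throughput figures to frame rates, here is a small conversion sketch. It assumes that one MPixel means 10^6 pixels, which the test output does not state, and the helper name is only illustrative.

```python
def mpixels_to_fps(mpixels_per_second, width, height):
    """Frames per second for a given throughput, assuming 1 MPixel = 10**6 pixels."""
    return mpixels_per_second * 1e6 / (width * height)


# Throughput values copied from the K1 results above, in the order listed.
results = [191, 223, 161, 184, 172]
for mpixels in results:
    print("%3d MPixels/s = %.1f fps at 1920x1080" % (
        mpixels, mpixels_to_fps(mpixels, 1920, 1080)))
print("mean: %.1f MPixels/s" % (sum(results) / float(len(results))))
```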
add-csc-opencl-v10.patch
(17.3 KiB)working version with all yuv formats as input and both BGRX and RGBX as output
Please re-run with patch v10 which fixes some important bugs.
I am afraid that I cannot commit it as-is because the `OpenCL` shared libraries we end up loading cause some serious problems:
    Traceback (most recent call last):
      File "/usr/bin/xpra", line 6, in <module>
        sys.exit(xpra.scripts.main.main(__file__, sys.argv))
      File "/usr/lib64/python2.7/site-packages/xpra/scripts/main.py", line 432, in main
        return run_server(parser, options, mode, script_file, args)
      File "/usr/lib64/python2.7/site-packages/xpra/scripts/server.py", line 454, in run_server
        import gtk.gdk #@Reimport
      File "/usr/lib64/python2.7/site-packages/gtk-2.0/gtk/__init__.py", line 40, in <module>
        from gtk import _gtk
    ImportError: dlopen: cannot load any more object with static TLS
add-csc-opencl-v13.patch
(35.6 KiB)updated patch with support for RGB to YUV444P (and more to come)
Added support in r4247
According to Recommended 8-Bit YUV Formats for Video Rendering (section on "YUV Sampling"), MPEG2's subsampling code (BT.601) is more lazy than MPEG1's - but since `OpenCL` is so cheap to run (it is the memory transfers that cost us), I went for the MPEG1-like more exhaustive calculations instead (using an average of all source pixel values).
Still have to figure out the TLS issue before this can be of any use..
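The difference between a cheap subsampling and "an average of all source pixel values" can be illustrated on a single chroma plane. This is a plain Python sketch for 2x2 (4:2:0 style) subsampling, not the OpenCL kernel from the patch; the "nearest" variant is only a stand-in for a cheaper method and is not meant to reproduce the exact MPEG2 chroma siting. Both helpers assume even plane dimensions.

```python
def subsample_nearest(plane):
    """Keep only the top-left sample of each 2x2 block (cheap)."""
    return [row[0::2] for row in plane[0::2]]


def subsample_average(plane):
    """Average all four samples of each 2x2 block (more exhaustive)."""
    out = []
    for y in range(0, len(plane) - 1, 2):
        row = []
        for x in range(0, len(plane[y]) - 1, 2):
            total = (plane[y][x] + plane[y][x + 1] +
                     plane[y + 1][x] + plane[y + 1][x + 1])
            row.append((total + 2) // 4)    # rounded integer mean
        out.append(row)
    return out


# A tiny 4x2 chroma plane, values chosen for illustration only.
chroma = [[10, 20, 30, 40],
          [30, 40, 50, 60]]
print(subsample_nearest(chroma))
print(subsample_average(chroma))
```

The averaging version reads four times as many samples per output value, which costs little on a GPU when the memory transfers dominate.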
Testing on a dual Xeon E5-2670 with dual `NVidia` K1s (more results [/wiki/CSC here]), I found that the individual K1 GPU cores are actually slower than my GTS 450 and so using `OpenCL` with x264 actually makes it run slower (and I believe the CPU savings are not worth much either):
- without `OpenCL`:
    encoded 3347 frames, 148.74 fps, 1853.13 kb/s
    real 0m22.759s
    user 6m40.754s
    sys 0m7.133s
- with `OpenCL`:
    encoded 3347 frames, 89.80 fps, 1866.38 kb/s
    real 0m46.335s
    user 4m42.685s
    sys 0m26.054s
The TLS issue has been solved in r4282 by only properly initializing csc_opencl (getting a context) after we have loaded GTK... which works around the problem rather than solving it properly.
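The general shape of such a workaround is deferred initialization: importing the module does nothing expensive, and the context is only created on first use, by which time GTK has been loaded. A minimal sketch of the pattern with hypothetical names (`LazyContext` and the factory are not the names used in r4282):

```python
class LazyContext(object):
    """Create the expensive context on first use instead of at import time."""

    def __init__(self, factory):
        self._factory = factory     # callable that builds the real context
        self._context = None

    def get(self):
        if self._context is None:
            # first real use: by now the other libraries are already loaded
            self._context = self._factory()
        return self._context


calls = []

def fake_factory():
    # stand-in for whatever creates the OpenCL context
    calls.append("created")
    return object()

lazy = LazyContext(fake_factory)
print(len(calls))                 # nothing has been created yet
first = lazy.get()
print(first is lazy.get(), len(calls))
```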
OpenCL is now enabled (r4298) and working well so closing this ticket.
Note: we may still want some enhancements:
There were many more changes and tweaks (too many to list).
Note: the TLS issue is discussed here on the PyOpenCL mailing list. Looks like a `PyOpenCL` build issue - may need to revisit when testing with the `Nvidia` SDK which only supports `OpenCL` 1.1 ...
Just found that the AMD icd causes the client to get into a spin and waste CPU on a spinlock. Simply having the AMD icd in `/etc/OpenCL/vendors` is enough to trigger the problem, so `OpenCL` should probably be disabled by default to prevent this. What is really odd is that this only affects the client, the server will happily run with the AMD icd (you can force it to be used with: `XPRA_FORCE_CSC_MODE=YUV420P XPRA_CSC_TYPE=opencl xpra start ...`). We cannot do a runtime check as calling any `OpenCL` API will cause the loader to dlopen the problematic library.. and we're toast.
Beware: one cannot strace the xpra client (the machine locks up - need ssh to come and kill the strace process)
Here's what strace has to say:
    open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 10
    read(10, "0-7\n", 8192) = 4
    close(10) = 0
    mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f78e9007000
    mprotect(0x7f78e9007000, 4096, PROT_NONE) = 0
    clone(Process 2797 attached
     <unfinished ...>
    [pid 2797] set_robust_list(0x7f78e98079e0, 24 <unfinished ...>
    [pid 2655] <... clone resumed> child_stack=0x7f78e9806fb0, \
        flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, \
        parent_tidptr=0x7f78e98079d0, tls=0x7f78e9807700, child_tidptr=0x7f78e98079d0) = 2797
    [pid 2797] <... set_robust_list resumed> ) = 0
    [pid 2797] futex(0x347b040, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {0, 1000000}, ffffffff <unfinished ...>
    [pid 2655] ioctl(9, 0x4008642a <unfinished ...>
    [pid 2797] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
    [pid 2797] futex(0x347b040, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {0, 1000000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 2655] <... ioctl resumed> , 0x7fff7aabbb08) = 0
    [pid 2797] futex(0x347b040, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {0, 1000000}, ffffffff <unfinished ...>
    [pid 2655] ioctl(9, 0xc03064a6 <unfinished ...>
    [pid 2797] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
    [pid 2797] futex(0x347b040, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {0, 1000000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 2797] futex(0x347b040, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {0, 1000000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 2797] futex(0x347b040, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {0, 1000000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 2797] futex(0x347b040, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {0, 1000000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 2797] futex(0x347b040, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {0, 1000000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 2797] futex(0x347b040, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {0, 1000000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 2797] futex(0x347b040, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {0, 1000000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 2797] futex(0x347b040, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {0, 1000000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
The futex call repeats forever and the xpra client process consumes >70% CPU doing absolutely nothing.
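Since any OpenCL API call makes the loader dlopen the vendor libraries, one option that avoids touching OpenCL at all is to look at the ICD registry on disk: on Linux each `.icd` file in `/etc/OpenCL/vendors` is a small text file naming the vendor library. The sketch below is an assumption about how such a pre-check could look, not something xpra does; the helper names are hypothetical and matching on `amdocl` assumes the AMD library is named like `libamdocl64.so`.

```python
import os


def list_icd_files(vendors_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the library it points to, without loading it."""
    found = {}
    if not os.path.isdir(vendors_dir):
        return found
    for name in sorted(os.listdir(vendors_dir)):
        if name.endswith(".icd"):
            with open(os.path.join(vendors_dir, name)) as f:
                found[name] = f.read().strip()
    return found


def suspicious_icds(icds, needle="amdocl"):
    """Names of ICD files whose library name contains `needle` (assumed AMD marker)."""
    return [name for name, library in sorted(icds.items())
            if needle in library.lower()]


if __name__ == "__main__":
    icds = list_icd_files()
    for name, library in sorted(icds.items()):
        print("%s -> %s" % (name, library))
    if suspicious_icds(icds):
        print("AMD ICD present: OpenCL is probably best left disabled on the client")
```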
And another one for good measure, Intel this time, is doing an illegal memory access, caught with valgrind:
    ==27195== Invalid read of size 8
    ==27195==    at 0x118DDA1C: __intel_sse2_strrchr (in /opt/intel/opencl-1.2-3.0.67279/lib64/libtbb_preview.so.2)
    ==27195==    by 0x118C8531: tbb::internal::init_dl_data() (dynamic_link.cpp:290)
    ==27195==    by 0x118C8466: __sti__$E (dynamic_link.cpp:449)
    ==27195==    by 0x118E8001: ??? (in /opt/intel/opencl-1.2-3.0.67279/lib64/libtbb_preview.so.2)
    ==27195==    by 0x118C367A: ??? (in /opt/intel/opencl-1.2-3.0.67279/lib64/libtbb_preview.so.2)
    ==27195==    by 0x7FF000276: ???
    ==27195==    by 0x6E6F687479702E: ???
    ==27195==    by 0x6E69622F7273752E: ???
    ==27195==    by 0x746100617270782E: ???
    ==27195==    by 0x652D2D0068636173: ???
    ==27195==    by 0x3D676E69646F636D: ???
    ==27195==    by 0x6E2D2D0034363267: ???
    ==27195==  Address 0xec4c5d8 is 56 bytes inside a block of size 58 alloc'd
    ==27195==    at 0x4A06409: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==27195==    by 0x3452405C95: open_path (dl-load.c:2036)
    ==27195==    by 0x34524086DC: _dl_map_object (dl-load.c:2223)
    ==27195==    by 0x345240CAD1: openaux (dl-deps.c:63)
    ==27195==    by 0x345240F303: _dl_catch_error (dl-error.c:177)
    ==27195==    by 0x345240D1D1: _dl_map_object_deps (dl-deps.c:256)
    ==27195==    by 0x34524138BB: dl_open_worker (dl-open.c:265)
    ==27195==    by 0x345240F303: _dl_catch_error (dl-error.c:177)
    ==27195==    by 0x34524131EA: _dl_open (dl-open.c:656)
    ==27195==    by 0x3452C0102A: dlopen_doit (dlopen.c:66)
    ==27195==    by 0x345240F303: _dl_catch_error (dl-error.c:177)
    ==27195==    by 0x3452C0162C: _dlerror_run (dlerror.c:163)
I have added the most important setup and configuration information here: CSC and the performance data now lives here: CSC
There are new SDKs available:
- Intel SDK XE 2013 R2 - which I am unable to test on my AMD CPU, can you please check that it still runs OK and maybe add or update the [/wiki/CSC/Performance performance data] (hopefully they will have fixed the invalid 64-bit memory access from comment:15 - if you have time, run the minimal opencl tests under valgrind)
- AMD APP SDK v2.9 - and I can no longer reproduce the client problems.
Maybe this can be enabled by default server side?
I don't think we will ever bother using `OpenCL` or `nvcuda` (#384) for CSC on the client side, since we're better off using `OpenGL` for CSC, scaling and rendering (it is now stable enough to use).
I've tested the Intel, AMD and Nvidia OpenCL ICDs and found no problems with xpra itself, however there is an issue with the AMD ICD which prevents Xorg from receiving a kill signal. Even just having this ICD available seems to be enough to trigger it.
I'm going to work from a clean install and try to find a set of instructions that includes all the above info to install the Intel + Nvidia ICDs on Fedora 20 to work with xpra.
I've just hit this error:
clFinish failed: invalid command queue
After a computer suspend-resume, it seems that the context becomes invalid (must have been cleared from the GPU during suspend). r5110 fixes that.
Quite likely to affect nvenc (added to #466) and csc_nvcuda (added to #384)
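One generic way to cope with a context that was invalidated behind our back (for example by suspend-resume) is to drop it and retry once with a fresh one. This is only a sketch of that idea under stated assumptions: it is not necessarily what r5110 does, and `ContextLostError`, `run_with_retry` and the callbacks are hypothetical names.

```python
class ContextLostError(Exception):
    """Hypothetical stand-in for errors like 'clFinish failed: invalid command queue'."""


def run_with_retry(get_context, reset_context, job, retries=1):
    """Run job(context); if the context was lost, rebuild it and try again."""
    attempt = 0
    while True:
        try:
            return job(get_context())
        except ContextLostError:
            if attempt >= retries:
                raise               # give up: the fresh context failed too
            attempt += 1
            reset_context()         # throw away the stale context and programs


# Simulated usage: the first context is stale, as after a resume.
state = {"context": "stale"}

def get_context():
    return state["context"]

def reset_context():
    state["context"] = "fresh"

def convert(context):
    if context == "stale":
        raise ContextLostError("clFinish failed: invalid command queue")
    return "converted with %s context" % context

print(run_with_retry(get_context, reset_context, convert))
```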
Trying to test with AMD OpenCL using HD 6870 GPU
Getting some strange output, is this normal?
    using new OpenCL context
    YUV420P to BGRX at 1920x1080 : 90 MPixels/s
    using new OpenCL context
    using new OpenCL context
    using new OpenCL context
    using new OpenCL context
    YUV420P to RGBX at 1920x1080 : 128 MPixels/s
    using new OpenCL context
    using new OpenCL context
    using new OpenCL context
    using new OpenCL context
    YUV422P to BGRX at 1920x1080 : 113 MPixels/s
    using new OpenCL context
    using new OpenCL context
    using new OpenCL context
    using new OpenCL context
    YUV422P to RGBX at 1920x1080 : 131 MPixels/s
    using new OpenCL context
    using new OpenCL context
    using new OpenCL context
    using new OpenCL context
    YUV444P to BGRX at 1920x1080 : 141 MPixels/s
    using new OpenCL context
    using new OpenCL context
    using new OpenCL context
    using new OpenCL context
    YUV444P to RGBX at 1920x1080 : 112 MPixels/s
Seems to be starting many new contexts.
Tested a few suspend/resume with r5153 with an ATI HD6870 and no issue.
    2014-01-08 17:55:44,912 PyOpenCL loaded, header version: 1.2, GL support: False
    2014-01-08 17:55:44,913 using platform: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.)
    2014-01-08 17:55:44,913 using device: GPU: Barts (OpenCL 1.2 AMD-APP (1348.4) / OpenCL C 1.2 )
For more info
From comment:20: that's odd, are you not seeing any `using new OpenCL context` after suspend/resume as I was? (I will try an intel chipset too) The patch [/attachment/ticket/422/opencl-forcewait.patch] makes it easier to hit the context problems: adding a 10 second delay in the encoding so that we can more easily suspend a PC whilst the GPU context is active.
Also, the log from comment:19 is worrying: the context should not have changed during the same run and I don't see how it could.. r5154 will tell us what has changed (the context or "program"), if you still get multiple occurrences of `using new OpenCL context` during the test run, please run the test with `XPRA_OPENGL_DEBUG=1` and post the lines preceding these ones, they should read something like: `old program=(..), new program=(..)` or `old context=(..), new context=(..)`.
opencl-forcewait.patch
(0.5 KiB)introduces a 10 second delay in the encoding to make it easier to suspend with a live context
For comment:20
    init_context(..) channel order=RGBA, filter mode=NEAREST
    init_context(..) kernel_function RGB_to_YUV422P: <pyopencl._cl.Kernel object at 0x3300628>
    old program=<pyopencl.Program object at 0x2e21510>, new program=<pyopencl.Program object at 0x2e21510>
    using new OpenCL context (program changed)
    init_context(..) kernel source=
opencl-programcompare.patch
(0.9 KiB)try to use the underlying int_ptr to compare opencl program instances
What the? the programs are clearly the same... yet fail the comparison test.
Looks like the docs are wrong: pyopencl.Program:
Instances of this class are hashable, and two instances of this class may be compared using “==” and “!=”. (Hashability was added in version 2011.2.)
(unless you are using an outdated version of `PyOpenCL`?)
Can you please try once more with [/attachment/ticket/422/opencl-programcompare.patch] to see if the spurious `using new OpenCL context` messages still occur? (and post your version of the `PyOpenCL` package) The easy alternative would be to remove the program test altogether: I have manually verified that we always re-initialize the programs when we re-initialize the device so this would be safe, for now. But this would make the code much more brittle.
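Comparing the underlying handles instead of the Python wrappers might look like the sketch below. It assumes the objects expose an integer `int_ptr` attribute, as recent PyOpenCL objects do, and falls back to `==` otherwise; `same_cl_object` is a hypothetical helper, not the code from the attached patch.

```python
def same_cl_object(a, b):
    """Compare two OpenCL wrapper objects by their underlying handle if possible."""
    if a is None or b is None:
        return a is b
    ptr_a = getattr(a, "int_ptr", None)
    ptr_b = getattr(b, "int_ptr", None)
    if ptr_a is not None and ptr_b is not None:
        return ptr_a == ptr_b       # same underlying OpenCL handle
    return a == b                   # older versions: rely on the wrapper's __eq__


class FakeProgram(object):
    """Test double: two distinct wrappers may share one handle."""
    def __init__(self, int_ptr):
        self.int_ptr = int_ptr

old, new = FakeProgram(0x2e21510), FakeProgram(0x2e21510)
print(old == new, same_cl_object(old, new))
```

In this simulation the default `==` on the two wrappers is false even though both refer to the same handle, which matches the symptom in the log above.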
Odd, pyopencl seems to be installed 32 bit??
Using `/usr/lib/python2.7/site-packages/pyopencl-2013.2-py2.7-linux-x86_64.egg`
I installed this with `easy_install -Z pyopencl`
I may have to do it by hand, we'll see.
I applied your patch and they seem to be all gone now.
OK, I'll try to produce a test case to report the bug to `PyOpenCL`, which I will have to ask you to test for me since I can't reproduce this weirdness. In the meantime, r5157 merges the workaround with a long comment explaining its purpose.
FYI: `/usr/lib/python2.7/site-packages/` can contain both 32-bit and 64-bit extensions..
Thanks for the clarification. I'll update the performance chart with my numbers from this machine and a quick instruction set for being able to run it.
AMD drivers require some extra stuff like exporting `COMPUTE=:0` so I assume you actually have to have an X server running?
That said I think we've tried out opencl_csc on several platforms now and several opencl ICD's
Install AMD OpenCL on Fedora 20
I did this from a fresh install with LXDE
From a root terminal
    yum group install "Development Tools"
    yum install kernel-devel opencl-headers gcc-c++
    cd /tmp
    wget http://www2.ati.com/drivers/beta/amd-catalyst-13.11-betaV9.95-linux-x86.x86_64.zip
    unzip amd-catalyst-13.11-betaV9.95-linux-x86.x86_64.zip
    chmod +x Install-AMD-APP.sh
    ./Install-AMD-APP.sh
I chose to do an express install. It may ask you to reboot; I chose to do this after I installed the AMD APP SDK.
Download AMD-APP-SDK-v2.9-lnx64.tgz from http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/downloads/
    tar xfvz ../AMD-APP-SDK-v2.9-lnx64.tgz
    ./Install-AMD-App.sh
I rebooted after this install and proceeded to install pyopencl with `easy_install`:
easy_install -Z pyopencl
Started and tested xpra with this command line
COMPUTE=:0 XPRA_OPENCL_DEVICE_TYPE=GPU xpra --no-daemon --bind-tcp=0.0.0.0:1300 --start-child="xterm -fg white -bg black" start :13
Issue migrated from trac ticket # 422
component: core | priority: major | resolution: fixed
2013-08-26 08:45:01: totaam created the issue