anshulvj closed this issue 9 years ago.
An interesting question. I think it is very promising to make it work.
It is an Intel CPU with an HD Graphics GPU. You can enable OpenCL for depth processing and VAAPI (#210) for RGB processing. I'm not sure about OpenCL and VAAPI driver support on your platform. My guess is 30Hz for depth and 60Hz for RGB.
If you make progress enabling hardware acceleration on your platform, I'd be interested to hear about it below.
I would guess one of the newer Atoms should work. Under Windows they work with the SDK, but not at 30 Hz. The HD Graphics is supported by Beignet, if I remember correctly. The main difference: Ivy Bridge has 16 EUs, Haswell 20 EUs, and the Atom only 4 EUs at lower frequencies, so it is much slower. If you only want depth, disable the color stream and use OpenCL for the depth processing. This should save a lot of CPU load for your additional navigation code.
Thank you all. I will look into Beignet/OpenCL for my board today and see what I can do for my rover; I'll post comments if I hit major hurdles. (Attached: meet Symmbot :). The Xtion currently works great, but I want to replace it with the Kinect.)
I installed OpenCL (Beignet) following the instructions here. I verified that OpenCL was installed correctly using clinfo. I also ran the utests, and only one test failed:
compiler_overflow_sub_uint4() [FAILED] Error: ((T*)buf_data[2])[i].x == max at file /home/anshulvj/Downloads/beignet/utests/compiler_overflow.cpp, function test, line 96
summary:
total: 700 run: 700 pass: 698 fail: 1 pass rate: 0.998571
But when I run the Protonect example with the cl (or gl) argument, it says "OpenCL pipeline is not supported!" and then shows a blank depth image. Following the aforementioned link, I've set all the environment variables in setenv.sh, so ideally the code should be able to see that OpenCL is available. So I went into the code to see whether I could change anything to make it work. The code fails in Protonect.cpp at:
pipeline = new libfreenect2::OpenCLPacketPipeline();
So I checked packet_pipeline.cpp, but I don't see what change would help with detecting OpenCL. Is there any change that would help detect OpenCL? The output of clinfo is as follows, if it helps:
anshulvj@ubuntuboard:~/Downloads/beignet/build/utests$ clinfo
Number of platforms 2
Platform Name Intel Gen OCL Driver
Platform Vendor Intel
Platform Version OpenCL 1.2 beignet 1.0.3
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_icd
Platform Extensions function suffix Intel

Platform Name Intel Gen OCL Driver
Platform Vendor Intel
Platform Version OpenCL 1.2 beignet 1.0.2
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_icd
Platform Extensions function suffix Intel
Platform Name Intel Gen OCL Driver
Number of devices 1
Device Name Intel(R) HD Graphics Bay Trail-T
Device Vendor Intel
Device Vendor ID 0xf31
Device Version OpenCL 1.2 beignet 1.0.3
Driver Version 1.0.3
Device OpenCL C Version OpenCL C 1.2 beignet 1.0.3
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 4
Max clock frequency 1000MHz
Device Partition (core)
Max number of sub-devices 1
Supported partition types None, None, None
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 16
Preferred / native vector sizes
char 16 / 8
short 8 / 8
int 4 / 4
long 2 / 2
half 0 / 8 (n/a)
float 4 / 4
double 0 / 2 (n/a)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 32, Little-Endian
Global memory size 2147483648 (2GiB)
Error Correction support No
Max memory allocation 1073741824 (1024MiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size <printDeviceInfo:85: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30>
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 8192x8192 pixels
Max 3D image size 8192x8192x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Global
Local memory size 65536 (64KiB)
Max constant buffer size 134217728 (128MiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Profiling timer resolution 80(null)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
Prefer user sync for interop Yes
printf() buffer size 1048576 (1024KiB)
Built-in kernels cl_copy_region_align4;__cl_copy_region_align16;cl_cpy_region_unalign_same_offset;cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;cl_copy_buffer_rect;cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;cl_copy_image_3d_to_2d;cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;cl_copy_image_2d_to_buffer;cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;cl_copy_buffer_to_image_3d;cl_fill_region_unalign;__cl_fill_region_align2;cl_fill_region_align4;cl_fill_region_align8_2;cl_fill_region_align8_4;cl_fill_region_align8_8;cl_fill_region_align8_16;cl_fill_region_align128;__cl_fill_image_1d;cl_fill_image_1d_array;__cl_fill_image_2d;cl_fill_image_2d_array;cl_fill_image_3d;
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_icd
Platform Name Intel Gen OCL Driver
Number of devices 1
Device Name Intel(R) HD Graphics Bay Trail-T
Device Vendor Intel
Device Vendor ID 0xf31
Device Version OpenCL 1.2 beignet 1.0.2
Driver Version 1.0.2
Device OpenCL C Version OpenCL C 1.2 beignet 1.0.2
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 4
Max clock frequency 1000MHz
Device Partition (core)
Max number of sub-devices 1
Supported partition types None, None, None
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 16
Preferred / native vector sizes
char 16 / 8
short 8 / 8
int 4 / 4
long 2 / 2
half 0 / 8 (n/a)
float 4 / 4
double 0 / 2 (n/a)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 32, Little-Endian
Global memory size 2147483648 (2GiB)
Error Correction support No
Max memory allocation 1073741824 (1024MiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size <printDeviceInfo:85: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30>
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 8192x8192 pixels
Max 3D image size 8192x8192x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Global
Local memory size 65536 (64KiB)
Max constant buffer size 134217728 (128MiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Profiling timer resolution 80(null)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
Prefer user sync for interop Yes
printf() buffer size 1048576 (1024KiB)
Built-in kernels cl_copy_region_align4;__cl_copy_region_align16;cl_cpy_region_unalign_same_offset;cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;cl_copy_buffer_rect;cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;cl_copy_image_3d_to_2d;cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;cl_copy_image_2d_to_buffer;cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;cl_copy_buffer_to_image_3d;cl_fill_region_unalign;__cl_fill_region_align2;cl_fill_region_align4;cl_fill_region_align8_2;cl_fill_region_align8_4;cl_fill_region_align8_8;cl_fill_region_align8_16;cl_fill_region_align128;__cl_fill_image_1d;cl_fill_image_1d_array;__cl_fill_image_2d;cl_fill_image_2d_array;cl_fill_image_3d;
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_icd
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel Gen OCL Driver
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [Intel]
clCreateContext(NULL, ...) [default] Success [Intel]
clCreateContext(NULL, ...) [other] Success [Intel]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel Gen OCL Driver
Device Name Intel(R) HD Graphics Bay Trail-T
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel Gen OCL Driver
Device Name Intel(R) HD Graphics Bay Trail-T

ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.3
ICD loader Profile OpenCL 1.2
Inconsistency detected by ld.so: dl-close.c: 764: _dl_close: Assertion `map->l_init_called' failed!
anshulvj@ubuntuboard:~/Downloads/beignet/build/utests$
OpenCL pipeline is not supported!
This message means that CMake did not detect OpenCL. What is your CMake output?
You have to reconfigure and recompile libfreenect2 for it to see the changes. When you run cmake, there will be output regarding OpenCL detection.
Edit: There is also a PPA with a binary release of Beignet, so you don't need to compile it yourself: https://github.com/code-iai/iai_kinect2#opencl-with-intel-gpu
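Why the rebuild matters: whether an OpenCL pipeline exists is decided when libfreenect2 is compiled, not when Protonect runs, so installing Beignet afterwards changes nothing until CMake is re-run. The sketch below illustrates that compile-time pattern under stated assumptions: `EXAMPLE_WITH_OPENCL` and `selectPipeline` are hypothetical stand-ins (libfreenect2 uses its own CMake-generated define, whose name is not assumed here), and the "default" fallback is illustrative only.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// EXAMPLE_WITH_OPENCL is a hypothetical macro standing in for the define that a
// CMake configure step writes into a generated config header when it finds
// OpenCL. Uncomment to simulate a build in which OpenCL was detected.
// #define EXAMPLE_WITH_OPENCL

// Picks a (hypothetical) pipeline name for a command-line argument. The OpenCL
// branch only exists if the macro was defined when this file was compiled.
std::string selectPipeline(const std::string& arg) {
  if (arg == "cl") {
#ifdef EXAMPLE_WITH_OPENCL
    return "opencl";
#else
    std::cout << "OpenCL pipeline is not supported!" << std::endl;
#endif
  }
  return "default";  // hypothetical fallback when no accelerated pipeline was compiled in
}
```

Because the `#ifdef` is resolved by the preprocessor, a binary built before OpenCL was detected keeps printing the message no matter what is installed later.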
Boy, that was dumb of me!! I recompiled libfreenect2 and, yay, it worked! I'm getting real-time speeds on depth images using the cl argument :) :)
Having the Kinect v2 working on such a small-form-factor, fanless CPU opens up whole new possibilities for me!
Thanks for all your help. I'd be happy to share what I did if it helps someone else.
Could you provide us with some numbers or logs? Are you using VAAPI for the color stream?
I haven't looked at VAAPI yet because I'm currently only interested in depth images; so far I've only worked with libfreenect2. I commented out the registration part of Protonect.cpp and disabled the display of RGB, IR and registration; I only display depth. Are any logs already written by libfreenect2's Protonect, or can they be generated with some other command? I looked in the folders and didn't see any. I can certainly code up some tic-tocs and see how many depth frames per second it produces (or, if you need different kinds of numbers, please let me know what you need). The "skipping depth packet" and "skipping rgb packet" messages are still being printed. The output of running Protonect is as follows:
anshulvj@ubuntuboard:~/Downloads/libfreenect2/examples/protonect$ sudo ./bin/Protonect cl
[Freenect2Impl] enumerating devices...
[Freenect2Impl] 7 usb devices connected
[Freenect2Impl] found valid Kinect v2 @2:3 with serial 031084540347
[Freenect2Impl] found 1 devices
[OpenCLDepthPacketProcessor::listDevice] devices:
0: Intel(R) HD Graphics Bay Trail-T (GPU)[Intel]
1: Intel(R) HD Graphics Bay Trail-T (GPU)[Intel]
[OpenCLDepthPacketProcessor::init] selected device: Intel(R) HD Graphics Bay Trail-T (GPU)[Intel]
[Freenect2DeviceImpl] opening...
[Freenect2DeviceImpl] opened
[Freenect2DeviceImpl] starting...
[Freenect2DeviceImpl] ReadData0x14 response
92 bytes of raw data
0x0000: 00 00 12 00 00 00 00 00 01 00 00 00 43 c1 1f 41 2e2e2e2e2e2e2e2e2e2e2e2e432e2e41
0x0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e
0x0020: 0a 21 33 55 c2 00 17 20 00 08 00 00 10 00 00 00 2e2133552e2e2e202e2e2e2e2e2e2e2e
0x0030: 00 01 00 00 00 10 00 00 00 00 80 00 01 00 00 00 2e2e2e2e2e2e2e2e2e2e802e2e2e2e2e
0x0040: 31 33 00 00 00 02 0e 04 47 4b 50 32 31 30 2e 31 31332e2e2e2e2e2e474b503231302e31
0x0050: 58 00 00 00 00 00 00 00 07 00 00 00 582e2e2e2e2e2e2e2e2e2e2e
[Freenect2DeviceImpl] ReadStatus0x090000 response
4 bytes of raw data
0x0000: 01 26 00 00 2e262e2e
[Freenect2DeviceImpl] ReadStatus0x090000 response
4 bytes of raw data
0x0000: 03 26 00 00 2e262e2e
[Freenect2DeviceImpl] enabling usb transfer submission...
[Freenect2DeviceImpl] submitting usb transfers...
[Freenect2DeviceImpl] started
device serial: 031084540347
device firmware: 4.1.3911.0.7
[DepthPacketStreamParser::onDataReceived] not all subsequences received 0
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[RgbPacketStreamParser::onDataReceived] skipping rgb packet!
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[RgbPacketStreamParser::onDataReceived] skipping rgb packet!
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
Commenting out imshow is not enough to disable color decoding. Protonect still uses TurboJPEG by default, and that is hurting your depth processing. We don't have a depth-only test case. Change this line to int r = 0; to disable color decoding.
Performance is printed after a while, like this:
[TurboJpegRgbPacketProcessor] avg. time: 20.4993ms -> ~48.7822Hz
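For the tic-toc measurement mentioned earlier, here is a minimal sketch of an average-time meter using std::chrono. It reports the average in ms and the equivalent rate as 1000 / average_ms, the same relation as in the log line above (1000 / 20.4993 ms ≈ 48.78 Hz). `RateMeter` is a hypothetical helper, not part of libfreenect2.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Accumulates per-frame durations and reports the average in milliseconds and
// the equivalent rate in Hz (rate = 1000 / average_ms).
class RateMeter {
 public:
  using Clock = std::chrono::steady_clock;

  // Marks the start of a timed section.
  void tic() { start_ = Clock::now(); }

  // Records the time elapsed since the last tic().
  void toc() {
    add(std::chrono::duration<double, std::milli>(Clock::now() - start_).count());
  }

  // Records an externally measured duration in milliseconds.
  void add(double ms) {
    assert(ms >= 0.0);
    total_ms_ += ms;
    ++count_;
  }

  double averageMs() const { return count_ ? total_ms_ / count_ : 0.0; }
  double rateHz() const {
    const double avg = averageMs();
    return avg > 0.0 ? 1000.0 / avg : 0.0;
  }
  std::size_t count() const { return count_; }
  void reset() { total_ms_ = 0.0; count_ = 0; }

 private:
  Clock::time_point start_{};
  double total_ms_ = 0.0;
  std::size_t count_ = 0;
};
```

Calling tic() before and toc() after the per-frame depth work in the Protonect loop, and printing averageMs() and rateHz() every 100 frames, gives the processing rate; a single toc(); tic(); pair at the top of the loop instead measures frame-to-frame time, which also includes waiting for the sensor.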
I disabled TurboJPEG as you suggested, recompiled, and ran it; here is my output:
[TurboJpegRgbPacketProcessor] avg. time: 0.12335ms -> ~8107Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[TurboJpegRgbPacketProcessor] avg. time: 0.156844ms -> ~6375.77Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .

[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[OpenCLDepthPacketProcessor] avg. time: 49.1186ms -> ~20.3589Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[TurboJpegRgbPacketProcessor] avg. time: 0.147889ms -> ~6761.83Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[TurboJpegRgbPacketProcessor] avg. time: 0.135202ms -> ~7396.32Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[OpenCLDepthPacketProcessor] avg. time: 49.4062ms -> ~20.2404Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[TurboJpegRgbPacketProcessor] avg. time: 0.132742ms -> ~7533.42Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .
20Hz for depth, not bad.
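One likely contributor to the "skipping depth packet" lines is that a new depth packet arrives while the previous one is still being processed (about 49 ms in the log above). The sketch below is a simplified model of that effect, assuming packets arrive at a fixed 30 Hz, each takes a fixed time to process, and there is a single processing slot; libfreenect2's actual parser and buffering may differ. `DropStats` and `simulateDrops` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct DropStats {
  std::size_t processed = 0;
  std::size_t skipped = 0;
};

// Simplified model (an assumption, not libfreenect2's code): packets arrive at
// a fixed rate, one processor handles one packet at a time, and any packet
// that arrives while the processor is busy is dropped ("skipped").
DropStats simulateDrops(double arrival_hz, double processing_ms, std::size_t packets) {
  assert(arrival_hz > 0.0 && processing_ms >= 0.0);
  const double period_ms = 1000.0 / arrival_hz;
  double busy_until_ms = 0.0;  // time at which the processor becomes free again
  DropStats stats;
  for (std::size_t k = 0; k < packets; ++k) {
    const double arrival_ms = static_cast<double>(k) * period_ms;
    if (arrival_ms >= busy_until_ms) {
      busy_until_ms = arrival_ms + processing_ms;  // start processing this packet
      ++stats.processed;
    } else {
      ++stats.skipped;  // processor still busy with an earlier packet
    }
  }
  return stats;
}
```

Under these assumptions a processing time near 49 ms handles every other packet, so the delivered depth rate would be closer to 15 Hz than to the 1000 / 49 ms ≈ 20 Hz figure; timing the actual frame loop is the reliable way to confirm the real rate.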
I was curious whether a future update of libfreenect2 will include point cloud extraction from the depth image. If not, I can try to code something up.
Thanks. I looked at the post and the libfreenect2 registration code. Currently I only need x,y,z values per depth pixel (I might need color as well, in which case the registration code already handles it, but I'm taking one step at a time). Based on my knowledge of image formation and camera calibration, we already have the z depth ("z_raw" in registration.cpp). Once we undistort a depth pixel and get the correct x,y coordinates (undistort_depth() in the Registration constructor), we have the x,y,z coordinates per point. I might just add a "hook" in that function to store these as an Nx3 vector and use it to "plot3" the point cloud, and then segment it (I'm planning to use RANSAC).
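The RANSAC segmentation step mentioned above could look roughly like this minimal sketch: a plain RANSAC plane fit, one common way to pull the floor or a wall out of a depth point cloud. It is not code from libfreenect2; `Vec3`, `Plane` and `ransacPlane` are hypothetical names, and `threshold` is in the same unit as the points (e.g. millimetres).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(const Vec3& a, const Vec3& b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Plane n . p + d = 0 with a unit normal n.
struct Plane { Vec3 n; double d; };

// Basic RANSAC: repeatedly fit a plane through 3 random points and keep the
// plane with the most points within `threshold` of it. Returns inlier indices
// and, if best_plane is non-null, writes the winning plane to it.
std::vector<std::size_t> ransacPlane(const std::vector<Vec3>& pts, double threshold,
                                     int iterations, Plane* best_plane, unsigned seed = 42) {
  std::vector<std::size_t> best;
  if (pts.size() < 3) return best;
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, pts.size() - 1);
  for (int it = 0; it < iterations; ++it) {
    const Vec3& a = pts[pick(rng)];
    const Vec3& b = pts[pick(rng)];
    const Vec3& c = pts[pick(rng)];
    Vec3 n = cross(sub(b, a), sub(c, a));
    const double len = std::sqrt(dot(n, n));
    if (len < 1e-12) continue;  // degenerate sample (repeated or collinear points)
    n = {n.x / len, n.y / len, n.z / len};
    const double d = -dot(n, a);
    std::vector<std::size_t> inliers;
    for (std::size_t i = 0; i < pts.size(); ++i)
      if (std::fabs(dot(n, pts[i]) + d) <= threshold) inliers.push_back(i);
    if (inliers.size() > best.size()) {
      best = std::move(inliers);
      if (best_plane) *best_plane = {n, d};
    }
  }
  return best;
}
```

Removing the returned inliers and running it again would extract the next largest plane; the iteration count trades speed against the chance of never sampling three points from the same surface.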
If you have an idea how to solve the point cloud issue "properly", please do submit a pull request. I think PCL has a quasi-standard format for point cloud data; maybe we could use that? A little warning, however: the registration code will change a bit when I merge PR #253.
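PCL's .pcd file is one concrete candidate for that quasi-standard format. A minimal sketch that writes x, y, z points as an ASCII PCD v0.7 file without depending on PCL; `PointXYZ` and `writePcdAscii` are hypothetical names here (PCL has its own `pcl::PointXYZ` and `pcl::io::savePCDFileASCII`, which this sketch does not use).

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct PointXYZ { float x, y, z; };

// Writes an unorganized cloud (HEIGHT 1) in the ASCII variant of the PCD v0.7
// format with three 4-byte float fields x, y, z. Values use the stream's
// current formatting (by default 6 significant digits).
void writePcdAscii(std::ostream& out, const std::vector<PointXYZ>& cloud) {
  const std::size_t n = cloud.size();
  out << "# .PCD v0.7 - Point Cloud Data file format\n"
      << "VERSION 0.7\n"
      << "FIELDS x y z\n"
      << "SIZE 4 4 4\n"
      << "TYPE F F F\n"
      << "COUNT 1 1 1\n"
      << "WIDTH " << n << "\n"
      << "HEIGHT 1\n"
      << "VIEWPOINT 0 0 0 1 0 0 0\n"
      << "POINTS " << n << "\n"
      << "DATA ascii\n";
  for (const PointXYZ& p : cloud) out << p.x << ' ' << p.y << ' ' << p.z << '\n';
}
```

Writing to a std::ofstream named e.g. cloud.pcd produces a file that PCL tools such as pcl_viewer can open; for clouds in millimetres, raising the stream precision avoids rounding large coordinates.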
My basic idea for solving this comes from my computer vision class and what wiedemeyer mentioned in the post I was pointed to: undistort the depth pixels -> convert from the image frame to the camera frame (which gives the true x and y w.r.t. the camera frame) -> we already know the z value of the original pixel, so this gives x,y,z per pixel.
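The image-frame to camera-frame step of that plan can be written with the standard pinhole model. A minimal sketch, assuming the pixel coordinates are already undistorted, the usual computer-vision axis convention (x right, y down, z out of the camera), and that fx, fy, cx, cy are the depth camera's intrinsics (libfreenect2's IR/depth camera parameters carry values with these names). `Intrinsics`, `Point3`, `backProject` and `depthToPoints` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pinhole intrinsics of the depth camera: focal lengths fx, fy in pixels and
// principal point (cx, cy) in pixels.
struct Intrinsics { double fx, fy, cx, cy; };

struct Point3 { double x, y, z; };

// Back-projects an (already undistorted) pixel (u, v) with depth z into the
// camera frame: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
// The result has the same unit as z (e.g. millimetres).
Point3 backProject(const Intrinsics& k, double u, double v, double z) {
  assert(k.fx > 0.0 && k.fy > 0.0);
  return {(u - k.cx) * z / k.fx, (v - k.cy) * z / k.fy, z};
}

// Converts a row-major depth image into an N x 3 list of points, skipping
// pixels without a valid depth (z <= 0).
std::vector<Point3> depthToPoints(const Intrinsics& k, const std::vector<float>& depth,
                                  int width, int height) {
  assert(depth.size() == static_cast<std::size_t>(width) * height);
  std::vector<Point3> cloud;
  cloud.reserve(depth.size());
  for (int v = 0; v < height; ++v) {
    for (int u = 0; u < width; ++u) {
      const double z = depth[static_cast<std::size_t>(v) * width + u];
      if (z > 0.0) cloud.push_back(backProject(k, u, v, z));
    }
  }
  return cloud;
}
```

A pixel at the principal point maps to x = y = 0 regardless of depth, which is a quick sanity check when the resulting x range looks lopsided.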
I'm using cv::vector
I spent today modifying the registration code. I added a function that computes dx = (x - cx) / fx and dy = (y - cy) / fy and converts a pixel to x,y using a formula I found online (x = cx + width * dx and y = cx + height * dy). I haven't accounted for distortion yet, since I wanted to see whether this intermediate step made any sense. However, my x values range from approximately -130 to +620 (y is also skewed to one side of the origin). I don't know why this is happening, but I plan to spend more time on it.
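For reference, here is the pinhole model that the dx, dy expressions come from (ignoring distortion). It maps a camera-frame point (X, Y, Z) to a pixel (u, v); inverting it gives the camera-frame coordinates:

```latex
u = f_x \frac{X}{Z} + c_x, \qquad v = f_y \frac{Y}{Z} + c_y
\quad\Longrightarrow\quad
X = \frac{u - c_x}{f_x}\, Z = d_x Z, \qquad
Y = \frac{v - c_y}{f_y}\, Z = d_y Z .
```

So d_x and d_y are normalized, unitless coordinates, and multiplying them by the depth Z gives X and Y in the same unit as Z. An expression such as c_x + width · d_x instead yields a value in pixel units, which could plausibly explain x values that look like they span roughly an image width around c_x.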
One explanation could be the orientation of the camera frame. I'm taking the z axis as pointing outward from the Kinect, which is how it reports depth: farther objects have larger positive values (unlike the textbook pinhole model, in which the -z axis points outward toward the world). Hence, in our case, positive x should be on the left side of the image center, as per the right-hand rule. In that case the undistortion formulas also need to account for the frame orientation before we apply them; otherwise we would be violating the right-hand rule. The z depth makes sense; I tested it on measured objects. I will work on accounting for this today and see what it gives me. I can share the 3 files I changed with you, if you need them.
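One way to check whether a particular axis assignment is right-handed is the sign of (x × y) · z. A minimal sketch, assuming each axis is written in a common reference basis of (image right, image down, out of the camera); `Axis`, `crossProduct` and `isRightHanded` are hypothetical names.

```cpp
#include <cassert>

// A direction written in a common reference basis
// (image right, image down, out of the camera).
struct Axis { double x, y, z; };

// Cross product a x b.
Axis crossProduct(const Axis& a, const Axis& b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// A frame with axes (ex, ey, ez) is right-handed iff (ex x ey) . ez > 0.
bool isRightHanded(const Axis& ex, const Axis& ey, const Axis& ez) {
  const Axis c = crossProduct(ex, ey);
  return c.x * ez.x + c.y * ez.y + c.z * ez.z > 0.0;
}
```

With z pointing out of the camera, both "x right, y down" and "x left, y up" come out right-handed, while "x left, y down" does not, so choosing the x direction also fixes which way y must point.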
I haven't collaborated much on GitHub, so I don't understand the merging/forking part that well. If I try anything fancy I might screw it up, so I'll do as you say with respect to that :)
If you want point clouds, you could look at https://github.com/code-iai/iai_kinect2. The kinect2_viewer creates PCL point clouds from the depth and color images.
Thanks; I'm looking at it now. I have Ubuntu 15.04 (Vivid), so I'll see whether this works with ROS Jade Turtle; otherwise I'll switch releases.
I'm currently testing the Kinect v2 to see how low-end the hardware can be while still getting the depth image in somewhat real time :) (I want to put it on a ground robot for navigation). I'm using a small-form-factor PC without a dedicated graphics card, which gives me depth, but it's really slow: it drops a lot of frames and updates the depth image about once a second, with significant lag (I've commented out the RGB and registration parts of the code to reduce the bottlenecks).
I am aware that the Kinect v2 requires significant computational power to handle the cool ToF system it has underneath, but if a minimum hardware threshold is known, I can get hardware that meets it. The requirements on Microsoft's website are obviously more than enough to run it; a lower threshold should also work. I have tried the Kinect v2 with libfreenect2 on an i5 with 8 GB RAM and integrated graphics (a small tower), and it works very nicely.
So I wanted the authors' opinion (off the top of your head, if you know) on the minimum hardware with which the Kinect v2 can perform in real time (I don't need 30 fps per se; even a 5 fps depth map update is OK). I understand if there's no black-and-white answer to this.
Currently I'm using a small form factor embedded PC with the following configuration:
Intel® Celeron® N2930 1.83 GHz
RAM: 2GB (I've also tried 8GB, with only marginal improvement)
USB 3.0
I have commented out the registration part of the code, but I don't see any improvement. I'm not very familiar with the Kinect v2 drivers and have never coded for them. If someone has suggestions for improving this performance (either by modifying some section of this code or by using better hardware), I'd appreciate it.
Thanks Anshul