OpenKinect / libfreenect2

Open source drivers for the Kinect for Windows v2 device

Minimum hardware requirements (general question, not a bug or an issue) #251

Closed anshulvj closed 9 years ago

anshulvj commented 9 years ago

I'm currently testing the Kinect v2 to see how modest a hardware setup will suffice to get the depth image in somewhat real time :) (I want to put it on a ground robot for navigation problems). I'm using a small form factor PC, which gives me depth (it doesn't have a dedicated graphics card), but it's really slow. It drops a lot of frames and updates the depth image about once a second with significant lag (I've commented out the RGB and registration parts of the code to reduce the bottlenecks).

I am aware that the Kinect v2 requires significant computational power to account for the cool ToF system it has underneath, but if a minimum hardware threshold is known, I can go and get that hardware. The requirements on the MS website are obviously more than enough to run it; a lower threshold should also work. I have tried the Kinect v2 with libfreenect2 on an i5 with 8 GB RAM and integrated graphics (small tower), and it works very nicely.

So I wanted the authors' opinion (anything off the top of your head) on the minimum hardware requirements with which the Kinect v2 can perform in real time (I don't need 30 fps per se; even a 5 fps depth map update rate is OK). I understand if there's no black-and-white answer to this.

Currently I'm using a small form factor embedded PC with the following configuration:

Intel® Celeron® N2930, 1.83 GHz
RAM: 2 GB (I've also tried 8 GB, with only marginal improvement)
USB 3.0

I have commented out the registration part of the code, but I don't see any improvement. I'm not familiar with the Kinect v2 drivers and have never written code for them. If someone has suggestions on how this performance can be improved (either by modifying some section of this code or by adding better hardware), I'd appreciate it.

Thanks Anshul

xlz commented 9 years ago

An interesting question. I think it is very promising to make it work.

It is an Intel CPU with an HD Graphics GPU. You can enable OpenCL for depth processing and VAAPI (#210) for RGB processing. I'm not sure about the driver support for OpenCL and VAAPI on your platform. My guess is 30Hz for depth and 60Hz for RGB.

If you make progress enabling hardware acceleration on your platform, I'd be interested to hear about it below.

kohrt commented 9 years ago

I would guess one of the newer Atoms should work. Under Windows they work with the SDK, but not at 30 Hz. The HD Graphics is supported by Beignet, if I remember correctly. The main difference: Ivy Bridge has 16 EUs, Haswell has 20 EUs, and the Atom has only 4 EUs at lower frequencies, so it is much slower. If you only want depth, you should disable the color stream and use OpenCL for the depth processing. This should save you a lot of CPU load for your additional navigation stuff.

anshulvj commented 9 years ago

Thank you all. I will look into Beignet / OpenCL today for my board and see what I can do for my rover, and I'll post comments if I hit major hurdles. (Attached: meet Symmbot :). The Xtion works great currently, but I want to replace it with the Kinect.)

anshulvj commented 9 years ago

I installed OpenCL (Beignet) following instructions here. I verified that OpenCL was installed correctly using clinfo. I also ran the utests, and only one test failed:

compiler_overflow_sub_uint4() [FAILED]
Error: ((T*)buf_data[2])[i].x == max
at file /home/anshulvj/Downloads/beignet/utests/compiler_overflow.cpp, function test, line 96

summary:

total: 700
run: 700
pass: 698
fail: 1
pass rate: 0.998571

But when I run the Protonect example with the cl (or gl) argument, it says "OpenCL pipeline is not supported!" and proceeds to show a blank depth image. Following the aforementioned link, I've set all the environment variables given in setenv.sh, so ideally the code should be able to see that OpenCL is available. So I went into the code to see if I could change anything to make it work. The code fails in Protonect.cpp at:

pipeline = new libfreenect2::OpenCLPacketPipeline();

so I checked packet_pipeline.cpp, but I don't see what change would help with detecting OpenCL. Is there any change that would help? The output of clinfo is as follows, in case it helps:

anshulvj@ubuntuboard:~/Downloads/beignet/build/utests$ clinfo
Number of platforms 2
Platform Name Intel Gen OCL Driver
Platform Vendor Intel
Platform Version OpenCL 1.2 beignet 1.0.3
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_icd
Platform Extensions function suffix Intel

Platform Name Intel Gen OCL Driver
Platform Vendor Intel
Platform Version OpenCL 1.2 beignet 1.0.2
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_icd
Platform Extensions function suffix Intel

Platform Name Intel Gen OCL Driver Number of devices 1 Device Name Intel(R) HD Graphics Bay Trail-T Device Vendor Intel Device Vendor ID 0xf31 Device Version OpenCL 1.2 beignet 1.0.3 Driver Version 1.0.3 Device OpenCL C Version OpenCL C 1.2 beignet 1.0.3 Device Type GPU Device Profile FULL_PROFILE Max compute units 4 Max clock frequency 1000MHz Device Partition (core) Max number of sub-devices 1 Supported partition types None, None, None Max work item dimensions 3 Max work item sizes 256x256x256 Max work group size 256 Preferred work group size multiple 16 Preferred / native vector sizes
char 16 / 8
short 8 / 8
int 4 / 4
long 2 / 2
half 0 / 8 (n/a) float 4 / 4
double 0 / 2 (n/a) Half-precision Floating-point support (n/a) Single-precision Floating-point support (core) Denormals No Infinity and NANs Yes Round to nearest Yes Round to zero No Round to infinity No IEEE754-2008 fused multiply-add No Support is emulated in software No Correctly-rounded divide and sqrt operations No Double-precision Floating-point support (n/a) Address bits 32, Little-Endian Global memory size 2147483648 (2GiB) Error Correction support No Max memory allocation 1073741824 (1024MiB) Unified memory for Host and Device Yes Minimum alignment for any data type 128 bytes Alignment of base address 1024 bits (128 bytes) Global Memory cache type Read/Write Global Memory cache size <printDeviceInfo:85: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30> Global Memory cache line 64 bytes Image support Yes Max number of samplers per kernel 16 Max size for 1D images from buffer 65536 pixels Max 1D or 2D image array size 2048 images Max 2D image size 8192x8192 pixels Max 3D image size 8192x8192x2048 pixels Max number of read image args 128 Max number of write image args 8 Local memory type Global Local memory size 65536 (64KiB) Max constant buffer size 134217728 (128MiB) Max number of constant args 8 Max size of kernel argument 1024 Queue properties
Out-of-order execution No Profiling Yes Profiling timer resolution 80(null) Execution capabilities
Run OpenCL kernels Yes Run native kernels Yes Prefer user sync for interop Yes printf() buffer size 1048576 (1024KiB) Built-in kernels cl_copy_region_align4;__cl_copy_region_align16;cl_cpy_region_unalign_same_offset;cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;cl_copy_buffer_rect;cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;cl_copy_image_3d_to_2d;cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;cl_copy_image_2d_to_buffer;cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;cl_copy_buffer_to_image_3d;cl_fill_region_unalign;__cl_fill_region_align2;cl_fill_region_align4;cl_fill_region_align8_2;cl_fill_region_align8_4;cl_fill_region_align8_8;cl_fill_region_align8_16;cl_fill_region_align128;__cl_fill_image_1d;cl_fill_image_1d_array;__cl_fill_image_2d;cl_fill_image_2d_array;cl_fill_image_3d; Device Available Yes Compiler Available Yes Linker Available Yes Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_icd

Platform Name Intel Gen OCL Driver Number of devices 1 Device Name Intel(R) HD Graphics Bay Trail-T Device Vendor Intel Device Vendor ID 0xf31 Device Version OpenCL 1.2 beignet 1.0.2 Driver Version 1.0.2 Device OpenCL C Version OpenCL C 1.2 beignet 1.0.2 Device Type GPU Device Profile FULL_PROFILE Max compute units 4 Max clock frequency 1000MHz Device Partition (core) Max number of sub-devices 1 Supported partition types None, None, None Max work item dimensions 3 Max work item sizes 256x256x256 Max work group size 256 Preferred work group size multiple 16 Preferred / native vector sizes
char 16 / 8
short 8 / 8
int 4 / 4
long 2 / 2
half 0 / 8 (n/a) float 4 / 4
double 0 / 2 (n/a) Half-precision Floating-point support (n/a) Single-precision Floating-point support (core) Denormals No Infinity and NANs Yes Round to nearest Yes Round to zero No Round to infinity No IEEE754-2008 fused multiply-add No Support is emulated in software No Correctly-rounded divide and sqrt operations No Double-precision Floating-point support (n/a) Address bits 32, Little-Endian Global memory size 2147483648 (2GiB) Error Correction support No Max memory allocation 1073741824 (1024MiB) Unified memory for Host and Device Yes Minimum alignment for any data type 128 bytes Alignment of base address 1024 bits (128 bytes) Global Memory cache type Read/Write Global Memory cache size <printDeviceInfo:85: get CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : error -30> Global Memory cache line 64 bytes Image support Yes Max number of samplers per kernel 16 Max size for 1D images from buffer 65536 pixels Max 1D or 2D image array size 2048 images Max 2D image size 8192x8192 pixels Max 3D image size 8192x8192x2048 pixels Max number of read image args 128 Max number of write image args 8 Local memory type Global Local memory size 65536 (64KiB) Max constant buffer size 134217728 (128MiB) Max number of constant args 8 Max size of kernel argument 1024 Queue properties
Out-of-order execution No Profiling Yes Profiling timer resolution 80(null) Execution capabilities
Run OpenCL kernels Yes Run native kernels Yes Prefer user sync for interop Yes printf() buffer size 1048576 (1024KiB) Built-in kernels cl_copy_region_align4;__cl_copy_region_align16;cl_cpy_region_unalign_same_offset;cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;cl_copy_buffer_rect;cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;cl_copy_image_3d_to_2d;cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;cl_copy_image_2d_to_buffer;cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;cl_copy_buffer_to_image_3d;cl_fill_region_unalign;__cl_fill_region_align2;cl_fill_region_align4;cl_fill_region_align8_2;cl_fill_region_align8_4;cl_fill_region_align8_8;cl_fill_region_align8_16;cl_fill_region_align128;__cl_fill_image_1d;cl_fill_image_1d_array;__cl_fill_image_2d;cl_fill_image_2d_array;cl_fill_image_3d; Device Available Yes Compiler Available Yes Linker Available Yes Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_icd

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel Gen OCL Driver
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [Intel]
clCreateContext(NULL, ...) [default] Success [Intel]
clCreateContext(NULL, ...) [other] Success [Intel]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel Gen OCL Driver
Device Name Intel(R) HD Graphics Bay Trail-T
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel Gen OCL Driver
Device Name Intel(R) HD Graphics Bay Trail-T

ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.3
ICD loader Profile OpenCL 1.2
Inconsistency detected by ld.so: dl-close.c: 764: _dl_close: Assertion `map->l_init_called' failed!
anshulvj@ubuntuboard:~/Downloads/beignet/build/utests$

xlz commented 9 years ago

"OpenCL pipeline is not supported!" means CMake did not detect OpenCL. What is your CMake output?

kohrt commented 9 years ago

You have to reconfigure and recompile libfreenect2 for it to pick up the changes. When you run cmake, there will be output regarding OpenCL detection.

Edit: There is also a PPA with a binary release of Beignet, so you don't need to compile it yourself. https://github.com/code-iai/iai_kinect2#opencl-with-intel-gpu

anshulvj commented 9 years ago

Boy, that was dumb of me!! I recompiled libfreenect2 and yay!!! It worked. I'm getting really real-time speeds on depth images using the cl argument :+1: :) :)

The Kinect v2 working on such a small form factor, fanless CPU opens up whole new possibilities for me!

Thanks for all your help. And I'd be happy to share what I did if it helps someone else.

kohrt commented 9 years ago

Could you provide us with some numbers or logs? Are you using VAAPI for the color stream?

anshulvj commented 9 years ago

I haven't looked at VAAPI yet because currently I'm only interested in depth images; I've only worked with libfreenect2 so far. I commented out the registration part of Protonect.cpp and disabled the display of RGB, IR and registration. I only display depth. Are any logs already written as part of libfreenect2 Protonect, or can they be generated using some other command? I looked in the folders and didn't see any. I can certainly code up some tic-tocs and see how many depth fps it's generating (or if you need different numbers, please let me know what you need). The "skipping depth packet" and "skipping rgb packet" messages are still being printed. The output of running Protonect is as follows:

anshulvj@ubuntuboard:~/Downloads/libfreenect2/examples/protonect$ sudo ./bin/Protonect cl
[Freenect2Impl] enumerating devices...
[Freenect2Impl] 7 usb devices connected
[Freenect2Impl] found valid Kinect v2 @2:3 with serial 031084540347
[Freenect2Impl] found 1 devices
[OpenCLDepthPacketProcessor::listDevice] devices:
0: Intel(R) HD Graphics Bay Trail-T (GPU)[Intel]
1: Intel(R) HD Graphics Bay Trail-T (GPU)[Intel]
[OpenCLDepthPacketProcessor::init] selected device: Intel(R) HD Graphics Bay Trail-T (GPU)[Intel]
[Freenect2DeviceImpl] opening...
[Freenect2DeviceImpl] opened
[Freenect2DeviceImpl] starting...
[Freenect2DeviceImpl] ReadData0x14 response
92 bytes of raw data
0x0000: 00 00 12 00 00 00 00 00 01 00 00 00 43 c1 1f 41 2e2e2e2e2e2e2e2e2e2e2e2e432e2e41
0x0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e
0x0020: 0a 21 33 55 c2 00 17 20 00 08 00 00 10 00 00 00 2e2133552e2e2e202e2e2e2e2e2e2e2e
0x0030: 00 01 00 00 00 10 00 00 00 00 80 00 01 00 00 00 2e2e2e2e2e2e2e2e2e2e802e2e2e2e2e
0x0040: 31 33 00 00 00 02 0e 04 47 4b 50 32 31 30 2e 31 31332e2e2e2e2e2e474b503231302e31
0x0050: 58 00 00 00 00 00 00 00 07 00 00 00 582e2e2e2e2e2e2e2e2e2e2e

[Freenect2DeviceImpl] ReadStatus0x090000 response
4 bytes of raw data
0x0000: 01 26 00 00 2e262e2e

[Freenect2DeviceImpl] ReadStatus0x090000 response
4 bytes of raw data
0x0000: 03 26 00 00 2e262e2e

[Freenect2DeviceImpl] enabling usb transfer submission...
[Freenect2DeviceImpl] submitting usb transfers...
[Freenect2DeviceImpl] started
device serial: 031084540347
device firmware: 4.1.3911.0.7
[DepthPacketStreamParser::onDataReceived] not all subsequences received 0
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[RgbPacketStreamParser::onDataReceived] skipping rgb packet!
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[RgbPacketStreamParser::onDataReceived] skipping rgb packet!
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet

xlz commented 9 years ago

Commenting out imshow is not enough to disable color decoding. It is still using TurboJPEG by default and it is hurting your depth processing. We don't have a depth only test case. Change this line to int r = 0; to disable color decoding.

Performance is printed after a while like this [TurboJpegRgbPacketProcessor] avg. time: 20.4993ms -> ~48.7822Hz.
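The Hz figure in these log lines is simply the reciprocal of the average per-frame processing time. A minimal sketch of that arithmetic follows; the helper name `msToHz` is made up for illustration and is not part of libfreenect2.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not libfreenect2 API): convert an average
// per-frame processing time in milliseconds to a rate in Hz.
double msToHz(double avg_ms) {
    return 1000.0 / avg_ms;
}
```

For the sample line above, 20.4993 ms per frame works out to roughly 48.78 Hz, which matches the printed value.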

anshulvj commented 9 years ago

I disabled TurboJPEG as you suggested, recompiled and ran, and following is my output:

[TurboJpegRgbPacketProcessor] avg. time: 0.12335ms -> ~8107Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[TurboJpegRgbPacketProcessor] avg. time: 0.156844ms -> ~6375.77Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .

[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[OpenCLDepthPacketProcessor] avg. time: 49.1186ms -> ~20.3589Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[TurboJpegRgbPacketProcessor] avg. time: 0.147889ms -> ~6761.83Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[TurboJpegRgbPacketProcessor] avg. time: 0.135202ms -> ~7396.32Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[OpenCLDepthPacketProcessor] avg. time: 49.4062ms -> ~20.2404Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[TurboJpegRgbPacketProcessor] avg. time: 0.132742ms -> ~7533.42Hz
[DepthPacketStreamParser::onDataReceived] skipping depth packet
[DepthPacketStreamParser::onDataReceived] skipping depth packet
. .

xlz commented 9 years ago

20Hz for depth, not bad.

anshulvj commented 9 years ago

I was curious whether any future update of libfreenect2 is going to include point cloud extraction from the depth image. If not, I can try to code something up.

xlz commented 9 years ago

@anshulvj https://github.com/OpenKinect/libfreenect2/issues/41#issuecomment-59919674

anshulvj commented 9 years ago

Thanks. I looked at the post and the libfreenect2 registration code. Currently I only need x,y,z values per depth pixel (I might need color as well, in which case the registration code does that already, but I'm taking one step at a time). Based on my knowledge of image formation and camera calibration, we already have the z depth ("z_raw" in registration.cpp). Once we undistort the depth pixel and get the correct x,y coordinate (in the Registration constructor in undistort_depth()), we have the x,y,z coordinate per point. I might just add a "hook" in the function to store this as an Nx3 vector and use that to "plot3" the point cloud, then segment it (planning to use RANSAC).

floe commented 9 years ago

If you have an idea how to solve the point cloud issue "properly", please do submit a pull request. I think PCL has a quasi-standard format for point cloud data, maybe we could use that? Little warning, however: the registration code will change a bit when I merge PR #253.

anshulvj commented 9 years ago

My basic idea for solving this comes from my computer vision class and from what wiedemeyer mentioned in the post I was pointed to: undistort the depth pixels -> convert from image frame to camera frame (that gives us the true x and y w.r.t. the camera frame) -> we already know the z value for the original pixel, so this gives us x,y,z per pixel.

I'm using cv::vector currently (I'm a Matlab guy, so coding in C++ is a bureaucratic nightmare for me :) ). But I did it anyway. Maybe there is a better, "correct" way to code this up; PCL's point cloud format might be the way to go, and I can certainly try to incorporate that.

I spent today modifying the registration code; I added a function that uses dx = (x - cx) / fx and dy = (y - cy) / fy to convert a pixel to x,y using a formula I found online (x = cx + width * dx and y = cx + height * dy). I have not accounted for distortion yet, since I wanted to see whether this intermediate step made any sense. However, my x values range from approx. -130 to +620 (y is also off to one side of the origin). I don't know why this is happening, but I'm planning to spend more time on it.

One explanation could be the orientation of the camera frame. I'm taking the z axis as pointing outward from the Kinect, which is how it gives us depth; farther objects have increasing positive values (unlike the textbook convention in which the -z axis points outward towards the world in the pinhole model). Hence in our case positive x should be on the left side of the image center, as per the right-hand rule. In this case the undistortion formulas also need to account for the frame orientation before we calculate them, else we would be violating the right-hand rule. The z depth makes sense, and I tested it on measured objects. I will work today to account for this and see what it gives me. I can share the 3 files I changed with you if you need them.
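For reference, the textbook pinhole back-projection scales the normalized coordinates dx and dy by the measured depth z rather than by the image size, which gives x and y in the same units as z. The sketch below assumes the pixel has already been undistorted and uses the common image-aligned camera frame (x right, y down, z forward, which is right-handed). The `Intrinsics` and `Point3` types and both function names are made up for illustration; they are not libfreenect2 API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative types only; not taken from libfreenect2.
struct Intrinsics { float fx, fy, cx, cy; };
struct Point3 { float x, y, z; };

// Back-project one undistorted depth pixel (u, v) with depth z into the
// camera frame (x right, y down, z forward) using the pinhole model.
Point3 backProject(float u, float v, float z, const Intrinsics& k) {
    const float dx = (u - k.cx) / k.fx;  // normalized image coordinates
    const float dy = (v - k.cy) / k.fy;
    return Point3{dx * z, dy * z, z};
}

// Convert a row-major depth image into a list of 3D points,
// skipping pixels whose depth is not a positive finite value.
std::vector<Point3> depthToPoints(const std::vector<float>& depth,
                                  int width, int height,
                                  const Intrinsics& k) {
    std::vector<Point3> pts;
    pts.reserve(depth.size());
    for (int v = 0; v < height; ++v) {
        for (int u = 0; u < width; ++u) {
            const float z = depth[static_cast<std::size_t>(v) * width + u];
            if (std::isfinite(z) && z > 0.0f) {
                pts.push_back(backProject(static_cast<float>(u),
                                          static_cast<float>(v), z, k));
            }
        }
    }
    return pts;
}
```

With this form, a pixel at the principal point maps to (0, 0, z), so x and y come out centred on the optical axis. The intrinsics would come from the device calibration.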

I haven't collaborated much on GitHub, so I don't understand the merging/forking part that well. If I try anything fancy I might screw it up, so I'll do as you say w.r.t. that :)

kohrt commented 9 years ago

If you want to have point clouds, you could look at https://github.com/code-iai/iai_kinect2. The kinect2_viewer creates PCL point clouds from the depth and color image.

anshulvj commented 9 years ago

Thanks; I'm looking at it now. I have 15.04 (Vivid), so I'll see if this works on ROS Jade Turtle. Otherwise I'll switch releases.