Kinect / PyKinect2

Wrapper to expose Kinect for Windows v2 API in Python

Merge These Example Programs Into The Main Codebase #79

Open Ddylen opened 4 years ago

Ddylen commented 4 years ago

This library can be somewhat difficult to learn: it's not immediately clear how to use its functions with only the C++ SDK (https://docs.microsoft.com/en-us/previous-versions/windows/kinect/dn799271(v=ieb.10)) as a guide, so additional example programs would be useful for familiarising programmers with the workings of the library. I've included below basic demonstration programs showing how to save Kinect data and how to get 3D positions from pixel values, in the hope that other new starters might find them useful. Would it be possible to incorporate them into the examples folder?

In addition, since this project does not seem to be actively developed, if anyone else has example programs they think new starters might find useful, commenting them here could help make this thread itself a useful resource for people trying to figure out how to use the library.

import numpy as np
import cv2
import pickle
import time 
import datetime

from pykinect2 import PyKinectV2
from pykinect2.PyKinectV2 import *
from pykinect2 import PyKinectRuntime

def save_frames(FILE_NAME):
    #records and saves colour and depth frames from the Kinect

    print("Saving colour and depth frames")

    # define file names
    depthfilename = "DEPTH." + FILE_NAME +".pickle"
    colourfilename = "COLOUR." + FILE_NAME +".pickle"
    depthfile = open(depthfilename, 'wb')
    colourfile = open(colourfilename, 'wb')

    #initialise kinect recording, and some time variables for tracking the framerate of the recordings
    kinect = PyKinectRuntime.PyKinectRuntime(PyKinectV2.FrameSourceTypes_Color | PyKinectV2.FrameSourceTypes_Depth)
    starttime = time.time()
    oldtime = 0
    i = 0
    fpsmax = 0
    fpsmin = 100

    display_type = "COLOUR"
    #display_type = "DEPTH"

    # Actual recording loop, exit by pressing escape to close the pop-up window
    while True:

        if kinect.has_new_depth_frame() and kinect.has_new_color_frame():
            elapsedtime = time.time() - starttime
            if elapsedtime > i/10: #cap the recording rate at roughly 10 frames per second

                #Only evaluate FPS once i is large, or else you get some divide by zero errors
                if i > 10:
                    try:
                        fps = 1/(elapsedtime - oldtime)
                        print(fps)
                        if fps > fpsmax:
                            fpsmax = fps
                        if fps < fpsmin:
                            fpsmin = fps

                    except ZeroDivisionError:
                        print("Divide by zero error")
                        pass

                oldtime = elapsedtime

                #read kinect colour and depth data (the two formats below differ: get_last_depth_frame() returns a numpy array, while _depth_frame_data is a ctypes pointer)
                depthframe = kinect.get_last_depth_frame() #data for display
                depthframeD = kinect._depth_frame_data
                colourframe = kinect.get_last_color_frame()
                colourframeD = kinect._color_frame_data

                #convert depth frame from ctypes to an array so that I can save it
                depthframesaveformat = np.copy(np.ctypeslib.as_array(depthframeD, shape=(kinect._depth_frame_data_capacity.value,))) # TODO: figure out how to solve intermittent up-to-3cm differences
                pickle.dump(depthframesaveformat, depthfile)

                #reformat the other depth frame format for it to be displayed on screen
                depthframe = depthframe.astype(np.uint8)
                depthframe = np.reshape(depthframe, (424, 512))
                depthframe = cv2.cvtColor(depthframe, cv2.COLOR_GRAY2RGB)

                #Reslice to remove the superfluous fourth (alpha) channel from each pixel
                colourframe = np.reshape(colourframe, (2073600, 4))
                colourframe = colourframe[:,0:3] 

                #extract then recombine the colour channels (Kinect colour frames are BGRA, so columns 0-2 are already in the B, G, R order OpenCV expects)
                colourframeB = colourframe[:,0]
                colourframeB = np.reshape(colourframeB, (1080, 1920))
                colourframeG = colourframe[:,1]
                colourframeG = np.reshape(colourframeG, (1080, 1920))
                colourframeR = colourframe[:,2]
                colourframeR = np.reshape(colourframeR, (1080, 1920))
                framefullcolour = cv2.merge([colourframeB, colourframeG, colourframeR])
                pickle.dump(framefullcolour, colourfile)

                if display_type == "COLOUR":

                    #Show colour frames as they are recorded
                    cv2.imshow('Recording KINECT Video Stream', framefullcolour)

                if display_type == "DEPTH":

                    #show depth frames as they are recorded
                    cv2.imshow('Recording KINECT Video Stream', depthframe)

                i = i+1

        #end recording if the escape key (key 27) is pressed
        key = cv2.waitKey(1)
        if key == 27: break
    cv2.destroyAllWindows()

    #close the output files so all buffered frames are flushed to disk
    depthfile.close()
    colourfile.close()

if __name__ == "__main__":
    currentdate = datetime.datetime.now()
    custom_name = input("Enter a file name: ")
    file_name = custom_name + "." + str(currentdate.day) + "." + str(currentdate.month) + "."+ str(currentdate.hour) + "."+ str(currentdate.minute)

    #Save colour and depth frames
    save_frames(file_name)
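
For anyone who wants to read the saved frames back: each file contains one pickled array per frame, dumped in sequence, so you read it with repeated pickle.load calls until EOFError (the same pattern the two programs below use). A minimal sketch, assuming the COLOUR.<name>.pickle naming above:

import pickle

def load_all_frames(filename):
    """Read every pickled colour frame from COLOUR.<filename>.pickle into a list"""
    frames = []
    with open("COLOUR." + filename + ".pickle", "rb") as colourfile:
        while True:
            try:
                #each pickle.load call returns the next frame dumped by save_frames
                frames.append(pickle.load(colourfile))
            except EOFError:
                break
    return frames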

Code for extracting 3D positions from saved depth data:

import numpy as np
import pickle
import cv2
import ctypes
import os
parentDirectory = os.path.abspath(os.path.join(os.getcwd(), os.pardir))

from pykinect2 import PyKinectV2
from pykinect2.PyKinectV2 import *
from pykinect2 import PyKinectRuntime

#REMEMBER TO HAVE A KINECT CONNECTED (OR KINECT STUDIO REPLAYING A RECORDING) WHEN RUNNING, EVEN THOUGH WE ARE OPERATING OFF SAVED DATA!
def get_3D_coordinates(filename, show_each_frame = False):
    """saves the 3D positions of a list of 2D pixel positions in the colour image. Correspodning depth data stored in DEPTH.filename.pickle"""

    #Define a list of 2D coordinates you want to locate
    colour_image_pixels_to_locate_list = [[880,555], [1440,200]]

    #Start a kinect (NEED TO CONNECT A KINECT or run a recording in kinect studio to make this command work,  even though we are reading saved depth values)
    kinect = PyKinectRuntime.PyKinectRuntime(PyKinectV2.FrameSourceTypes_Color | PyKinectV2.FrameSourceTypes_Depth)

    #Do a bunch of defines required for matching the colour coordinates to their depth later
    color2depth_points_type = _DepthSpacePoint * int(1920 * 1080)
    color2depth_points = ctypes.cast(color2depth_points_type(), ctypes.POINTER(_DepthSpacePoint))
    S = 1080*1920
    TYPE_CameraSpacePointArray = PyKinectV2._CameraSpacePoint * S
    csps1 = TYPE_CameraSpacePointArray()

    #load your saved depth data
    depthdatafile = open("DEPTH." + filename + ".pickle", "rb")

    #make list to store the 3D positions in
    pixel_positions_3D_list = []

    #Iterate over each saved frame of depth data
    depth_file_not_finished = True
    while depth_file_not_finished == True:
        try:
            depthframe = pickle.load(depthdatafile) #each call loads a successive frame from the pickle file, so we need to do this once per frame

            three_D_pixel_positions_in_frame = [] # list to store the 3D pixel positions from one frame

            #Defines to allow colour pixel mapping to 3D coords to work correctly     
            ctypes_depth_frame = np.ctypeslib.as_ctypes(depthframe.flatten())
            L = depthframe.size
            kinect._mapper.MapColorFrameToCameraSpace(L, ctypes_depth_frame, S, csps1)

            #Carry out certain actions if you want an image of where all the tracked points are in the depth data (makes program 20x slower)
            if show_each_frame == True:

                #Note: the method on the line below, which finds the corresponding depth pixel for a single tracked pixel in the colour image, is NOT what I am using to find the 3D position of a colour pixel
                kinect._mapper.MapColorFrameToDepthSpace(ctypes.c_uint(512 * 424), ctypes_depth_frame, ctypes.c_uint(1920 * 1080), color2depth_points)

                cut_down_depth_frame = depthframe.astype(np.uint8)
                cut_down_depth_frame = np.reshape(cut_down_depth_frame, (424, 512))

            #Iterate over the lists of pixel positions in the 2D colour image to locate
            for pixel in colour_image_pixels_to_locate_list:

                #find x and y in pixel position in the 2D colour image
                x = pixel[0]
                y = pixel[1]

                #Find the 3D position of each pixel (relative to the camera) from the MapColorFrameToCameraSpace output; all measurements (x, y and z) are in metres
                x_3D = csps1[y*1920 + x].x
                y_3D = csps1[y*1920 + x].y
                z_3D = csps1[y*1920 + x].z
                pixel_position_3D = [x_3D, y_3D, z_3D]

                #if show_each_frame flag set,  display the depth data and corresponding points you are reading
                if show_each_frame == True:

                    try:

                        #the method below finds the 2D depth pixel that corresponds to a 2D colour pixel, for use in the pop-up images showing which points you are tracking. While it could be used to find 3D positions, IT IS NOT THE METHOD I USE OR RECOMMEND FOR FINDING 3D JOINT POSITIONS, as it gives you x and y in pixels rather than metres (z is in mm)
                        read_pos = x + y*1920 - 1
                        depth_image_corresponding_x = int(color2depth_points[read_pos].x)
                        depth_image_corresponding_y = int(color2depth_points[read_pos].y)

                        #plot a circle at the pixel in the depth frame that matches the corresponding pixel in the image frame
                        cv2.circle(cut_down_depth_frame, (depth_image_corresponding_x,depth_image_corresponding_y), 5, (255, 0, 255), -1)

                        #note that the value below is NOT used in this code, included just for reference
                        corresponding_depth = depthframe[((depth_image_corresponding_y * 512) + depth_image_corresponding_x)]

                    except OverflowError:
                        #the SDK returns infinity for the depth of some positions, so we need to handle that
                        #in that case I simply don't find the corresponding pixel in the depth image, and so don't plot a circle there
                        pass

                #Display annotated depth image if flag is set
                if show_each_frame == True:
                    cv2.imshow('KINECT Video Stream', cut_down_depth_frame)

                    #code to close the window if escape is pressed; it doesn't do anything in this program (as we keep sending new data to the window) but is included for reference
                    key = cv2.waitKey(1)
                    if key == 27: 
                        pass

                #add 3D positions found in this frame to an intermediate list
                three_D_pixel_positions_in_frame.append(pixel_position_3D)

            #add per frame lists of 3D position into a results list
            pixel_positions_3D_list.append(three_D_pixel_positions_in_frame)

        #close the loop at the end of the file
        except EOFError:
            cv2.destroyAllWindows()
            depth_file_not_finished = False

    #return the list of per-frame 3D position lists
    return pixel_positions_3D_list

if __name__ == '__main__': 

    #replace the name below with the corresponding section of the name of your saved depth data (for reference, the full name of my saved depth data file was DEPTH.test.1.29.13.17.pickle)
    three_dimensional_positions = get_3D_coordinates('test.29.1.13.23', show_each_frame = True)
    print(three_dimensional_positions)

Code for stitching saved video frames into a video, included for completeness:

import cv2
import pickle

def frames_to_video(FILE_NAME):
    """Code to stitch a video based on frames saved in a pickle file"""

    print("stiching colour frames into video")

    #Load first colour frame, to get colour frame properties
    datafile = open("COLOUR." + FILE_NAME + ".pickle", "rb")
    frame = pickle.load(datafile)
    height, width, channels = frame.shape

    #define video properties (the 10 fps here matches the rough frame rate cap imposed by the elapsedtime > i/10 check in save_frames)
    out = cv2.VideoWriter(FILE_NAME + '.avi', cv2.VideoWriter_fourcc(*'DIVX'), 10, (int(width), int(height)))

    #display the first frame on screen for progress (some duplication of later code, as the first frame needs to be loaded separately from the rest so we can get the frame dimensions from it)
    out.write(frame)
    cv2.imshow('Stitching Video', frame)

    #Cycle through the rest of the colour frames, stitching them together
    while True:
        try:
            frame = pickle.load(datafile)
            out.write(frame)
            cv2.imshow('Stitching Video', frame)
            if (cv2.waitKey(1) & 0xFF) == ord('q'): # Hit `q` to exit
                break
        except EOFError:
            print("Video Stiching Finished")
            break

    # Release everything once the job is finished
    datafile.close()
    out.release()
    cv2.destroyAllWindows()

if __name__ == '__main__': 

    #replace the name below with the corresponding section of the name of your saved colour data (for reference, the full name of my saved colour data file was COLOUR.test.1.29.13.17.pickle)
    frames_to_video('test.29.1.13.23')
NIravMeghani commented 4 years ago

How can I plot these 3D points in a scatter plot?

Ddylen commented 4 years ago

How can I plot these 3D points in a scatter plot?

I'm not sure I understand the question. pixel_positions_3D_list is a list of lists of 3D points: there is one sublist per recorded frame, and each sublist contains the 3D position, in camera coordinates, of every colour image pixel specified in colour_image_pixels_to_locate_list.

As for plotting the 3D points, you could do that with any common Python plotting tool (e.g. https://matplotlib.org/3.1.1/gallery/mplot3d/scatter3d.html).

If the issue is that you end up plotting so many points that matplotlib is slow and laggy, I've run into that as well; I never looked into solving it beyond severe downsampling.
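
For reference, a minimal sketch of such a scatter plot using the output of get_3D_coordinates above (the file name and the every-10th-frame downsampling are illustrative, not required):

import matplotlib.pyplot as plt

#one sublist per frame, one [x, y, z] entry per tracked pixel
pixel_positions_3D_list = get_3D_coordinates('test.29.1.13.23')

#flatten to a single list of points, downsampling to keep matplotlib responsive
points = [p for frame in pixel_positions_3D_list[::10] for p in frame]
xs = [p[0] for p in points]
ys = [p[1] for p in points]
zs = [p[2] for p in points]

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(xs, ys, zs)
ax.set_xlabel('x (m)')
ax.set_ylabel('y (m)')
ax.set_zlabel('z (m)')
plt.show()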

NIravMeghani commented 4 years ago

https://drive.google.com/open?id=1x3xQvYzTp1XmJKhPq-2X2zmJ6dRhRXRL Please tell me if you find any errors in this, or let me know if there is a better way of doing it.

Ddylen commented 4 years ago

I'm somewhat time-pressured right now so I've only skimmed through it, but what jumps out at me is that you seem to have defined your own function to find the 3D position from the 2D (x, y) pixel and the 1D depth data. The Kinect library has a function for that directly, which I've called in the code above (see the section with x_3D = csps1[y*1920 + x].x, y_3D = csps1[y*1920 + x].y, z_3D = csps1[y*1920 + x].z). It requires a bunch of other things to be defined first; unfortunately I wrote this a while back, so I can't recall which of the things I define at the start of my program were required for this specific function. I think it was:

    color2depth_points_type = _DepthSpacePoint * int(1920 * 1080)
    color2depth_points = ctypes.cast(color2depth_points_type(), ctypes.POINTER(_DepthSpacePoint))
    S = 1080*1920
    TYPE_CameraSpacePointArray = PyKinectV2._CameraSpacePoint * S
    csps1 = TYPE_CameraSpacePointArray()

and

        ctypes_depth_frame = np.ctypeslib.as_ctypes(depthframe.flatten())
        L = depthframe.size
        kinect._mapper.MapColorFrameToCameraSpace(L, ctypes_depth_frame, S, csps1)

but I'm not 100% sure; you'd be best playing around with my example program to figure it out.
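
Pulling those pieces together, a condensed sketch of the colour-pixel-to-camera-space lookup (this just restates the calls from my example above; it assumes a Kinect is connected or Kinect Studio is replaying a recording, and that depthframe is one frame of depth data as a numpy array):

import ctypes
import numpy as np
from pykinect2 import PyKinectV2, PyKinectRuntime
from pykinect2.PyKinectV2 import _CameraSpacePoint

kinect = PyKinectRuntime.PyKinectRuntime(PyKinectV2.FrameSourceTypes_Color | PyKinectV2.FrameSourceTypes_Depth)

#output buffer: one camera-space point per colour pixel
S = 1920 * 1080
csps1 = (_CameraSpacePoint * S)()

def colour_pixel_to_camera_space(depthframe, x, y):
    """Map colour pixel (x, y) to a 3D camera-space position in metres"""
    ctypes_depth_frame = np.ctypeslib.as_ctypes(depthframe.flatten())
    kinect._mapper.MapColorFrameToCameraSpace(depthframe.size, ctypes_depth_frame, S, csps1)
    point = csps1[y*1920 + x]
    return [point.x, point.y, point.z]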

KonstantinosAng commented 4 years ago

Check this issue for Creating 3d scatter plots:

https://github.com/Kinect/PyKinect2/issues/82