dmlc / decord

An efficient video loader for deep learning with smart shuffling that's super easy to digest
Apache License 2.0

decode to OpenGL texture / FBO / PBO #205

Open Yves33 opened 2 years ago

Yves33 commented 2 years ago

I'm using decord (with or without CUDA, depending on the machine) to grab video frames that are later processed and displayed through OpenGL. In its present form, this implies (at least with CUDA) a GPU -> CPU transfer with a memory allocation in asnumpy(), followed by an upload back to the GPU.

I have little expertise in deep learning frameworks, but it seems that the decord.bridge modules make it possible to use the data while it is still on the GPU (without an intermediate CPU transfer). Would it be possible to implement an OpenGL bridge (and how?), so that we could use glBindTexture, glBindBuffer (...) with a texture ID or an FBO/PBO ID?

Alternatively, how could we use _vr[x].handle directly in glTexImage2D or glTexSubImage2D? Any directions?
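
For readers wondering what a raw handle would give access to, here is a minimal sketch. It assumes the handle is a ctypes pointer to a DLPack-style DLTensor; the struct layout below follows the public DLPack header and has not been checked against decord's source. `make_host_tensor` and `tensor_nbytes` are hypothetical helpers, and the tensor lives in host memory so the sketch runs without a GPU.

```python
import ctypes

# Layout mirrors the public DLPack DLTensor struct (assumed to match decord's).
class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int), ("device_id", ctypes.c_int)]

class DLDataType(ctypes.Structure):
    _fields_ = [("code", ctypes.c_uint8),
                ("bits", ctypes.c_uint8),
                ("lanes", ctypes.c_uint16)]

class DLTensor(ctypes.Structure):
    _fields_ = [("data", ctypes.c_void_p),          # raw pointer to the pixels
                ("device", DLDevice),
                ("ndim", ctypes.c_int),
                ("dtype", DLDataType),
                ("shape", ctypes.POINTER(ctypes.c_int64)),
                ("strides", ctypes.POINTER(ctypes.c_int64)),
                ("byte_offset", ctypes.c_uint64)]

def tensor_nbytes(t):
    """Size in bytes of a compact tensor, computed from dtype and shape."""
    n = (t.dtype.bits * t.dtype.lanes + 7) // 8
    for i in range(t.ndim):
        n *= t.shape[i]
    return n

def make_host_tensor(height, width, channels=3):
    """Hypothetical stand-in for a decoded frame, backed by host memory."""
    storage = (ctypes.c_uint8 * (height * width * channels))()
    shape = (ctypes.c_int64 * 3)(height, width, channels)
    t = DLTensor(data=ctypes.addressof(storage),
                 device=DLDevice(1, 0),       # 1 = kDLCPU in DLPack
                 ndim=3,
                 dtype=DLDataType(1, 8, 1),   # 1 = kDLUInt, 8 bits, 1 lane
                 shape=shape,
                 byte_offset=0)
    # storage must be returned too: the int in `data` does not keep it alive
    return t, storage, shape

tensor, storage, shape = make_host_tensor(2, 4)
handle = ctypes.pointer(tensor)
print(hex(handle.contents.data), tensor_nbytes(handle.contents))
```

With a frame decoded on the GPU, `data` would be a device address, so it could not be passed to glTexImage2D as a client pointer; it would have to go through a CUDA/OpenGL interop buffer.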

Thanks

Yves33 commented 2 years ago

Decoding to an OpenGL texture is possible with https://github.com/NVIDIA/VideoProcessingFramework. An example has recently been included in the main repo.

Yves33 commented 2 years ago

I got some code working; the main steps are below. If anybody is interested, I can write a minimal working GLUT example. In my present tests, the zero-copy update of the OpenGL texture is slower than the regular update using asnumpy() through PBOs, but CPU usage is lower! There may be a hidden data transfer somewhere in decord (by contrast, with VideoProcessingFramework I get a huge speed increase on the zero-copy path).

import ctypes
import numpy as np
import pycuda
import pycuda.autoinit
import pycuda.driver
from decord import VideoReader, gpu
from OpenGL.GL import *

def setup():
    global vr, width, height, tex, pbo, cuda_pbo  ## shared with updatetexture()
    vr = VideoReader("some_file.mp4", ctx=gpu(0))
    height, width, _ = vr[0].shape  ## decord frames are (height, width, channels)
    (...)
    tex = glGenTextures(1)
    glActiveTexture(GL_TEXTURE0)
    glBindTexture(GL_TEXTURE_2D, tex)
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, ctypes.c_void_p(0))  ## reserve space for texture

    ## one cannot update textures directly, one has to use a buffer object, then update the texture from this object
    pbo = glGenBuffers(1)
    glBindBuffer(GL_ARRAY_BUFFER, pbo)
    glBufferData(GL_ARRAY_BUFFER, np.zeros(width * height * 3, np.uint8), GL_DYNAMIC_DRAW)
    glBindBuffer(GL_ARRAY_BUFFER, 0)
    import pycuda.gl.autoinit  ## must be run after gl initialization! needs a valid context
    cuda_pbo = pycuda.gl.RegisteredBuffer(int(pbo))

def updatetexture():
    ## map the PBO into the CUDA address space
    buffer_mapping = cuda_pbo.map()
    buffptr, buffsize = buffer_mapping.device_ptr_and_size()
    ## using vr.next() does not retrieve data from GPU. this is done when calling asnumpy()
    frame = vr.next()
    pycuda.driver.memcpy_dtod_async(buffptr,
                                    frame.handle.contents.data,
                                    buffsize)
    buffer_mapping.unmap()
    ## update the texture from the PBO; the last argument is an offset into the bound buffer
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo)
    glActiveTexture(GL_TEXTURE0)
    glBindTexture(GL_TEXTURE_2D, tex)
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0,
                    width,
                    height,
                    GL_RGB, GL_UNSIGNED_BYTE, ctypes.c_void_p(0))
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0)
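
To compare the zero-copy path with the asnumpy() path, a small timing harness along these lines may help. This is only a sketch: `mean_update_time` is a hypothetical helper, not part of decord or PyOpenGL. It takes an optional `sync` callable because GL and CUDA calls are asynchronous, so without a synchronization point (for example glFinish) the measured time may not include the actual transfer.

```python
import time

def mean_update_time(update, n_frames=100, sync=None):
    """Return the mean wall-clock time (seconds) of one update() call.

    update: callable performing one texture update (e.g. updatetexture)
    sync:   optional callable that blocks until pending GPU work is done
    """
    if sync is not None:
        sync()  # start from an idle GPU queue
    start = time.perf_counter()
    for _ in range(n_frames):
        update()
    if sync is not None:
        sync()  # wait for the queued work before stopping the clock
    return (time.perf_counter() - start) / n_frames

# Stand-in workload so the sketch runs without a GL context;
# with a live context one would call, for example:
#   mean_update_time(updatetexture, sync=glFinish)
elapsed = mean_update_time(lambda: sum(range(1000)), n_frames=10)
print(f"{elapsed * 1e6:.1f} us per update")
```

Running it once per code path on the same video gives comparable per-frame numbers, although decode time is included in both cases.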