NVIDIA / VideoProcessingFramework

Set of Python bindings to C++ libraries which provides full HW acceleration for video decoding, encoding and GPU-accelerated color space and pixel format conversions
Apache License 2.0

Unknown Error NvEncoder #172

Closed rizwanishaq closed 3 years ago

rizwanishaq commented 3 years ago

Describe the bug

I am trying to encode videos to H.264 in real time, with multiple streams on the same GPU.

import PyNvCodec as nvc
import numpy as np
import sys
import cv2
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def threaded_encoding(i):
    video_stream = cv2.VideoCapture('abc.mp4')
    encFile = open(f'test/{i}.mp4', 'wb')

    ret, frame = video_stream.read()

    H,W,c = frame.shape
    gpuID = 1

    nvUpl = nvc.PyFrameUploader(int(W), int(H), nvc.PixelFormat.RGB, gpuID)
    nvCvt = nvc.PySurfaceConverter(int(W), int(H), nvc.PixelFormat.RGB, nvc.PixelFormat.YUV420, gpuID)
    nvCvt1 = nvc.PySurfaceConverter(int(W), int(H), nvc.PixelFormat.YUV420, nvc.PixelFormat.NV12, gpuID)
    nvEnc = nvc.PyNvEncoder({'profile': 'baseline', 'fps': '25','preset': 'bd', 's': f"{W}x{H}", 'codec': 'h264'}, gpu_id=gpuID)

    encFrame = np.ndarray(shape=(0), dtype=np.uint8)

    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)

    while True:
        rawSurface = nvUpl.UploadSingleFrame(frame)
        cvtSurface = nvCvt.Execute(rawSurface)
        cvtSurface = nvCvt1.Execute(cvtSurface)
        # success = nvEnc.EncodeSingleFrame(frame, encFrame, sync = True)
        success = nvEnc.EncodeSingleSurface(cvtSurface, encFrame)
        if success:
            encByteArray = encFrame.tobytes()
            print(i,encFrame.shape, success)
            encFile.write(encByteArray)

        ret, frame = video_stream.read()
        if not ret:
            break

        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)

    success = nvEnc.Flush(encFrame)
    if(success):
        encByteArray = encFrame.tobytes()
        encFile.write(encByteArray)
    encFile.close()

    return i

if __name__=="__main__":
    looping = list(np.arange(0,10))
    with ThreadPoolExecutor() as ex:
        res = ex.map(threaded_encoding, looping)
    print('done')

Suppose there are 5 streams on the same GPU. When I run this, I sometimes get the error:

NvEncoder : m_nvenc.nvEncOpenEncodeSessionEx(&encodeSessionExParams, &hEncoder) returned error 10 Description: EncodeAPI Internal Error. at /vpf/PyNvCodec/TC/src/NvEncoder.cpp:84

Can we run multiple streams on the same GPU?
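
When workers are launched through `ThreadPoolExecutor.map` and the returned iterator is never consumed, an exception raised inside a worker is not re-raised in the main thread, which may be one reason a failure shows up only intermittently in the output. A minimal sketch of collecting per-stream failures explicitly is shown here. `open_session` and `run_streams` are hypothetical names, and `open_session` is a stand-in that fails on purpose; it does not call PyNvCodec.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def open_session(i):
    # Hypothetical stand-in for the real worker: pretend that only the
    # first two streams manage to open an encoder session.
    if i >= 2:
        raise RuntimeError(f"stream {i}: could not open encode session")
    return i

def run_streams(worker, stream_ids):
    """Run worker for every stream id and return (succeeded, failed)."""
    succeeded, failed = [], []
    with ThreadPoolExecutor() as ex:
        # Map each future back to its stream id so failures can be attributed.
        futures = {ex.submit(worker, i): i for i in stream_ids}
        for fut in as_completed(futures):
            i = futures[fut]
            try:
                succeeded.append(fut.result())  # re-raises the worker's exception
            except RuntimeError as err:
                failed.append((i, str(err)))
    return sorted(succeeded), sorted(failed)

ok, bad = run_streams(open_session, range(5))
print("ok:", ok)
print("failed:", [i for i, _ in bad])
```

In the real script, `worker` would be `threaded_encoding`, and the failed list would show which stream indices could not open a session.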


rarzumanyan commented 3 years ago

Hi @rizwanishaq

> suppose 5 streams on same gpu.. but when I run this.. sometime, I get the error

Which GPU are you running the sample on? Consumer-grade GPUs are limited to 2 simultaneous encoding sessions.

rizwanishaq commented 3 years ago

@rarzumanyan I am running it on a Titan RTX (24 GB).

rarzumanyan commented 3 years ago

@rizwanishaq

According to the NVENC support matrix, the Titan RTX supports 3 parallel NVENC sessions (the number was recently increased from 2).

rizwanishaq commented 3 years ago

So this error happens because there are too many threads:

NvEncoder : m_nvenc.nvEncOpenEncodeSessionEx(&encodeSessionExParams, &hEncoder) returned error 10 Description: EncodeAPI Internal Error. at /vpf/PyNvCodec/TC/src/NvEncoder.cpp:84

rizwanishaq commented 3 years ago

@rarzumanyan I am still getting the same error:

RuntimeError: NvEncoder : m_nvenc.nvEncOpenEncodeSessionEx(&encodeSessionExParams, &hEncoder) returned error 10 Description: EncodeAPI Internal Error. at /tertiary/rizwan/Wav2Lip_VPF/vpf/PyNvCodec/TC/src/NvEncoder.cpp:84

I have 6 GPUs (Titan RTX) and am trying to put 3 streams on each GPU. In every case the 4th stream is not allowed and raises the error above. Does this mean we get only 3 streams per machine?

Ideally I should get 18 streams, but I am getting only 3; as soon as a 4th stream starts, it raises the error.

rarzumanyan commented 3 years ago

@rizwanishaq

What parameters are you calling the scripts with? Do you launch your scripts with 6 different GPU IDs?

rizwanishaq commented 3 years ago

import PyNvCodec as nvc
import numpy as np
import sys
import cv2
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def threaded_encoding(i):

    video_stream = cv2.VideoCapture("abc.mp4")
    encFile = open(f"test/{i}.mp4", "wb")

    ret, frame = video_stream.read()

    H,W,c = frame.shape

    gpuID = i//3

    nvUpl = nvc.PyFrameUploader(int(W), int(H), nvc.PixelFormat.RGB, gpuID)
    nvCvt = nvc.PySurfaceConverter(int(W), int(H), nvc.PixelFormat.RGB, nvc.PixelFormat.YUV420, gpuID)
    nvCvt1 = nvc.PySurfaceConverter(int(W), int(H), nvc.PixelFormat.YUV420, nvc.PixelFormat.NV12, gpuID)
    nvEnc = nvc.PyNvEncoder({'profile': 'baseline', 'fps': '25','preset': 'bd', 's': f"{W}x{H}", 'codec': 'h264'}, gpu_id=gpuID)

    encFrame = np.ndarray(shape=(0), dtype=np.uint8)

    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    # frame = cv2.cvtColor(frame, cv2.COLOR_BGR2YUV_I420)

    while True:
        rawSurface = nvUpl.UploadSingleFrame(frame)
        cvtSurface = nvCvt.Execute(rawSurface)
        cvtSurface = nvCvt1.Execute(cvtSurface)
        # success = nvEnc.EncodeSingleFrame(frame, encFrame, sync = True)
        success = nvEnc.EncodeSingleSurface(cvtSurface, encFrame)
        if success:
            encByteArray = encFrame.tobytes()
            print(i,encFrame.shape, success)
            encFile.write(encByteArray)

        ret, frame = video_stream.read()
        if not ret:
            break
        # frame = cv2.cvtColor(frame, cv2.COLOR_BGR2YUV_I420)
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)

    success = nvEnc.Flush(encFrame)
    if(success):
        encByteArray = encFrame.tobytes()
        encFile.write(encByteArray)

    encFile.close()

    return i

if __name__=="__main__":
    number_of_streams = list(np.arange(0,18))
    with ThreadPoolExecutor() as ex:
        res = ex.map(threaded_encoding, number_of_streams)

    for r in res:
        print(r)
    print('done')

Yes, I am starting the threads with different GPU IDs (three threads per GPU, via gpuID = i//3).
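
As a quick check of the mapping used in the script, `gpuID = i//3` places three consecutive stream indices on each GPU, so 18 streams cover GPU IDs 0 to 5. The sketch below only reproduces that arithmetic; `gpu_for_stream` is a hypothetical helper name and no GPU is touched.

```python
from collections import Counter

STREAMS_PER_GPU = 3   # from `gpuID = i//3` in the script
NUM_STREAMS = 18      # from `np.arange(0,18)` in the script

def gpu_for_stream(i, per_gpu=STREAMS_PER_GPU):
    # Hypothetical helper: integer division groups consecutive streams.
    return i // per_gpu

assignment = {i: gpu_for_stream(i) for i in range(NUM_STREAMS)}
per_gpu_count = Counter(assignment.values())

for gpu_id in sorted(per_gpu_count):
    streams = [i for i, g in assignment.items() if g == gpu_id]
    print(f"GPU {gpu_id}: streams {streams}")
```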

rarzumanyan commented 3 years ago

@rizwanishaq

Allow me some time. I'll ask our Video Codec SDK people whether this 3-stream limit is system-wide. As soon as I hear back from them, I'll update you in this thread.

rarzumanyan commented 3 years ago

@rizwanishaq

The 3-session limit is system-wide. This topic is described in detail in the NVENC Application Note, which ships as part of the Video Codec SDK.