RandomEngy / VidCoder

A Blu-ray, DVD and video file transcoder for Windows.
http://vidcoder.net
GNU General Public License v2.0

NVENC limited to 3 simultaneous jobs #1131

Closed Milincho closed 1 year ago

Milincho commented 1 year ago

Problem Description

That limit has been patched out for some time: https://github.com/keylase/nvidia-patch

What version of VidCoder are you running?

9.3 Beta

Encode Log

No response

RandomEngy commented 1 year ago

Oh, so you can install a custom driver to remove this limit? Do you know why the limit is there in the default drivers? I think this would have to be a setting. How many is the practical maximum?

Milincho commented 1 year ago

> Oh, so you can install a custom driver to remove this limit? Do you know why the limit is there in the default drivers? I think this would have to be a setting. How many is the practical maximum?

It's not a custom driver, but a patch for each driver version from Nvidia. It also appears automatically as an option in NVCleanstall (which a lot of people use, and which is highly recommended): image

Nvidia artificially limiting 'consumer grade' products to push more expensive hardware to 'professionals'. Shocking. ¯_(ツ)_/¯

I don't know what the practical limit is (maybe unlimited), but I guess at least 10-12 would be fine.

Here is the Windows page: https://github.com/keylase/nvidia-patch/tree/master/win

FFmpeg can be used to check if the limit has been removed: https://github.com/keylase/nvidia-patch/wiki/Verify-NVENC-patch
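A rough way to run this kind of check from a script is to start several FFmpeg NVENC encodes at once and count how many succeed. The sketch below is not the procedure from the linked wiki page; the function names are made up for illustration, and it assumes an `ffmpeg` build with the `h264_nvenc` encoder is on the PATH.

```python
import subprocess


def nvenc_test_command(seconds: int = 5) -> list[str]:
    """Build an ffmpeg command that NVENC-encodes a synthetic clip and discards the output."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-re",                      # read input at native frame rate so sessions overlap
        "-f", "lavfi", "-i", "testsrc=size=1280x720:rate=30",
        "-t", str(seconds),         # stop after this many seconds of output
        "-c:v", "h264_nvenc",
        "-f", "null", "-",          # null muxer: nothing is written to disk
    ]


def count_successful_sessions(sessions: int, seconds: int = 5) -> int:
    """Start `sessions` encodes at the same time and return how many exit cleanly."""
    procs = [
        subprocess.Popen(
            nvenc_test_command(seconds),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        for _ in range(sessions)
    ]
    return sum(1 for proc in procs if proc.wait() == 0)
```

Calling `count_successful_sessions(8)` on a patched driver would be expected to return 8; on an unpatched one, the sessions beyond the driver's cap should fail to open and the count should come back lower.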

sr55 commented 1 year ago

The encode limit isn't very relevant for general purpose encoding.

It matters for streaming, where each stream is typically limited to 1080p60 or 4K60 and does not use the full capacity of the encode engine, so there is headroom to run multiple streams on the same engine.

Most consumer NVIDIA cards have only 1 or 2 encode engines. Each engine can support up to 5 streams as of the latest driver.

However, if 1 stream maxes out an encode engine, running 2 has no benefit: on a card with 1 encode engine, the 2 streams will simply run at half the speed each.

https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new The "Total # of NVENC" column is what is relevant for general-purpose encoding.
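The argument above can be put into a toy model. This is a simplification for illustration only, not how the driver schedules work: it assumes jobs are spread evenly over the engines and that each engine has a fixed throughput ceiling. All names and numbers are hypothetical.

```python
def total_fps(single_stream_fps: float, engine_capacity_fps: float,
              engines: int, jobs: int) -> float:
    """Estimate combined FPS when `jobs` encodes share `engines` NVENC units.

    single_stream_fps: speed one job reaches on an otherwise idle engine.
    engine_capacity_fps: most frames per second one engine can deliver in total.
    """
    if engines < 1 or jobs < 0:
        raise ValueError("engines must be >= 1 and jobs must be >= 0")
    base, extra = divmod(jobs, engines)  # spread jobs as evenly as possible
    total = 0.0
    for index in range(engines):
        jobs_on_engine = base + (1 if index < extra else 0)
        # An engine cannot exceed its ceiling, however many jobs it runs.
        total += min(jobs_on_engine * single_stream_fps, engine_capacity_fps)
    return total
```

With a job that already saturates its engine (`single_stream_fps == engine_capacity_fps`), going from 1 to 2 jobs on a single-engine card leaves the total unchanged, so each job runs at half speed. With a capped 60 fps stream on a much faster engine, the total keeps growing until the ceiling is reached.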

Milincho commented 1 year ago

My 4080 GPU is at 33% use encoding the limited 3 videos...

sr55 commented 1 year ago

Careful how you read it. Some tools will show real GPU usage which should be very low when running NVEnc.

Example: image

Notice that the encode engine is monitored separately. Some tools fold this into a single GPU usage figure, which is misleading; some don't. This card is maxed out on encode.

Failing that, it's simply a nasty bottleneck that you have.
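For anyone who prefers reading the encode engine load from a script rather than a GUI tool, `nvidia-smi dmon -s u` prints utilization with separate `enc` and `dec` columns. The parser below is a sketch: it locates columns by header name so that extra columns in newer drivers do not break it. The function names are hypothetical and the sample text is a made-up illustration of the layout, not a capture from this thread.

```python
import subprocess

# Illustrative sample only (not real measurements).
SAMPLE = """\
# gpu     sm    mem    enc    dec
# Idx      %      %      %      %
    0      3      1     98      0
"""


def parse_dmon(output: str) -> list[dict[str, str]]:
    """Turn dmon text into one dict per data row, keyed by column name."""
    columns = None
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            tokens = line.lstrip("#").split()
            # The first header line carries the column names; the second carries units.
            if columns is None and "enc" in tokens:
                columns = tokens
            continue
        if columns is not None:
            rows.append(dict(zip(columns, line.split())))
    return rows


def read_dmon_once() -> list[dict[str, str]]:
    """Take a single utilization sample (requires nvidia-smi on the PATH)."""
    result = subprocess.run(
        ["nvidia-smi", "dmon", "-s", "u", "-c", "1"],
        capture_output=True, text=True, check=True,
    )
    return parse_dmon(result.stdout)
```

In the sample, `parse_dmon(SAMPLE)[0]["enc"]` is high while `sm` is low, which is the situation described above: the encoder is saturated even though general GPU usage looks small.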

stevespaw commented 1 year ago

This is Windows 11 - more details now.

image

Milincho commented 1 year ago

Windows 11 Task Manager: image image

HWInfo64: image

It seems the GPU is not being used at all to decode the H264 input files, is this normal?

'Video Engine Load' is 50-75% which I assume is the sum of 'Video Encode 0' and 'Video Encode 1'. CPU usage is not maxed either.

Milincho commented 1 year ago

Log file, in case it's useful:

VC [17:02:14] VidCoder 9.3 Beta
VC [17:02:14] Starting job 1/62
VC [17:02:14] Source path: X:\AI\mvsd00254.mp4
VC [17:02:14] Destination path: X:\AI\Mvsd00254.mkv
VC [17:02:14] Title: 1
VC [17:02:14] Range: All
VC [17:02:14] Preset: Bombadil AI Remaster
VC [17:02:14] Worker ready: Pipe 'VidCoderWorker.92f83c78-d2fb-4559-a2c5-72129787619e' is open
VC [17:02:14] Connecting to process 8404 on pipe VidCoderWorker.92f83c78-d2fb-4559-a2c5-72129787619e
HB [17:02:15] hb_init: starting libhb thread
[17:02:15] CPU: AMD Ryzen 9 3900X 12-Core Processor
[17:02:15] - logical processor count: 24
[17:02:15] Intel Quick Sync Video support: no
[17:02:15] hb_scan: path=X:\AI\mvsd00254.mp4, title_index=1
HB udfread ERROR: ECMA 167 Volume Recognition failed
src/libbluray/disc/disc.c:333: failed opening UDF image X:\AI\mvsd00254.mp4
src/libbluray/disc/disc.c:437: error opening file BDMV\index.bdmv
src/libbluray/disc/disc.c:437: error opening file BDMV\BACKUP\index.bdmv
[17:02:15] bd: not a bd - trying as a stream/file instead
libdvdread: DVDOpenFileUDF:UDFFindFile /VIDEO_TS/VIDEO_TS.IFO failed
libdvdnav: vm: vm: failed to read VIDEO_TS.IFO
[17:02:15] dvd: not a dvd - trying as a stream/file instead
HB Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'X:\AI\mvsd00254.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
Duration: 02:29:30.40, start: 0.000000, bitrate: 6005 kb/s
Stream #0:00x1: Video: h264 (Main) (avc1 / 0x31637661), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 5738 kb/s, 29.97 fps, 29.97 tbr, 90k tbn (default)
Metadata:
vendor_id : [0][0][0][0]
Stream #0:10x2: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 256 kb/s (default)
Metadata:
vendor_id : [0][0][0][0]
[17:02:16] scan: decoding previews for title 1
[17:02:16] scan: audio 0x1: aac, rate=44100Hz, bitrate=256000 ??? (AAC LC) (2.0 ch) (256 kbps)
HB [17:02:17] scan: 10 previews, 1920x1080, 29.970 fps, autocrop = 0/0/0/0, aspect 16:9, PAR 1:1, color profile: 1-1-1, chroma location: left
[17:02:17] scan: supported video decoders: avcodec nvdec
[17:02:17] libhb: scan thread found 1 valid title(s)
HB [17:02:17] Starting work at: Fri Apr 07 17:02:17 2023
[17:02:17] 1 job(s) to process
[17:02:17] json job: { "Audio": { "AudioList": [ { "DRC": 0, "Encoder": "copy:aac", "Gain": 0, "Mixdown": 0, "NormalizeMixLevel": false, "Samplerate": 0, "Track": 0, "DitherMethod": 0 } ], "CopyMask": [ "copy:aac", "copy:ac3", "copy:eac3", "copy:truehd", "copy:dts", "copy:dtshd", "copy:mp2", "copy:mp3", "copy:flac", "copy:opus" ] }, "Destination": { "ChapterList": [ { "Name": "Chapter 1" } ], "ChapterMarkers": true, "AlignAVStart": false, "File": "X:\AI\Mvsd00254.part.mkv", "Mp4Options": { "IpodAtom": false, "Mp4Optimize": true }, "Mux": "av_mkv" }, "Filters": { "FilterList": [ { "ID": 7, "Settings": { "mode": "1" } }, { "ID": 14, "Settings": { "crop-bottom": "0", "crop-left": "0", "crop-right": "0", "crop-top": "0", "height": "1080", "width": "1920" } } ] }, "PAR": { "Num": 1, "Den": 1 }, "Metadata": {}, "SequenceID": 0, "Source": { "Angle": 0, "Range": { "Type": "chapter", "Start": 1, "End": 1 }, "Title": 1, "Path": "X:\AI\mvsd00254.mp4" }, "Subtitle": { "Search": { "Burn": false, "Default": false, "Enable": false, "Forced": false }, "SubtitleList": [] }, "Video": { "Encoder": "nvenc_av1_10bit", "Level": "auto", "Bitrate": 4000, "TwoPass": false, "Turbo": false, "ColorMatrixCode": 0, "Options": "rc-lookahead=60", "Preset": "slowest", "Profile": "auto", "Tune": "fastdecode", "QSV": { "Decode": false }, "HardwareDecode": 0 } }
[17:02:17] Starting Task: Encoding Pass
[17:02:17] Skipping crop/scale filter
[17:02:17] work: only 1 chapter, disabling chapter markers
[17:02:17] job configuration:
[17:02:17] source
[17:02:17] + X:\AI\mvsd00254.mp4
[17:02:17] + title 1, chapter(s) 1 to 1
[17:02:17] + container: mov,mp4,m4a,3gp,3g2,mj2
[17:02:17] + data rate: 6005 kbps
[17:02:17] destination
[17:02:17] + X:\AI\Mvsd00254.part.mkv
[17:02:17] + container: Matroska (libavformat)
[17:02:17] video track
[17:02:17] + decoder: h264 8-bit (yuv420p)
[17:02:17] + bitrate 5738 kbps
[17:02:17] + filters
[17:02:17] + Framerate Shaper (mode=1)
[17:02:17] + frame rate: 29.970 fps -> constant 29.970 fps
[17:02:17] + Format (format=p010le)
[17:02:17] + Output geometry
[17:02:17] + storage dimensions: 1920 x 1080
[17:02:17] + pixel aspect ratio: 1 : 1
[17:02:17] + display dimensions: 1920 x 1080
[17:02:17] + encoder: AV1 10-bit (NVEnc)
[17:02:17] + preset: slowest
[17:02:17] + options: rc-lookahead=60
[17:02:17] + profile: auto
[17:02:17] + level: auto
[17:02:17] + bitrate: 4000 kbps, pass: 0
[17:02:17] + color profile: 1-1-1
[17:02:17] + chroma location: left
[17:02:17] audio track 1
[17:02:17] + decoder: ??? (AAC LC) (2.0 ch) (256 kbps) (track 1, id 0x1)
[17:02:17] + bitrate: 256 kbps, samplerate: 44100 Hz
[17:02:17] + AAC Passthru
[17:02:17] sync: expecting 268843 video frames
[17:02:17] encavcodecInit: AV1 (Nvidia NVENC)
[17:02:17] encavcodec: encoding at rc=vbr, Bitrate 4000
[17:02:17] encavcodec: encoding with stored aspect 1/1
HB [17:02:17] sync: first pts audio 0x1 is 0
[17:02:17] sync: first pts video is 2970
[17:02:17] sync: "Chapter 1" (1) at frame 1 time 2970
HB [17:42:45] reader: done. 1 scr changes
HB [17:42:46] work: average encoding speed for job is 110.689262 fps
HB [17:42:47] vfr: 268842 frames output, 0 dropped and 0 duped for CFR/PFR
[17:42:47] vfr: lost time: 0 (0 frames)
[17:42:47] vfr: gained time: 0 (0 frames) (0 not accounted for)
[17:42:47] aac-decoder done: 386323 frames, 0 decoder errors
[17:42:47] h264-decoder done: 268842 frames, 0 decoder errors
[17:42:47] sync: got 268842 frames, 268843 expected
[17:42:47] sync: framerate min 29.970 fps, max 10000.000 fps, avg 29.970 fps
HB [17:42:47] mux: track 0, 268842 frames, 4628919859 bytes, 4128.16 kbps, fifo 2048
[17:42:47] mux: track 1, 386323 frames, 287052882 bytes, 256.00 kbps, fifo 2048
[17:42:47] Finished work at: Fri Apr 07 17:42:47 2023
[17:42:47] libhb: work result = 0
VC [17:42:47] Job completed (Elapsed Time: 40m 32s)
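When comparing runs, it can help to pull the headline numbers out of a log like this instead of reading them by eye. The following is a small sketch; the regular expressions are written against the two log lines quoted in the excerpt and are an assumption about the log wording, not a documented format. The function name is made up.

```python
import re

SPEED_RE = re.compile(r"average encoding speed for job is ([\d.]+) fps")
FRAMES_RE = re.compile(r"vfr: (\d+) frames output")

# Two lines copied from the log above.
EXCERPT = """\
HB [17:42:46] work: average encoding speed for job is 110.689262 fps
HB [17:42:47] vfr: 268842 frames output, 0 dropped and 0 duped for CFR/PFR
"""


def summarize(log_text: str) -> dict:
    """Extract average FPS and output frame count; a missing value becomes None."""
    speed = SPEED_RE.search(log_text)
    frames = FRAMES_RE.search(log_text)
    return {
        "avg_fps": float(speed.group(1)) if speed else None,
        "frames": int(frames.group(1)) if frames else None,
    }


summary = summarize(EXCERPT)
# Frames divided by average speed gives the implied encode duration in seconds.
implied_seconds = summary["frames"] / summary["avg_fps"]
```

For this job the implied duration works out to roughly 2,429 seconds, which lines up with the timestamps in the log (work started at 17:02:17, speed reported at 17:42:46).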

Milincho commented 1 year ago

Mmmm... image

Why is that? How can I enable hardware decoding?

RandomEngy commented 1 year ago

9.4 Beta enabled hardware decoding as I mentioned in the other thread. I also made a special version of the app that uncaps the simultaneous encodes (only the configurable global option applies): https://engy.us/misc/VidCoder-9.5-Beta-Portable-NVEncUncapped.exe

Let me know if the GPU Video Encode usage and the total encoding FPS go up with it.

Milincho commented 1 year ago

OK. Now I have up to 8 simultaneous encodes. The problem is that only the CPU usage goes up (from 36% with 1 job to 99% with 8 jobs), while GPU usage and overall FPS stay the same (330fps with either 1 or 8 jobs):

1 job: 36% CPU usage - 330fps (33% Video Encode according to Windows 11 Task Manager) image

HWinfo64: GPU Video Decode 0% - GPU Video Engine Load 60% image

Windows 11 Task Manager: GPU Video Decode 0% - GPU Video Encode 33% image

8 jobs: 98% CPU usage - 330fps (33% Video Encode according to Windows 11 Task Manager) image

Windows 11 Task Manager: GPU Video Decode 0% - GPU Video Encode 33% image

The log says "HardwareDecode": 4 now, but both Windows Task Manager and HWinfo64 still say that GPU decoding is 0%... 🤔

VC [11:06:35] VidCoder 9.5 Beta
VC [11:06:35] Starting job 1/19
VC [11:06:35] Source path: X:\AI\real00623.mp4
VC [11:06:35] Destination path: X:\AI\Real00623.mkv
VC [11:06:35] Title: 1
VC [11:06:35] Range: All
VC [11:06:35] Preset: Bombadil AI Remaster
VC [11:06:35] Worker ready: Pipe 'VidCoderWorker.755a06b3-4ff0-411a-b443-ed5ae5fe4957' is open
VC [11:06:35] Connecting to process 30776 on pipe VidCoderWorker.755a06b3-4ff0-411a-b443-ed5ae5fe4957
HB [11:06:35] CPU: AMD Ryzen 9 3900X 12-Core Processor
[11:06:35] - logical processor count: 24
[11:06:35] Intel Quick Sync Video support: no
[11:06:35] hb_scan: path=X:\AI\real00623.mp4, title_index=1
udfread ERROR: ECMA 167 Volume Recognition failed
src/libbluray/disc/disc.c:333: failed opening UDF image X:\AI\real00623.mp4
src/libbluray/disc/disc.c:437: error opening file BDMV\index.bdmv
src/libbluray/disc/disc.c:437: error opening file BDMV\BACKUP\index.bdmv
[11:06:35] bd: not a bd - trying as a stream/file instead
libdvdread: DVDOpenFileUDF:UDFFindFile /VIDEO_TS/VIDEO_TS.IFO failed
libdvdnav: vm: vm: failed to read VIDEO_TS.IFO
[11:06:35] dvd: not a dvd - trying as a stream/file instead
HB Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'X:\AI\real00623.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
Duration: 02:14:00.28, start: 0.000000, bitrate: 5986 kb/s
Stream #0:00x1: Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709/unknown/unknown, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 5720 kb/s, 29.97 fps, 29.97 tbr, 90k tbn (default)
Metadata:
vendor_id : [0][0][0][0]
Stream #0:10x2: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 255 kb/s (default)
Metadata:
vendor_id : [0][0][0][0]
[11:06:35] scan: decoding previews for title 1
[11:06:35] scan: audio 0x1: aac, rate=44100Hz, bitrate=255999 ??? (AAC LC) (2.0 ch) (255 kbps)
HB [11:06:35] scan: 10 previews, 1920x1080, 29.970 fps, autocrop = 0/0/0/0, aspect 16:9, PAR 1:1, color profile: 1-1-1, chroma location: left
[11:06:35] scan: supported video decoders: avcodec nvdec
[11:06:35] libhb: scan thread found 1 valid title(s)
HB [11:06:36] Starting work at: Sat Apr 08 11:06:36 2023
[11:06:36] 1 job(s) to process
[11:06:36] json job: { "Audio": { "AudioList": [ { "DRC": 0, "Encoder": "copy:aac", "Gain": 0, "Mixdown": 0, "NormalizeMixLevel": false, "Samplerate": 0, "Track": 0, "DitherMethod": 0 } ], "CopyMask": [ "copy:aac", "copy:ac3", "copy:eac3", "copy:truehd", "copy:dts", "copy:dtshd", "copy:mp2", "copy:mp3", "copy:flac", "copy:opus" ] }, "Destination": { "ChapterList": [ { "Name": "Chapter 1" } ], "ChapterMarkers": true, "AlignAVStart": false, "File": "X:\AI\Real00623.part.mkv", "Mp4Options": { "IpodAtom": false, "Mp4Optimize": true }, "Mux": "av_mkv" }, "Filters": { "FilterList": [ { "ID": 14, "Settings": { "crop-bottom": "0", "crop-left": "0", "crop-right": "0", "crop-top": "0", "height": "1080", "width": "1920" } } ] }, "PAR": { "Num": 1, "Den": 1 }, "Metadata": {}, "SequenceID": 0, "Source": { "Angle": 0, "Range": { "Type": "chapter", "Start": 1, "End": 1 }, "Title": 1, "Path": "X:\AI\real00623.mp4" }, "Subtitle": { "Search": { "Burn": false, "Default": false, "Enable": false, "Forced": false }, "SubtitleList": [] }, "Video": { "Encoder": "nvenc_av1_10bit", "Level": "auto", "Bitrate": 4500, "TwoPass": false, "Turbo": false, "ColorMatrixCode": 0, "Options": "rc-lookahead=60", "Preset": "slowest", "Profile": "auto", "Tune": "fastdecode", "QSV": { "Decode": false }, "HardwareDecode": 4 } }
[11:06:36] Starting Task: Encoding Pass
[11:06:36] Skipping crop/scale filter
[11:06:36] work: only 1 chapter, disabling chapter markers
[11:06:36] job configuration:
[11:06:36] source
[11:06:36] + X:\AI\real00623.mp4
[11:06:36] + title 1, chapter(s) 1 to 1
[11:06:36] + container: mov,mp4,m4a,3gp,3g2,mj2
[11:06:36] + data rate: 5986 kbps
[11:06:36] destination
[11:06:36] + X:\AI\Real00623.part.mkv
[11:06:36] + container: Matroska (libavformat)
[11:06:36] video track
[11:06:36] + decoder: h264 8-bit (yuv420p)
[11:06:36] + bitrate 5720 kbps
[11:06:36] + filters
[11:06:36] + Format (format=p010le)
[11:06:36] + Output geometry
[11:06:36] + storage dimensions: 1920 x 1080
[11:06:36] + pixel aspect ratio: 1 : 1
[11:06:36] + display dimensions: 1920 x 1080
[11:06:36] + encoder: AV1 10-bit (NVEnc)
[11:06:36] + preset: slowest
[11:06:36] + options: rc-lookahead=60
[11:06:36] + profile: auto
[11:06:36] + level: auto
[11:06:36] + bitrate: 4500 kbps, pass: 0
[11:06:36] + color profile: 1-1-1
[11:06:36] + chroma location: left
[11:06:36] audio track 1
[11:06:36] + decoder: ??? (AAC LC) (2.0 ch) (255 kbps) (track 1, id 0x1)
[11:06:36] + bitrate: 255 kbps, samplerate: 44100 Hz
[11:06:36] + AAC Passthru
[11:06:36] sync: expecting 240967 video frames
[11:06:36] encavcodecInit: AV1 (Nvidia NVENC)
[11:06:36] encavcodec: encoding at rc=vbr, Bitrate 4500
[11:06:36] encavcodec: encoding with stored aspect 1/1
HB [11:06:36] sync: first pts audio 0x1 is 0
[11:06:36] sync: first pts video is 6030
[11:06:36] sync: "Chapter 1" (1) at frame 1 time 6030
HB [11:07:31] work: average encoding speed for job is 324.655243 fps
HB [11:07:31] aac-decoder done: 25585 frames, 0 decoder errors
[11:07:31] h264-decoder done: 17782 frames, 0 decoder errors
[11:07:31] sync: got 17747 frames, 240967 expected
[11:07:31] sync: framerate min 29.970 fps, max 29.970 fps, avg 29.970 fps
HB [11:07:31] mux: track 0, 17669 frames, 346494796 bytes, 4680.58 kbps, fifo 1024
[11:07:31] mux: track 1, 25504 frames, 18950455 bytes, 255.99 kbps, fifo 1024
[11:07:31] Finished work at: Sat Apr 08 11:07:31 2023
[11:07:31] libhb: work result = 1
VC [11:07:31] Encoding stopped

Milincho commented 1 year ago

Both Windows Task Manager and HWinfo show a 3% GPU usage when the video is played in a video player (PotPlayer):

image

RandomEngy commented 1 year ago

The source also needs to support hardware decoding: https://github.com/HandBrake/HandBrake/blob/169996c0fd4f1fa57d6f0e6e0ee7bac1e9177fd3/libhb/nvenc_common.c#L324 . I see the hardware decoder working for me:

image

It looks like it only supports a couple of pixel formats:

https://github.com/HandBrake/HandBrake/blob/007c0166c54d79d43ec4008ed8f4da1efdbd1d39/libhb/common.c#L6276

Milincho commented 1 year ago

> The source also needs to support hardware decoding:
>
> It looks like it only supports a couple of pixel formats:

I don't get it, these are H264 as standard as they come:

image

and video players do use GPU hardware decoding with them... so what's going wrong in VidCoder/HandBrake?

RandomEngy commented 1 year ago

Here's a list of all the pixel formats according to ffmpeg:

https://ffmpeg.org/doxygen/trunk/pixfmt_8h.html#a9a8e335cf3be472042bc9f0cf80cd4c5

HandBrake is only set up to accept AV_PIX_FMT_YUV420P10LE and AV_PIX_FMT_NV12 for NVDec.

If you think that additional pixel formats would work with NVDec you could suggest it to HandBrake. If they add it, the change will arrive downstream to VidCoder.
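To see which pixel format a given source actually reports, `ffprobe` can print it directly. The helper below is a sketch with made-up function names; it assumes `ffprobe` is on the PATH and only reports the format, without judging whether HandBrake will accept it.

```python
import subprocess


def ffprobe_pix_fmt_command(path: str) -> list[str]:
    """Build an ffprobe command that prints only the first video stream's pixel format."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",           # first video stream only
        "-show_entries", "stream=pix_fmt",  # print just this field
        "-of", "csv=p=0",                   # bare value, no section prefix
        path,
    ]


def source_pix_fmt(path: str) -> str:
    """Run ffprobe and return the pixel format name, for example 'yuv420p'."""
    result = subprocess.run(
        ffprobe_pix_fmt_command(path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

The names ffprobe prints are the lowercase forms of the `AV_PIX_FMT_*` constants, so `AV_PIX_FMT_NV12` shows up as `nv12`.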

sr55 commented 1 year ago

It's not that the source needs to support hardware decoding. There is nothing wrong with his source and the decoder would support it.

The problem is that you're asking for a conversion to occur by selecting the 10-bit encoder with an 8-bit source.
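A quick way to spot this situation before queueing jobs is to compare the bit depth of the source with the bit depth implied by the encoder name. The sketch below is a naming heuristic only, not HandBrake's own logic: it assumes encoder names carry a `_10bit` style suffix (as `nvenc_av1_10bit` does in the logs above) and that planar YUV pixel format names carry their depth before `le`/`be`. All function names are hypothetical.

```python
import re


def pix_fmt_bit_depth(pix_fmt: str) -> int:
    """Guess bit depth from a pixel format name such as yuv420p, yuv420p10le or p010le."""
    match = re.search(r"(\d+)(?:le|be)$", pix_fmt)
    # Names without an explicit depth suffix (yuv420p, nv12) are treated as 8-bit.
    return int(match.group(1)) if match else 8


def encoder_bit_depth(encoder: str) -> int:
    """Guess bit depth from an encoder name; no '_NNbit' suffix is treated as 8-bit."""
    match = re.search(r"_(\d+)bit$", encoder)
    return int(match.group(1)) if match else 8


def needs_conversion(source_pix_fmt: str, encoder: str) -> bool:
    """True when source and encoder bit depths differ, so a format conversion is required."""
    return pix_fmt_bit_depth(source_pix_fmt) != encoder_bit_depth(encoder)
```

For the jobs logged earlier, the source is `yuv420p` and the encoder is `nvenc_av1_10bit`, so `needs_conversion` returns `True`, which matches the `Format (format=p010le)` filter that appears in the job configuration.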

Milincho commented 1 year ago

> It's not that the source needs to support hardware decoding. There is nothing wrong with his source and the decoder would support it.
>
> The problem is that you're asking for a conversion to occur by selecting the 10-bit encoder with an 8-bit source.

My man! Now we are talking.

image

It went from 330fps to 530fps+ image

Milincho commented 1 year ago

940fps with 8 simultaneous jobs... 🤯

image

and I think there is still room for more sessions than the current limit of 8:

image

RandomEngy commented 1 year ago

Do 8 simultaneous jobs give a better total FPS than 3 for you, then?

Milincho commented 1 year ago

> Do 8 simultaneous jobs give a better total FPS than 3 for you, then?

Yes, overall fps keep ramping up when adding more simultaneous jobs. What about doubling the limit to 16? I'll tell you at which point the FPS stops increasing.

sr55 commented 1 year ago

It actually looks like you're slowing it down by running more.

In the case where hardware decode is used, 1 encode can fully saturate an NVENC unit. Therefore, 2 will max out an Ada card, which has 2 units. There can be a few percentage points of variance, and running 4 encodes instead of 2 will get you an extra few % at best.

Note that you need to be careful reading the aggregate FPS. It doesn't account for time-based oscillations or changes in sources, complexity, or settings, so you can get swings that make it inaccurate and essentially meaningless in many cases.

This kind of thing is really going to throw off the numbers. image

That's likely a low complexity scene at the start of one of the files causing that, or simply the workload ramping up and the average taking a while to catch up. Either way, you cannot get an accurate reading early on.

Notice that the speed essentially halves. Also note that I let the encodes almost finish to get this: image
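The difference between an average taken since the start and a reading over a recent window can be shown with a few lines of arithmetic. This is a sketch with made-up sample numbers, not data from any of the screenshots; the function names are hypothetical.

```python
def cumulative_fps(samples: list[tuple[float, int]]) -> float:
    """Average FPS since the start. Each sample is (elapsed_seconds, frames_done_so_far)."""
    elapsed, frames = samples[-1]
    return frames / elapsed


def windowed_fps(samples: list[tuple[float, int]], window_seconds: float) -> float:
    """FPS over roughly the last `window_seconds`, ignoring the early burst."""
    end_time, end_frames = samples[-1]
    for start_time, start_frames in samples:
        # Use the earliest sample that still falls inside the window.
        if end_time - start_time <= window_seconds and start_time < end_time:
            return (end_frames - start_frames) / (end_time - start_time)
    return cumulative_fps(samples)  # not enough history yet


# Hypothetical run: a fast, low-complexity opening followed by steady encoding.
samples = [(10, 9000), (20, 15000), (30, 20000), (40, 25000)]
```

Here the cumulative figure is still inflated by the fast opening seconds, while the windowed figure reflects the steady-state speed. The longer the run, the closer the two get, which is why early readings are the least trustworthy.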

Milincho commented 1 year ago

I'm consistently getting the same FPS during the whole encoding time, with a variance of 10-20fps at most.

With 8 simultaneous jobs, the total "current FPS" stays at ~940 during the whole encoding process.

1 job stays at ~530 "current FPS" consistently during the whole encoding process.

The job counts in between give values between those two, but 8 always shows the highest "current FPS".

Doing less than 8 simultaneous jobs lowers the total "current FPS". And according to the Windows Task Manager there is still some free GPU Video Encode and Decode % to be used.

It would be easy to test how much real time a queue of 12 or 16 files takes to fully complete with 2, 4, 8 or 16 simultaneous jobs.
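Before running such a test, the expected difference can be estimated from the FPS figures alone. The sketch below assumes the aggregate FPS holds steady for the whole queue, which is exactly the point under dispute in this thread, so it gives an estimate to compare against measured wall-clock time rather than a substitute for it. The function names are hypothetical.

```python
def queue_seconds(frame_counts: list[int], aggregate_fps: float) -> float:
    """Estimated time to finish a queue if the aggregate FPS stayed constant."""
    if aggregate_fps <= 0:
        raise ValueError("aggregate_fps must be positive")
    return sum(frame_counts) / aggregate_fps


def speedup(fps_before: float, fps_after: float) -> float:
    """How many times faster the queue would finish at the higher aggregate FPS."""
    return fps_after / fps_before


# FPS figures reported above: ~530 with 1 job, ~940 with 8 jobs.
estimated_gain = speedup(530, 940)
```

If the measured completion time for the same queue does not improve by about the estimated factor, the "current FPS" readings were probably not representative of the whole run.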

sr55 commented 1 year ago

What is the value with 2 encodes running?

Milincho commented 1 year ago

I did a test run with the same videos for more than 3 minutes. All FPS values are higher than before because some of the videos are 720p instead of 1080p.

2 jobs: image

4 jobs: image

8 jobs: image

RandomEngy commented 1 year ago

I added an option in 9.5 Beta (Global options -> Process) to uncap the number of simultaneous NVEnc encodes. I also bumped up the total max simultaneous encodes to 16.