HaveAGitGat / Tdarr

Tdarr - Distributed transcode automation using FFmpeg/HandBrake + Audio/Video library analytics + video health checking (Windows, macOS, Linux & Docker)

FFMPEG throwing error "Provided device doesn't support required NVENC features" #985

Closed miversen33 closed 3 months ago

miversen33 commented 4 months ago

Describe the bug: I'm not really sure what is going on, but I have several media files for which Tdarr decides to use NVENC-specific FFmpeg flags/features. That is fine, as these files are being transcoded on an Nvidia GTX 1060 running driver version 545.

miversen@gpu01:~$ nvidia-smi
Fri Apr 19 12:32:49 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    On  | 00000000:01:00.0 Off |                  N/A |
|  0%   44C    P8              11W / 200W |      1MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Going off this Reddit post, having a driver newer than 520 should be enough to use NVENC.

For whatever reason, though, FFmpeg is not having it and throws an error. I'm not really sure what I am missing.

To Reproduce: Queue up certain media files (I am not sure why some of these are being told to use NVENC) on a Tdarr remote node with a GPU. Let them transcode and the job will fail at the end.

Expected behavior: Either don't use NVENC if the underlying hardware doesn't support it (it does, but still), or transcode properly with NVENC.

Worker error:

2024-04-17T09:15:23.351Z [h264_nvenc @ 0x55ba810d53c0] 10 bit encode not supported
2024-04-17T09:15:23.351Z [h264_nvenc @ 0x55ba810d53c0] Provided device doesn't support required NVENC features
2024-04-17T09:15:23.351Z [vost#0:0/h264_nvenc @ 0x55ba810d50c0] Error initializing output stream: Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
2024-04-17T09:15:23.351Z Conversion failed!
2024-04-17T09:15:23.351Z
2024-04-17T09:15:23.379Z 1aoKkQ_E9:Node[gpu01-node]:Worker[bulky-boa]:Running FFmpeg failed
2024-04-17T09:15:23.393Z 1aoKkQ_E9:Node[gpu01-node]:Worker[bulky-boa]:[-error-]
2024-04-17T09:15:23.406Z 1aoKkQ_E9:Node[gpu01-node]:Worker[bulky-boa]:Error: FFmpeg failed
2024-04-17T09:15:23.417Z 1aoKkQ_E9:Node[gpu01-node]:Worker[bulky-boa]:"FFmpeg failed"
2024-04-17T09:15:23.432Z 1aoKkQ_E9:Node[gpu01-node]:Worker[bulky-boa]:"Error: FFmpeg failed\n at /app/Tdarr_Node/assets/app/plugins/FlowPlugins/CommunityFlowPlugins/ffmpegCommand/ffmpegCommandExecute/1.0.0/index.js:192:27\n at step (/app/Tdarr_Node/assets/app/plugins/FlowPlugins/CommunityFlowPlugins/ffmpegCommand/ffmpegCommandExecute/1.0.0/index.js:33:23)\n at Object.next (/app/Tdarr_Node/assets/app/plugins/FlowPlugins/CommunityFlowPlugins/ffmpegCommand/ffmpegCommandExecute/1.0.0/index.js:14:53)\n at fulfilled (/app/Tdarr_Node/assets/app/plugins/FlowPlugins/CommunityFlowPlugins/ffmpegCommand/ffmpegCommandExecute/1.0.0/index.js:5:58)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"
2024-04-17T09:15:23.446Z 1aoKkQ_E9:Node[gpu01-node]:Worker[bulky-boa]:Flow has failed

Additional context: I have several nodes, and I have seen this NVENC error before. I thought it was because the other nodes don't have an Nvidia GPU available, so as a test I removed those nodes and had these jobs queued up on the Nvidia GPU node specifically.

I'm not really sure what is going on here

miversen33 commented 4 months ago

Some additional context: I have successfully transcoded these problem files with an AMD GPU, so it seems to be specifically an issue with Tdarr choosing the wrong flags when the GTX 1060 is used. Potentially a bad driver in the container? I'm not really sure.

supersnellehenk commented 4 months ago

The 1060 doesn't support 10 bit colour depth, which is what the error is saying. You'll either have to force it to 8 bit, or not use the 1060 with 10 bit content.
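
A minimal sketch of how one might check a file's bit depth before handing it to `h264_nvenc`, which is the encoder that rejected 10 bit input in the log above. It assumes `ffprobe` is on the PATH; the helper names `probe_pix_fmt` and `bit_depth_from_pix_fmt` are made up for this illustration and are not part of Tdarr.

```python
import json
import re
import subprocess


def bit_depth_from_pix_fmt(pix_fmt: str) -> int:
    """Infer bit depth from an FFmpeg pixel format name.

    Covers common planar names (yuv420p10le) and semi-planar names (p010le).
    Anything without an encoded depth is assumed to be 8 bit.
    """
    semi_planar = re.match(r"p0(\d{2})", pix_fmt)  # e.g. p010le, p016le
    if semi_planar:
        return int(semi_planar.group(1))
    planar = re.search(r"p(\d{1,2})(?:le|be)?$", pix_fmt)  # e.g. yuv420p10le
    return int(planar.group(1)) if planar else 8


def probe_pix_fmt(path: str) -> str:
    """Ask ffprobe for the pixel format of the first video stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=pix_fmt", "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    streams = json.loads(result.stdout).get("streams", [])
    return streams[0].get("pix_fmt", "") if streams else ""
```

Calling `bit_depth_from_pix_fmt(probe_pix_fmt("input.mkv"))` on a 10 bit source would typically see a pixel format such as `yuv420p10le`; the file name here is a placeholder.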

miversen33 commented 4 months ago

Ahh well that makes sense! Thanks :)

I am assuming the 10 bit colour depth is coming from the input file itself as I am not specifying that in the flow and not every input file is failing on the 1060. Is there a check I can do in the flow to pre-emptively kick out stuff that the hardware can't handle? I notice that sometimes when a node gets a task, it will do a check and then drop the file into staging and move onto another. This seems like a great thing to add to that if I knew how to do it lol

HaveAGitGat commented 3 months ago

You could do something like this where you have 2 routes with different transcoding arguments depending on whether it's 10 bit or not:

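[Screenshot: flow with two routes that use different transcoding arguments depending on whether the video is 10 bit]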
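
As a rough illustration of the two-route idea outside the Tdarr flow editor, the sketch below picks FFmpeg arguments based on bit depth. The exact arguments used in the original flow are not known; forcing 8 bit output with `-pix_fmt yuv420p` is an assumption here, and it discards the extra bit depth of the source. `build_command` is a hypothetical helper, not a Tdarr function.

```python
from typing import List


def build_command(src: str, dst: str, is_10bit: bool) -> List[str]:
    """Build an FFmpeg command line with one of two video routes."""
    video_args = ["-c:v", "h264_nvenc"]
    if is_10bit:
        # 10 bit route: convert to an 8 bit pixel format before encoding,
        # since h264_nvenc refused 10 bit input in the reported log.
        video_args += ["-pix_fmt", "yuv420p"]
    # Audio is copied untouched in both routes (an assumption for brevity).
    return ["ffmpeg", "-i", src, *video_args, "-c:a", "copy", dst]
```

The resulting list can be passed to `subprocess.run`. If keeping 10 bit output matters, the 10 bit route would instead need an encoder and device that accept it.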

miversen33 commented 3 months ago

@HaveAGitGat thanks for that :)

To my question, there is no way to pre-empt that? If the source is 10bit color depth and I want to keep it that way, have node X process it instead of the current node?

I have a single node that is capable of processing at 10bit it seems. So in a perfect world, those conversions would happen on there.

I guess my question here is, is there any way we can do flows that pre-empt the actual conversion process? I.e., have the node run a healthcheck-like flow to determine if it can even process the input file? If not, drop the input back in staging and try another file?

HaveAGitGat commented 3 months ago

@miversen33 that functionality is coming soon. The main issue I can see is that you end up with thousands of items in the staging section because the node required to do the work is slower than the other nodes. Of course, the items could be added back into the original queue, but in situations where you want one node to do some work (such as video encoding) and then put the item back in the queue for another node to, say, do audio, the same issue would apply. I suppose that is up to the user, though.
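
The pre-check discussed in this thread could look something like the sketch below. It is purely illustrative and does not reflect how Tdarr implements or will implement this: the capability table, the `amd-node` name, and the `precheck` function are all hypothetical (only `gpu01-node` appears in the log above).

```python
from typing import Dict, List, Tuple

# Hypothetical table: highest bit depth each node can encode.
NODE_MAX_BIT_DEPTH: Dict[str, int] = {"gpu01-node": 8, "amd-node": 10}


def precheck(node: str, bit_depth: int,
             caps: Dict[str, int]) -> Tuple[str, List[str]]:
    """Decide whether a node should process a file or hand it back.

    Returns ("process", []) when the node can handle the bit depth,
    otherwise ("requeue", [names of nodes that can]).
    Unknown nodes are assumed to handle 8 bit only.
    """
    if bit_depth <= caps.get(node, 8):
        return ("process", [])
    capable = sorted(n for n, depth in caps.items() if bit_depth <= depth)
    return ("requeue", capable)
```

With this table, a 10 bit file picked up by `gpu01-node` is handed back with `amd-node` as the only candidate, which is exactly the case where a slow capable node can accumulate a backlog.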