RTUnit / Jellyfin-Transcodes-cleanup

Script for autonomous cleanup of the Jellyfin media server's transcodes directory
GNU General Public License v3.0

Stuttering - only 3 ts files are kept after about 5 minutes #10

Open Pandiora opened 1 year ago

Pandiora commented 1 year ago

I can't find the reasoning behind this, but after about ~5 minutes only 3 TS files are kept at any time, and it seems ffmpeg can't create new ones fast enough. I'm using a 1 GB tmpfs ramdisk mounted to /config/transcodes, which is recognized correctly. I also noticed unusually high CPU usage (~40% across 4 CPUs) while transcoding a 4K@120MBit file, even though the ffmpeg log claims it is using VAAPI.
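For reference, a ramdisk like the one described can be set up like this (size and mount point match the setup above; persisting it via an equivalent /etc/fstab line works the same way):

mount -t tmpfs -o size=1G tmpfs /config/transcodes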

At startup ffmpeg creates between 20 and 30 TS files, then goes into a paused state; when it is down to 3 TS files it starts running again, but there is noticeable stutter for the viewer. From what I have read, the amount of space needed is calculated per user. I believe one could remove this calculation entirely: reserve a fixed maximum amount of RAM sized for the biggest file one would currently play, and tell the user to scale the ramdisk by the expected number of users. That way only the TS fragments would be taken into account, which would be much easier to control - I do not understand all the parameters yet. Surely it would be more future-proof to calculate the needed RAM dynamically, but it seems to me this is causing unnecessary issues - though that might be a shot in the dark.

I still need to figure out why the CPU usage gets so high the longer I play a movie. It could be that Proxmox does not like the interaction with the tmpfs ramdisk, or it could be ffmpeg being paused and resumed all the time after a while. It would be appreciated if Jellyfin could just integrate such a function into their system - what if you have multiple users watching? Do you need an extra drive just so ffmpeg can store the TS files?

Pandiora commented 1 year ago

Well, it also occurs with ramfs, so the kind of ramdisk used doesn't seem to be the issue here.

Also, with version 10.9 Jellyfin will include an option to properly throttle the files that ffmpeg creates for transcoding. I believe this could also fix my high CPU usage issue. Since it would make little practical sense to keep a customization like this one around, your scripts will become obsolete. =( I'm closing this for now and will wait for the new feature.

Pandiora commented 1 year ago

I might have found the reason for the stutter. I loaded a backup of my Jellyfin LXC container without any of my adjustments, but enabled throttling for hardware acceleration, which prevents such high disk usage by ffmpeg. Then I watched a 4K stream for over 5 minutes, only to encounter a complete stop of playback.

Then I found this and this issue, which might solve the problem for some users.

RTUnit commented 1 year ago

Regarding the calculation of available space for each user... If you know the maximum number of movies that will be watched at the same time, try setting the property TS_SPACE_RESERVED_MAX_DIVIDER inside the cleanup script. For example, if it is set to 1, then 90% of the space is used for the User and 10% is kept in reserve for waste.
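To make the split concrete, here is a minimal sketch of the arithmetic as I understand it (everything except TS_SPACE_RESERVED_MAX_DIVIDER is a hypothetical name, and the actual formula in the script may differ):

TS_SPACE_RESERVED_MAX_DIVIDER=1
# total free space on the ramdisk, in KiB (GNU df)
total_kb=$(df -k --output=avail /config/transcodes | tail -n 1)
reserve_kb=$(( total_kb / 10 ))   # 10% reserve for leftover TS waste
per_user_kb=$(( (total_kb - reserve_kb) / TS_SPACE_RESERVED_MAX_DIVIDER ))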

"User" might be a misleading term on my part. When you play something in Jellyfin, the ffmpeg wrapper creates a PID file named after the 32-character code that appears in the ffmpeg arguments (the same code appears in the TS filenames). The script treats every PID as a User, so even if the same Jellyfin user watches two movies, there will be two PID files. Sometimes old TS files from a recent playback are still hanging around, together with their PID file, so it is good to keep some reserve out of the total ramdisk space for such waste.
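As an illustration of the accounting above, counting active playback sessions could look like this (the .pid suffix and the directory are assumptions for the sketch):

# every playback, even by the same Jellyfin user, leaves its own PID file
active=$(ls /config/transcodes/*.pid 2>/dev/null | wc -l)
echo "treating $active PID file(s) as $active 'Users'"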

The size of the TS files might not be easy to predict. Re-encoded video at the highest quality (bandwidth) setting might be bigger than the original file, I guess.

RTUnit commented 1 year ago

By default, after pausing ffmpeg, the cleanup script is configured to resume the ffmpeg process when 2-3 TS files remain. This is controlled by the property SCHEDULE_RESUME_FFMPEG_TS_ID_COUNT. You might try setting it to a higher number, which resumes ffmpeg earlier and gives it more time to produce TS files.
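For example, to resume at 6 remaining segments instead of 2-3 (the value 6 is only an illustration):

SCHEDULE_RESUME_FFMPEG_TS_ID_COUNT=6   # resume ffmpeg while 6 TS files still remain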

Pandiora commented 1 year ago

I can retest later - I still have a snapshot with my adjustments to your script. Maybe move all the parameters into one file and make it clearer how to set up your script. For example, for FFMPEG_DIR you wrote that it is for transcode-cleanup.sh, which confused me; one actually has to point this directory to the folder containing the ffmpeg executable, as written earlier in your README. A consolidated list, or at least an explanation of all the parameters, would help clear things up.

Pandiora commented 1 year ago

That did not work; I also tried this setting about 2 days ago, only to get the same result. There are always only 3 TS files left in the queue, which seems too low. I do not have much time to investigate further. The high CPU usage occurs again, and for transcoding with VAAPI (AMD Ryzen Vega) there should be almost no CPU utilization. With htop I can see ffmpeg using the CPU heavily, but the logs claim VAAPI is used to transcode the content, i.e. hardware acceleration via the iGPU.

RTUnit commented 1 year ago

You might try commenting out this line in the ffmpeg.wrap script and re-testing. Let me know if it improves the speed:

_args=${_args/-threads 0/-threads 1}

https://github.com/RTUnit/Jellyfin-Transcodes-cleanup/blob/d24421f1a6433d209c3193b51096a8c2425f2de4/ffmpeg.wrap#L143

The default argument is -threads 0, which lets ffmpeg decide how many worker threads to spawn so that TS files are produced more rapidly. I changed it to -threads 1 because my server creates them very quickly anyway. In my tests there was no difference between threads 0 and 1, but it seemed more logical in my setup to avoid parallel threads. If it improves the performance of your server, let me know and I will comment this line out.
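Concretely, disabling the substitution just means turning the line into a comment, so Jellyfin's original argument passes through unchanged:

# _args=${_args/-threads 0/-threads 1}   # disabled: keep Jellyfin's -threads 0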

Pandiora commented 1 year ago

Well, that did not help either, but something seems different - for the blink of an eye I can see bufmon adding a 4th TS file, only for the count to drop back to 3, in a repeating pattern. BUT 3 TS files should be enough for x seconds of streaming - the stutter must come from something else. And the CPU utilization is still too high, especially since ffmpeg claims it is using VAAPI. I have a Ryzen 5600G, which should have enough power to transcode 4K files to 1080p / HDR10.

So, this is the ffmpeg log file I got from Jellyfin:

Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> h264 (h264_vaapi))
  Stream #0:1 -> #0:1 (eac3 (native) -> aac (libfdk_aac))
Press [q] to stop, [?] for help
[h264_vaapi @ 0x55cbd83f2880] Driver does not support some wanted packed headers (wanted 0xd, found 0x1).
Output #0, hls, to '/config/transcodes/eeadcabd8b98b707fc571494a04baa4f.m3u8':
  Metadata:
    encoder         : Lavf59.27.100
  Stream #0:0: Video: h264 (High), vaapi(tv, bt709, progressive), 3840x2160 [SAR 1:1 DAR 16:9], q=2-31, 38145 kb/s, 23.98 fps, 90k tbn (default)
    Metadata:
      encoder         : Lavc59.37.100 h264_vaapi
    Side data:
      DOVI configuration record: version: 1.0, profile: 8, level: 6, rpu flag: 1, el flag: 0, bl flag: 1, compatibility id: 1
  Stream #0:1: Audio: aac, 48000 Hz, stereo, s16, 384 kb/s (default)
    Metadata:
      encoder         : Lavc59.37.100 libfdk_aac

Error "Driver does not support some wanted packed headers" occurs when there are baked in subtitles. I believe this one could be ignored?!

Then I looked into the Jellyfin log to find out whether an error with my WebSockets is causing this issue - but no:

[2023-11-21 18:06:33.815 +01:00] [INF] Current HLS implementation doesn't support non-keyframe breaks but one is requested, ignoring that request
[2023-11-21 18:06:33.816 +01:00] [INF] "/usr/lib/jellyfin-ffmpeg/ffmpeg.wrap" "-analyzeduration 200M -ss 00:42:18.000 -init_hw_device vaapi=va:/dev/dri/renderD128 -filter_hw_device va -hwaccel vaapi -hwaccel_output_format vaapi -autorotate 0 -i file:\"/<redacted>.mkv\" -autoscale 0 -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 h264_vaapi -rc_mode VBR -b:v 38145182 -maxrate 38145182 -bufsize 76290364 -force_key_frames:0 \"expr:gte(t,2538+n_forced*3)\" -flags:v -global_header -vf \"setparams=color_primaries=bt709:color_trc=bt709:colorspace=bt709,scale_vaapi=format=nv12:extra_hw_frames=24\" -codec:a:0 libfdk_aac -ac 2 -ab 384000 -af \"volume=2\" -copyts -avoid_negative_ts disabled -max_muxing_queue_size 2048 -f hls -max_delay 5000000 -hls_time 3 -hls_segment_type mpegts -start_number 846 -hls_segment_filename \"/config/transcodes/<redacted>.ts\" -hls_playlist_type vod -hls_list_size 0 -y \"/config/transcodes/<redacted>.m3u8\""
[2023-11-21 18:09:40.992 +01:00] [INF] Sending ForceKeepAlive message to 1 inactive WebSockets.
[2023-11-21 18:09:52.993 +01:00] [INF] Sending ForceKeepAlive message to 1 inactive WebSockets.
[2023-11-21 18:10:40.992 +01:00] [INF] Sending ForceKeepAlive message to 1 inactive WebSockets.
[2023-11-21 18:10:52.992 +01:00] [INF] Sending ForceKeepAlive message to 1 inactive WebSockets

I debugged with the web console and can see the WebSockets are still open or kept alive when the problem starts to occur, and afterwards.

The UI information tells me the stream works fine:

Player information
Player: Html Video Player
Play method: Transcoding
Protocol: https
Streamtype: HLS

But what might be interesting: the transcoding speed drops to 25 fps (which might be correct), and if I jump to another position in the movie it is as if I'm playing the movie from scratch, and the error occurs again after 5 minutes. Update: before the stutter begins, the transcoding speed drops to 24 fps. I also checked the temperatures, which are fine; the current setup has more than enough airflow, so overheating should not be an issue.

So I believe this stutter can only be caused by how ffmpeg works. Also, I didn't have problems with WebSockets before - I'm using a reverse proxy and split DNS. These problems probably only occur with certain files; the one I'm using to trigger the problem is 4K HEVC Main 10.

PS: I didn't reboot my machine. Is there some way to tell whether the new config got loaded, or should it be enough to start the stream again (like I did for most of my tests)?

PPS: It would be easier to debug this behaviour if the stream did not initially load 20-30 TS files and then drop down to 3 TS files after time X. Is there any easy way to just keep X TS files at all times? Also, you wrote somewhere that Jellyfin will have problems if there are only 2 TS files left - maybe that is causing the stutter? And why does SCHEDULE_RESUME_FFMPEG_TS_ID_COUNT not work/apply?

RTUnit commented 11 months ago

If it is the case that on your computer/server ffmpeg creates files fast in the beginning and then slows down, then we can try restarting the ffmpeg process instead of resuming it from sleep (from pause). Once there are many TS files in the transcodes directory and the space fills up, ffmpeg is paused as usual; but when playback approaches the last file available in the transcodes directory, the paused ffmpeg process is killed by the cleanup script. Jellyfin then notices that there are no more TS files to play and that no ffmpeg process is running, and it launches a new ffmpeg process to create the next TS files. The idea is that a freshly started ffmpeg will create the next TS files fast - at least that is the concept we can test here, specific to your situation.

I created a new branch with a transcode.cleanup.sh file containing the necessary changes: https://github.com/RTUnit/Jellyfin-Transcodes-cleanup/blob/issue-10-stuttering/transcode.cleanup.sh

You can download this one file and overwrite it in your file system. Then either restart the computer where the Jellyfin server is running, or create the transcode.cleanup.stop file in the semaphore directory (as described in here). This is needed to terminate the currently running cleanup script, so that the modified cleanup script is launched once you start playback in Jellyfin. Make sure no playback is running when you create transcode.cleanup.stop; otherwise there will be no cleanup of the TS files until you start a new playback.
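For example (the semaphore path below is a placeholder; use the directory from the README):

# stop the running cleanup loop; the next playback starts the updated script
touch /path/to/semaphore/transcode.cleanup.stop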

I have added a global variable RESTART_FFMPEG_INSTEAD_OF_RESUME=1 to tell the cleanup script to restart ffmpeg instead of resuming it from sleep.

The only aspect that cannot be controlled (or at least I do not know how to control it) is that Jellyfin launches ffmpeg only when it encounters no more TS files to play back. Launching ffmpeg and creating a TS file takes a few seconds, so playback might freeze for that long. But at least we will know whether it improves playback in your case. So even if I set SCHEDULE_RESTART_FFMPEG_TS_ID_COUNT=10, which means ffmpeg is killed while there are still 10 TS files in the transcodes directory, Jellyfin will launch ffmpeg only when the last file has been played back and it sees no more files available.
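Conceptually, the new branch swaps the resume signal for a kill, along these lines (a simplified sketch; FFMPEG_PID is a stand-in for however the script tracks the process, and the real transcode.cleanup.sh is more involved):

RESTART_FFMPEG_INSTEAD_OF_RESUME=1
SCHEDULE_RESTART_FFMPEG_TS_ID_COUNT=10
if [ "$RESTART_FFMPEG_INSTEAD_OF_RESUME" = "1" ]; then
    # SIGKILL, because a SIGSTOP'ped process cannot handle SIGTERM until resumed;
    # Jellyfin relaunches ffmpeg once no TS files are left to play
    kill -KILL "$FFMPEG_PID"
else
    kill -CONT "$FFMPEG_PID"   # default behaviour: wake the paused process
fi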

Pandiora commented 11 months ago

Thank you, I have just tested the new script - I replaced the previous one with this new one and rebooted the server, but sadly it does not work out as expected. When there are no more TS files left, the ffmpeg process indeed gets restarted, but playback pauses indefinitely.

I also tried to rule out errors on my side - I found defective RAM and replaced it. The CPU usage is still too high. I don't know if I mentioned it, but I'm running Jellyfin inside an LXC container. Still, that does not explain all the issues I'm seeing.

I will probably do another test within the next days, once I have set up my machine from scratch and migrated all my stuff to a new host.

IzanagisBurden commented 10 months ago

Just wanted to say I'm observing the same behavior - 5600G with Jellyfin/ffmpeg running natively (no containers/VM) on Ubuntu 23.04 with a 1 GB ramfs for transcodes and VAAPI HW transcoding (I am not observing high CPU usage, though). Eventually only 3 TS files are kept, which causes stutters during playback, but it also seems to pick up the pace after some time and then the cycle repeats.

RTUnit commented 10 months ago

My best guess is that HW transcoding takes too long to produce TS files. The pity is that Jellyfin starts ffmpeg only when there are 3 TS files left, so the hiccup happens when new TS files are not yet produced fast enough for continuous playback. In theory it is possible to start ffmpeg from the cleanup script itself, using the same arguments that Jellyfin last used, starting from the missing TS file. I could try this out when time allows. If it works out, it will be possible to start ffmpeg when, for example, 10 TS files remain.

IzanagisBurden commented 10 months ago

Hello, I believe this is actually an issue with ffmpeg or something outside of your script. I posted this under a similar issue in the Jellyfin repo:

I've been playing around with ffmpeg manually and it seems the issue lies with pausing/resuming ffmpeg. If I let a HW transcode run straight through (with hevc_vaapi + aac), it moves constantly at ~46 fps on my 5600G. However, if I pause and resume (using PB on my keyboard and fg to resume), most of the time it resumes slowly (even the memory clock in radeontop does not move past 50%-60%), and sometimes it resumes at max speed (~100% memory clock). If I exit ffmpeg during a slow resume and restart the transcode, it moves at full speed. (Note: this is for a HW transcode, although the same behaviour seems to occur for direct streaming/video copy with pausing/resuming.)
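The same suspend/resume cycle can be reproduced from another terminal with signals, which is essentially what the throttler and the cleanup script do (assuming a single ffmpeg process is running):

kill -STOP "$(pgrep -x ffmpeg)"   # suspend, like stopping a foreground job
sleep 30
kill -CONT "$(pgrep -x ffmpeg)"   # resume, like fg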

RTUnit commented 10 months ago

Hello, I believe this is actually an issue with ffmpeg or something outside of your script. I posted this under a similar issue in the Jellyfin repo:

Well... the whole idea of the script is to provide a workaround for something that Jellyfin does not offer yet. Otherwise I would have closed this issue.

IzanagisBurden commented 10 months ago

Sorry, I just meant that I posted this observation about ffmpeg under a similar issue someone was having in the Jellyfin repo. They are using the throttler, which suspends/resumes ffmpeg, and I believe that is the root cause of both issues.

IzanagisBurden commented 10 months ago

Hello, from the other issue I was made aware that there was a firmware issue with the AMD iGPU that was causing the slow processing speed when ffmpeg resumed. I have updated to the latest firmware manually (since my version of Ubuntu does not ship the latest firmware) and the issue has been resolved.

@Pandiora the issue, along with how to upgrade the firmware, is outlined here - https://gitlab.freedesktop.org/mesa/mesa/-/issues/8313