MrBrax / LiveStreamDVR

An automatic livestream recorder
MIT License

JS Heap OOM Exception on video download/remux #444

Closed · 0x1f415 closed this issue 7 months ago

0x1f415 commented 1 year ago

The service was down during a stream, so I missed it. I attempted to download it afterwards using the video download function, but the server consistently crashes every single time the remux process starts:

2023-05-29 12:48:44.934 | vod.downloadVideo <INFO> Download VOD 1830224547
2023-05-29 12:48:45.052 | vod.downloadVideo <INFO> Starting remux of S202305E0 2023-05-26 [3606b7de-7d2e-4631-8975-3bff756cbb70].mp4
2023-05-29 12:48:45.055 | helper <INFO> Remuxing /usr/local/share/twitchautomator/data/cache/1830224547.ts to /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [3606b7de-7d2e-4631-8975-3bff756cbb70].mp4
startJob process /usr/bin/ffmpeg [
  '/usr/bin/ffmpeg',
  '-i',
  '/usr/local/share/twitchautomator/data/cache/1830224547.ts',
  '-i',
  '/usr/local/share/twitchautomator/data/cache/1830224547.ffmpeg.txt',
  '-map_metadata',
  '1',
  '-c',
  'copy',
  '-bsf:a',
  'aac_adtstoasc',
  '-movflags',
  'faststart',
  '/usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [3606b7de-7d2e-4631-8975-3bff756cbb70].mp4'
]
2023-05-29 12:48:45.071 | helper <INFO> Executing /usr/bin/ffmpeg -i /usr/local/share/twitchautomator/data/cache/1830224547.ts -i /usr/local/share/twitchautomator/data/cache/1830224547.ffmpeg.txt -map_metadata 1 -c copy -bsf:a aac_adtstoasc -movflags faststart /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [3606b7de-7d2e-4631-8975-3bff756cbb70].mp4
2023-05-29 12:48:45.072 | helper <SUCCESS> Spawned process 51 for remux_1830224547.ts
2023-05-29 12:48:45.074 | job <INFO> Save job remux_1830224547.ts with PID 51 to /usr/local/share/twitchautomator/data/cache/pids/remux_1830224547.ts.json
Broadcasting data to 1 clients: {"action":"job_save","data":{"job_name":"remux_1830224547.ts","j...
2023-05-29 12:48:45.075 | helper <INFO> Attached to all streams for process 51 for remux_1830224547.ts
Remux total duration for 1830224547.ts: 12822
[2023-05-29T16:48:46.572Z] Remuxing 1830224547.ts - 343/12822 seconds (3%)
Broadcasting data to 1 clients: {"action":"job_update","data":{"job_name":"remux_1830224547.ts",...
Broadcasting data to 1 clients: {"action":"job_progress","data":{"job_name":"remux_1830224547.ts...
[2023-05-29T16:48:52.351Z] Remuxing 1830224547.ts - 700/12822 seconds (5%)

<--- Last few GCs --->

[33:0x5614180]   125111 ms: Mark-sweep 8069.6 (8227.0) -> 8061.8 (8235.7) MB, 2041.6 / 0.0 ms  (average mu = 0.729, current mu = 0.203) allocation failure; scavenge might not succeed
[33:0x5614180]   128653 ms: Mark-sweep 8077.7 (8235.7) -> 8068.7 (8242.2) MB, 3512.5 / 0.0 ms  (average mu = 0.460, current mu = 0.008) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb7a940 node::Abort() [/usr/local/bin/node]
 2: 0xa8e823  [/usr/local/bin/node]
 3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 5: 0xf3a3e5  [/usr/local/bin/node]
 6: 0xf4c8cd v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 7: 0xf26fce v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 8: 0xf28397 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 9: 0xf0956a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/usr/local/bin/node]
10: 0x12ce7af v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]
11: 0x16fb6b9  [/usr/local/bin/node]

I also tried raising the Node heap size with NODE_OPTIONS=--max-old-space-size=8192, and that didn't help.
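For reference, the GC trace above shows heap sizes around 8 GB, which suggests the raised limit was in effect and the process simply grew past it. One way to confirm the override is actually being picked up inside the container is to log V8's configured heap limit at startup; a minimal standalone sketch, not LiveStreamDVR code:

```typescript
// Minimal check: print V8's configured heap limit so you can confirm that
// NODE_OPTIONS=--max-old-space-size=8192 made it into the container's node process.
import v8 from "node:v8";

const limitMiB = v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
console.log(`V8 heap size limit: ${limitMiB.toFixed(0)} MiB`);
// With --max-old-space-size=8192 this should report roughly 8000+ MiB.
```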

0x1f415 commented 1 year ago

Log capture from a different attempt, after manually removing a bunch of cached log files, mediainfo JSONs, and MP4s from failed transcodes. This time it managed to get all the way through the remux, but crashed after mediainfo:

2023-05-29 12:59:23.785 | vod.downloadVideo <INFO> Downloaded VOD 1830224547...}
2023-05-29 12:59:23.785 | vod.downloadVideo <INFO> Starting remux of S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
2023-05-29 12:59:23.787 | helper <INFO> Remuxing /usr/local/share/twitchautomator/data/cache/1830224547.ts to /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
startJob process /usr/bin/ffmpeg [
  '/usr/bin/ffmpeg',
  '-i',
  '/usr/local/share/twitchautomator/data/cache/1830224547.ts',
  '-i',
  '/usr/local/share/twitchautomator/data/cache/1830224547.ffmpeg.txt',
  '-map_metadata',
  '1',
  '-c',
  'copy',
  '-bsf:a',
  'aac_adtstoasc',
  '-movflags',
  'faststart',
  '/usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4'
]
2023-05-29 12:59:23.795 | helper <INFO> Executing /usr/bin/ffmpeg -i /usr/local/share/twitchautomator/data/cache/1830224547.ts -i /usr/local/share/twitchautomator/data/cache/1830224547.ffmpeg.txt -map_metadata 1 -c copy -bsf:a aac_adtstoasc -movflags faststart /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
2023-05-29 12:59:23.795 | helper <SUCCESS> Spawned process 68 for remux_1830224547.ts
2023-05-29 12:59:23.796 | job <INFO> Save job remux_1830224547.ts with PID 68 to /usr/local/share/twitchautomator/data/cache/pids/remux_1830224547.ts.json
Broadcasting data to 1 clients: {"action":"job_save","data":{"job_name":"remux_1830224547.ts","j...
2023-05-29 12:59:23.797 | helper <INFO> Attached to all streams for process 68 for remux_1830224547.ts
Remux total duration for 1830224547.ts: 12822
Broadcasting data to 1 clients: {"action":"job_clear","data":{"job_name":"download_vod_183022454...
Broadcasting data to 1 clients: {"action":"job_update","data":{"job_name":"remux_1830224547.ts",...
[2023-05-29T16:59:27.015Z] Remuxing 1830224547.ts - 385/12822 seconds (3%)
Broadcasting data to 1 clients: {"action":"job_clear","data":{"job_name":"download_vod_183022454...
Broadcasting data to 1 clients: {"action":"job_progress","data":{"job_name":"remux_1830224547.ts...
2023-05-29 12:59:41.797 | job <INFO> Job remux_1830224547.ts progress: 15%
[2023-05-29T16:59:41.798Z] Remuxing 1830224547.ts - 1876/12822 seconds (15%)
Broadcasting data to 1 clients: {"action":"job_progress","data":{"job_name":"remux_1830224547.ts...
2023-05-29 12:59:50.573 | job <INFO> Job remux_1830224547.ts progress: 98%
[2023-05-29T16:59:50.574Z] Remuxing 1830224547.ts - 12542/12822 seconds (98%)
Create MOOV atom for 1830224547.ts
2023-05-29 12:59:53.289 | helper.mediainfo <INFO> Run mediainfo on /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
2023-05-29 12:59:53.290 | helper.videometadata <ERROR> Trying to get mediainfo of /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4 returned: File not found for mediainfo
2023-05-29 12:59:53.290 | channel <ERROR> Trying to get mediainfo of /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4 returned: File not found for mediainfo
::ffff:192.168.1.80 - - [29/May/2023:17:00:11 +0000] "GET /api/v0/channels/080d7c0f-1e34-48fc-8a17-5fa86b8a1dd8/download/1830224547 HTTP/1.1" - - "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/113.0"
Broadcasting data to 1 clients: {"action":"job_progress","data":{"job_name":"remux_1830224547.ts...
[2023-05-29T17:00:11.148Z] Remuxing 1830224547.ts - 12822/12822 seconds (100%)
2023-05-29 13:00:39.908 | helper <SUCCESS> Process 68 for remux_1830224547.ts closed with code 0
2023-05-29 13:00:39.909 | job <INFO> Clear job remux_1830224547.ts with PID 68
Broadcasting data to 1 clients: {"action":"job_clear","data":{"job_name":"remux_1830224547.ts","...
2023-05-29 13:00:39.910 | job <SUCCESS> Job remux_1830224547.ts removed from jobs list
2023-05-29 13:00:39.910 | job <WARNING> Job remux_1830224547.ts not found in jobs list
2023-05-29 13:00:40.253 | helper <SUCCESS> Remuxed /usr/local/share/twitchautomator/data/cache/1830224547.ts to /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
2023-05-29 13:00:40.573 | vod.downloadVideo <INFO> Successfully remuxed S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4, removing /usr/local/share/twitchautomator/data/cache/1830224547.ts
2023-05-29 13:00:41.698 | vod.downloadVideo <INFO> Download of S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4 successful
Broadcasting data to 1 clients: {"action":"video_download","data":{"success":true,"path":"/usr/l...
2023-05-29 13:00:41.699 | channel <INFO> Create VOD JSON for paymoneywubby: S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].json @ /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305
2023-05-29 13:00:41.700 | vod.saveJSON <SUCCESS> Saving JSON of S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353]  (create json)
2023-05-29 13:00:41.705 | vod <ERROR> No chapters on S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353]!
2023-05-29 13:00:41.710 | vod.load <WARNING> Loaded VOD S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353] is not finalized!
2023-05-29 13:00:41.841 | vod.saveJSON <WARNING> Saving JSON of S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353] with no chapters!!
2023-05-29 13:00:41.841 | vod.saveJSON <SUCCESS> Saving JSON of S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353]  (manual creation)
2023-05-29 13:00:42.139 | vod.addSegment <INFO> Adding segment S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4 to S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353]
2023-05-29 13:00:42.140 | vod.finalize <INFO> Finalize S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353] @ /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305
2023-05-29 13:00:42.141 | vod <INFO> Fetching mediainfo of S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353], segment #0
2023-05-29 13:00:42.142 | helper.mediainfo <INFO> Run mediainfo on /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
2023-05-29 13:00:42.483 | helper.execSimple <EXEC> Executing 'mediainfo': $ /usr/bin/mediainfo --Full --Output=JSON /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
Broadcasting data to 1 clients: {"action":"job_progress","data":{"job_name":"remux_1830224547.ts...
Broadcasting data to 1 clients: {"action":"job_clear","data":{"job_name":"remux_1830224547.ts","...

<--- Last few GCs --->

[35:0x5614180]   310041 ms: Mark-sweep 7949.8 (8232.6) -> 7940.7 (8238.6) MB, 3852.7 / 0.0 ms  (average mu = 0.394, current mu = 0.007) allocation failure; scavenge might not succeed
[35:0x5614180]   314122 ms: Mark-sweep 7956.6 (8238.6) -> 7947.6 (8245.8) MB, 4048.5 / 0.0 ms  (average mu = 0.236, current mu = 0.008) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb7a940 node::Abort() [/usr/local/bin/node]
 2: 0xa8e823  [/usr/local/bin/node]
 3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 5: 0xf3a3e5  [/usr/local/bin/node]
 6: 0xf4c8cd v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 7: 0xf26fce v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 8: 0xf28397 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 9: 0xf0956a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/usr/local/bin/node]
10: 0x12ce7af v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]
11: 0x16fb6b9  [/usr/local/bin/node]
MrBrax commented 1 year ago

oh no, this is probably the worst thing one can encounter... really hoping this is just a node version error

0x1f415 commented 1 year ago

I appear to be having this issue with normal stream capture as well, not just the vod download.

It seems to be something related to the remux specifically. I can open the recorded ts file in cache/capture and it contains the entire stream, but the remux process appears to create a corrupt MP4 file in the vods folder and crash the entire server after the stream ends.

Attached are the logs with debug mode on; the stream ended at 00:17.

Attachments: 2023-06-10.log, remux_S202306E07 2023-06-09 [42301348827].ts_stderr.log, remux_S202306E07 2023-06-09 [42301348827].ts_stdout.log

0x1f415 commented 1 year ago

Interestingly, running the ffmpeg command in a docker exec session crashes the LiveStreamDVR server process without my having interacted with the application at all. This leads me to believe the culprit is something listening to file change events while the remuxed vod mp4 file is being written.

I then stopped the server.js process with SIGSTOP (to keep the container from immediately quitting when the process dies) and ran the ffmpeg command while it was stopped; the file rendered out successfully, and when server.js was resumed with SIGCONT it OOM'd.
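That observation can be tested in isolation. A minimal sketch, assuming the watcher is chokidar-based (chokidar is mentioned later in this thread); the watch path is taken from the logs above and may need adjusting:

```typescript
// Standalone repro sketch (not LiveStreamDVR code): watch the vods folder with
// chokidar and log heap usage on every change event while ffmpeg writes a large
// file into it. If heapUsed climbs with each event, whatever the change handler
// does is where the memory goes.
import chokidar from "chokidar";

const watchDir = "/usr/local/share/twitchautomator/data/storage/vods"; // path from the logs above

const watcher = chokidar.watch(watchDir, { ignoreInitial: true, persistent: true });

watcher.on("change", (filePath) => {
    const heapUsedMiB = process.memoryUsage().heapUsed / 1024 / 1024;
    console.log(`[watch] change: ${filePath} heapUsed=${heapUsedMiB.toFixed(1)} MiB`);
});
```

Running the same ffmpeg remux command against the multi-gigabyte .ts while a script like this is active should show whether memory tracks the change events, matching the SIGSTOP/SIGCONT behaviour described above.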

MrBrax commented 1 year ago

that could definitely make sense, i should make an option to disable file watching

MrBrax commented 1 year ago

could you try the dev version with debug enabled and check how the memory rises? i also added a config option to disable the vod watching, maybe that helps
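For illustration, gating the watcher behind such an option could look roughly like the sketch below; the actual option name and wiring in LiveStreamDVR may differ.

```typescript
// Hypothetical sketch of a vod-watcher toggle; the real config key and the
// project's Config API may differ from what is shown here.
import chokidar, { FSWatcher } from "chokidar";

function startVodWatcher(vodDir: string, watchEnabled: boolean): FSWatcher | undefined {
    if (!watchEnabled) {
        // Never create the watcher, so no change events are queued at all.
        return undefined;
    }
    return chokidar
        .watch(vodDir, { ignoreInitial: true, awaitWriteFinish: true })
        .on("change", (file) => {
            // re-scan or re-validate the changed vod here
            console.log(`vod changed: ${file}`);
        });
}
```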

0x1f415 commented 1 year ago

on the develop tag docker image:

test 1: disabled file watcher, downloaded a moistcr1tikal vod (previously had none of his vods saved); no crash, server.js memory stayed consistently below 200 MB
test 2: re-enabled file watcher, downloaded a second moistcr1tikal vod; same as above
test 3: file watcher still enabled, downloaded a paymoneywubby vod; memory usage grew to ~5.5 GB and crashed with an OOM exception

It seems apparent that this issue depends on the streamer, which indicates that the file watcher implementation doesn't scale to large vod collections:

(attached image)

Now that I can see the specific function from your commit, I may be able to take a shot at fixing it when I get the time, but leaving the watcher disabled works for now.
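One possible direction for making the watcher scale to large collections (sketched here as an assumption, not something the project currently does) is to narrow what gets watched so the multi-gigabyte media files never generate change events:

```typescript
// Possible mitigation sketch: watch only the small metadata files and skip the
// large media files, so a growing .mp4/.ts never produces change events.
// The path and layout are taken from the logs above; adjust as needed.
import chokidar from "chokidar";

const watcher = chokidar.watch("/usr/local/share/twitchautomator/data/storage/vods", {
    ignoreInitial: true,
    depth: 2,                     // vods/<channel>/<season>/
    ignored: /\.(mp4|ts)$/,       // skip large media files
});

watcher.on("change", (file) => {
    console.log(`metadata changed: ${file}`);
});
```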

MrBrax commented 1 year ago

does the new log line show the memory grow with each file watch event? i can find some memory issues searching for chokidar so maybe that's just what it does

0x1f415 commented 1 year ago

I actually haven't seen any information about memory usage in the logs; I've been monitoring resource usage through htop.

MrBrax commented 1 year ago

in the docker console when in debug mode, does it tell you about memory usage when file changes have been detected?

0x1f415 commented 1 year ago

No, I just checked my logs, made sure the watcher was enabled and the server was in debug mode, and tried downloading a vod, but strangely I don't see any lines about memory usage at all.

0x1f415 commented 1 year ago

Also, it seems that the file watcher disable option added in 17ecfb8165db6a1af373d9c97ed3c6040d2a19a0 fixes the issue for downloading a vod, but not for finishing a recorded stream.

MrBrax commented 1 year ago

that is quite odd considering it's disabled for both occasions, unless you didn't restart after changing it? maybe it's something other than the file watchers...

0x1f415 commented 1 year ago

Disabling the watcher might not have taken effect the first time because I wasn't aware that the option required a restart to apply, but it has been disabled for about a week now and the past two streams have still caused a crash. Interestingly, both of those streams managed to finish remuxing before the crash, possibly because they were shorter and completed before the server OOM'd; I'm not sure. For those two crashes I wasn't actively watching the docker log output to confirm it was the same OOM exception (the OOM exception isn't written to the docker logs for some reason), but I'm fairly sure it was.

MrBrax commented 1 year ago

is this after downloading multiple vods in the same "session"? i am noticing raised memory usage in docker on windows wsl

MrBrax commented 1 year ago

i tried to do some memory profiling on my windows machine, but honestly i just don't have the experience to get any useful information out of it; with things this low-level i really don't know where to start

MrBrax commented 1 year ago

is this still happening? i wonder if it has to do with the amount of log entries or keyvalues in memory

0x1f415 commented 1 year ago

Yes, it's still happening to me; not on every stream, but usually on the longer ones, and only when a stream ends. I want to see if I can diagnose and potentially fix the issue myself, but I haven't found the time yet.

0x1f415 commented 7 months ago

Since my last comment, I've taken a couple of stabs at getting to the root of this issue, which involved setting up an isolated development VM, copying my entire vod database, and so on.

Today I sat down for another attempt, realized that my configuration had broken with subsequent upstream updates, and decided that this isn't worth the effort, for me anyway.

The root of the issue is that I had over 100 vods totaling almost 500 gigabytes, which don't all need to be under the application's purview. I'm going to wipe everything and set up the server again from a clean slate, this time configured to use the file export functionality, and I recommend users with similar issues do the same.