Zibbp / ganymede

Twitch VOD and Live Stream archiving platform. Includes a rendered and real-time chat for each archive.
https://github.com/Zibbp/ganymede
GNU General Public License v3.0
457 stars 24 forks source link

Video move task often fails #356

Open Entrivax opened 8 months ago

Entrivax commented 8 months ago

Quite often, the video move task ends up failing, even though the video was correctly moved.

image

2024-01-14 13:14:18 {"level":"debug","time":"2024-01-14T13:14:18+01:00","message":"finished vod video convert for 1988851558"}
2024-01-14 13:14:19 {"level":"debug","time":"2024-01-14T13:14:19+01:00","message":"checking if tasks are done for video 1988851558"}
2024-01-14 13:14:21 {"level":"debug","Attempt":1,"ChildWorkflowID":"1c27280f-07ed-479e-9d09-f584eb680cac_23","Namespace":"default","RunID":"1c27280f-07ed-479e-9d09-f584eb680cac","TaskQueue":"archive","WorkerID":"cc4cea6ec55c","WorkflowID":"bb9c0b61-f26b-4dc1-9269-8e82471db9e6_44","WorkflowType":"MoveVideoWorkflow","time":"2024-01-14T13:14:21+01:00","message":"ExecuteChildWorkflow"}
2024-01-14 13:14:24 {"level":"debug","ActivityID":"5","ActivityType":"MoveVideo","Attempt":1,"Namespace":"default","RunID":"7aef4ed1-3e4f-4e36-8dfe-1f6e35dc6434","TaskQueue":"archive","WorkerID":"cc4cea6ec55c","WorkflowID":"1c27280f-07ed-479e-9d09-f584eb680cac_23","WorkflowType":"MoveVideoWorkflow","time":"2024-01-14T13:14:24+01:00","message":"ExecuteActivity"}
2024-01-14 13:14:25 {"level":"debug","time":"2024-01-14T13:14:25+01:00","message":"moving file: /tmp/1988851558_043754bf-b273-11ee-a3a1-0242ac150002-video-convert.mp4 to /vods/hortyunderscore/2023-11-27-1988851558-archive-043754bf-b273-11ee-a3a1-0242ac150002/1988851558-video.mp4"}
2024-01-14 13:15:28 {"level":"debug","time":"2024-01-14T13:15:28+01:00","message":"running check live stream schedule"}
2024-01-14 13:15:28 {"level":"debug","time":"2024-01-14T13:15:28+01:00","message":"checking live channels"}
2024-01-14 13:15:28 {"level":"debug","time":"2024-01-14T13:15:28+01:00","message":"no live watched channels"}
2024-01-14 13:20:28 {"level":"debug","time":"2024-01-14T13:20:28+01:00","message":"running check live stream schedule"}
2024-01-14 13:20:28 {"level":"debug","time":"2024-01-14T13:20:28+01:00","message":"checking live channels"}
2024-01-14 13:20:28 {"level":"debug","time":"2024-01-14T13:20:28+01:00","message":"no live watched channels"}
2024-01-14 13:23:13 {"level":"warn","ActivityID":"5","ActivityType":"DownloadTwitchVideo","Attempt":1,"Error":"context deadline exceeded","Namespace":"default","RunID":"458fabf3-90c4-4244-a820-b2fd81554cba","TaskQueue":"video-download","WorkerID":"cc4cea6ec55c","WorkflowID":"82d2649a-9c03-49b2-925e-97191cc486c9_5","WorkflowType":"DownloadTwitchVideoWorkflow","time":"2024-01-14T13:23:13+01:00","message":"RecordActivityHeartbeat with error"}
2024-01-14 13:23:44 {"level":"warn","ActivityID":"5","ActivityType":"MoveVideo","Attempt":1,"Error":"context deadline exceeded","Namespace":"default","RunID":"7aef4ed1-3e4f-4e36-8dfe-1f6e35dc6434","TaskQueue":"archive","WorkerID":"cc4cea6ec55c","WorkflowID":"1c27280f-07ed-479e-9d09-f584eb680cac_23","WorkflowType":"MoveVideoWorkflow","time":"2024-01-14T13:23:44+01:00","message":"RecordActivityHeartbeat with error"}
2024-01-14 13:24:13 {"level":"warn","ActivityID":"5","ActivityType":"DownloadTwitchVideo","Attempt":1,"Error":"context canceled","Namespace":"default","RunID":"458fabf3-90c4-4244-a820-b2fd81554cba","TaskQueue":"video-download","WorkerID":"cc4cea6ec55c","WorkflowID":"82d2649a-9c03-49b2-925e-97191cc486c9_5","WorkflowType":"DownloadTwitchVideoWorkflow","time":"2024-01-14T13:24:13+01:00","message":"RecordActivityHeartbeat with error"}
2024-01-14 13:24:44 {"level":"warn","ActivityID":"5","ActivityType":"MoveVideo","Attempt":1,"Error":"context canceled","Namespace":"default","RunID":"7aef4ed1-3e4f-4e36-8dfe-1f6e35dc6434","TaskQueue":"archive","WorkerID":"cc4cea6ec55c","WorkflowID":"1c27280f-07ed-479e-9d09-f584eb680cac_23","WorkflowType":"MoveVideoWorkflow","time":"2024-01-14T13:24:44+01:00","message":"RecordActivityHeartbeat with error"}
2024-01-14 13:24:51 {"level":"debug","time":"2024-01-14T13:24:51+01:00","message":"deleting file: /tmp/1988851558_043754bf-b273-11ee-a3a1-0242ac150002-video.mp4"}
2024-01-14 13:24:54 {"level":"error","ActivityType":"MoveVideo","Attempt":1,"Error":"context canceled","Namespace":"default","RunID":"7aef4ed1-3e4f-4e36-8dfe-1f6e35dc6434","TaskQueue":"archive","WorkerID":"cc4cea6ec55c","WorkflowID":"1c27280f-07ed-479e-9d09-f584eb680cac_23","time":"2024-01-14T13:24:54+01:00","message":"Activity error."}
2024-01-14 13:25:04 {"level":"info","Error":"context deadline exceeded","Namespace":"default","TaskQueue":"archive","WorkerID":"cc4cea6ec55c","WorkerType":"ActivityWorker","time":"2024-01-14T13:25:04+01:00","message":"Task processing failed with error"}
2024-01-14 13:25:13 {"level":"warn","ActivityID":"5","ActivityType":"DownloadTwitchVideo","Attempt":1,"Error":"context canceled","Namespace":"default","RunID":"458fabf3-90c4-4244-a820-b2fd81554cba","TaskQueue":"video-download","WorkerID":"cc4cea6ec55c","WorkflowID":"82d2649a-9c03-49b2-925e-97191cc486c9_5","WorkflowType":"DownloadTwitchVideoWorkflow","time":"2024-01-14T13:25:13+01:00","message":"RecordActivityHeartbeat with error"}

[...]

2024-01-14 13:36:49 {"level":"debug","time":"2024-01-14T13:36:49+01:00","message":"moving file: /tmp/1988851558_043754bf-b273-11ee-a3a1-0242ac150002-video-convert.mp4 to /vods/hortyunderscore/2023-11-27-1988851558-archive-043754bf-b273-11ee-a3a1-0242ac150002/1988851558-video.mp4"}
2024-01-14 13:36:49 {"level":"error","ActivityType":"MoveVideo","Attempt":3,"Error":"error opening file: open /tmp/1988851558_043754bf-b273-11ee-a3a1-0242ac150002-video-convert.mp4: no such file or directory","Namespace":"default","RunID":"7aef4ed1-3e4f-4e36-8dfe-1f6e35dc6434","TaskQueue":"archive","WorkerID":"cc4cea6ec55c","WorkflowID":"1c27280f-07ed-479e-9d09-f584eb680cac_23","time":"2024-01-14T13:36:49+01:00","message":"Activity error."}
2024-01-14 13:36:49 {"level":"warn","ActivityID":"5","ActivityType":"MoveVideo","Attempt":3,"Error":"context canceled","Namespace":"default","RunID":"7aef4ed1-3e4f-4e36-8dfe-1f6e35dc6434","TaskQueue":"archive","WorkerID":"cc4cea6ec55c","WorkflowID":"1c27280f-07ed-479e-9d09-f584eb680cac_23","WorkflowType":"MoveVideoWorkflow","time":"2024-01-14T13:36:49+01:00","message":"RecordActivityHeartbeat with error"}

After this, the video is not present in the /tmp dir anymore, but was correctly moved in the destination directory, as you can see: image

From the logs above, it looks like the move task is taking too long, going to timeout, but still executing, then when wanting to retry, the source file is not here anymore.

Zibbp commented 8 months ago

I believe this is a similar issue to https://github.com/Zibbp/ganymede/issues/354. The heartbeats inside of the long running functions aren't properly ending when the task finished causing weird issues.

Entrivax commented 8 months ago

Yep, looks like it

Zibbp commented 8 months ago

I have some changes to the Heartbeat available using the :main image tag if you'd like to test it out. Not sure if it will fix this issue but I saw the heartbeat error in the logs.

KyuubiDDragon commented 7 months ago

i also tried to use the main, because i have the same issue. ghcr.io/zibbp/ganymede:main etc but got an error after the pulling because manifest unknown

Zibbp commented 7 months ago

i also tried to use the main, because i have the same issue. ghcr.io/zibbp/ganymede:main etc but got an error after the pulling because manifest unknown

I'm assuming you're running on an ARM cpu? If so, you'll need to specify the arm-specific tag if you want to run off of main, main-arm64. The latest tag combines both x86 and arm so that's why it works for that.

KyuubiDDragon commented 7 months ago

main-arm64 same problem. only latest works fine for me. i tried with main and main-arm64

Zibbp commented 7 months ago

What CPU is this running on? And to confirm, you're only changing the API container image? That's the only one required for this change.

KyuubiDDragon commented 7 months ago

Thats the issue. i changed all the 3 containers :). i will now test if the download will work correctly :)

KyuubiDDragon commented 7 months ago

A short feedback: With the api changes, theres the workflow tab that doesnt work. All the other tabs works fine.

f8 error TypeError: w is undefined NextJS 69 [_app-460189a8d20acafc.js:9:2661](http://192.168.50.51:4801/_next/static/chunks/pages/_app-460189a8d20acafc.js) TypeError: w is undefined NextJS 69 [_app-460189a8d20acafc.js:9:2661](http://192.168.50.51:4801/_next/static/chunks/pages/_app-460189a8d20acafc.js) A client-side exception has occurred, see here for more info: https://nextjs.org/docs/messages/client-side-exception-occurred [_app-460189a8d20acafc.js:9:2661](http://192.168.50.51:4801/_next/static/chunks/pages/_app-460189a8d20acafc.js)

Zibbp commented 7 months ago

Is that error from the browser console? If you go to the network tab of the browser tools, any failed requests? Is this just the /workflows page?

KyuubiDDragon commented 7 months ago

That error is from the browser console (f8) after the click there are 2 files: active?next_page_token= and closed?next_page_token= but without something after the =

The answer from the get is "null" grafik

Zibbp commented 7 months ago

v2.1.0 has been released with some fixes for the original issue in this post. Can you try it out and let me know if the video move error still happens?

That error is from the browser console (f8) after the click there are 2 files: active?next_page_token= and closed?next_page_token= but without something after the =

The answer from the get is "null" grafik

Can you try the newly released API and frontend version and let me know if this still happens? v2.1.0

KyuubiDDragon commented 7 months ago

i tried it again, and i have these logs:

2024/02/19 12:25:22 stdout {"level":"error","ActivityType":"MoveVideo","Attempt":3,"Error":"error opening file: open /tmp/2064978132_f68c286d-cf0c-11ee-ba64-0242c0a81005-video-convert.mp4: no such file or directory","Namespace":"default","RunID":"41e056a7-f643-48e0-b4b6-d34f84e71cc9","TaskQueue":"archive","WorkerID":"0e599836dbb4","WorkflowID":"975ed816-f3e9-4060-ac3f-b0f4c5f4d4f3_23","time":"2024-02-19T11:25:22Z","message":"Activity error."} 2024/02/19 12:25:22 stderr {"level":"debug","time":"2024-02-19T11:25:22Z","message":"moving file: /tmp/2064978132_f68c286d-cf0c-11ee-ba64-0242c0a81005-video-convert.mp4 to /vods/kyuubiddragon/2024-02-17-2064978132-archive-f68c286d-cf0c-11ee-ba64-0242c0a81005/2064978132-video.mp4"} 2024/02/19 12:23:22 stdout {"level":"error","ActivityType":"MoveVideo","Attempt":2,"Error":"error opening file: open /tmp/2064978132_f68c286d-cf0c-11ee-ba64-0242c0a81005-video-convert.mp4: no such file or directory","Namespace":"default","RunID":"41e056a7-f643-48e0-b4b6-d34f84e71cc9","TaskQueue":"archive","WorkerID":"0e599836dbb4","WorkflowID":"975ed816-f3e9-4060-ac3f-b0f4c5f4d4f3_23","time":"2024-02-19T11:23:22Z","message":"Activity error."} 2024/02/19 12:23:22 stderr {"level":"debug","time":"2024-02-19T11:23:22Z","message":"moving file: /tmp/2064978132_f68c286d-cf0c-11ee-ba64-0242c0a81005-video-convert.mp4 to /vods/kyuubiddragon/2024-02-17-2064978132-archive-f68c286d-cf0c-11ee-ba64-0242c0a81005/2064978132-video.mp4"} 2024/02/19 12:23:02 stderr {"level":"debug","time":"2024-02-19T11:23:02Z","message":"deleting file: /tmp/2064978132_f68c286d-cf0c-11ee-ba64-0242c0a81005-video.mp4"} 2024/02/19 12:15:32 stderr {"level":"debug","time":"2024-02-19T11:15:32Z","message":"moving file: /tmp/2064978132_f68c286d-cf0c-11ee-ba64-0242c0a81005-video-convert.mp4 to /vods/kyuubiddragon/2024-02-17-2064978132-archive-f68c286d-cf0c-11ee-ba64-0242c0a81005/2064978132-video.mp4"} 2024/02/19 12:05:25 stderr {"level":"debug","time":"2024-02-19T11:05:25Z","message":"video convert args: [-y -hide_banner -i /tmp/2064978132_f68c286d-cf0c-11ee-ba64-0242c0a81005-video.mp4 -c:v copy -c:a copy /tmp/2064978132_f68c286d-cf0c-11ee-ba64-0242c0a81005-video-convert.mp4]"} 2024/02/19 11:48:56 stderr {"level":"debug","time":"2024-02-19T10:48:56Z","message":"video convert args: [-y -hide_banner -i /tmp/2064978132_f68c286d-cf0c-11ee-ba64-0242c0a81005-video.mp4 -c:v copy -c:a copy /tmp/2064978132_f68c286d-cf0c-11ee-ba64-0242c0a81005-video-convert.mp4]"} 2024/02/19 10:56:01 stderr {"level":"debug","time":"2024-02-19T09:56:01Z","message":"moving file: /tmp/2064978132_f68c286d-cf0c-11ee-ba64-0242c0a81005-chat.json to /vods/kyuubiddragon/2024-02-17-2064978132-archive-f68c286d-cf0c-11ee-ba64-0242c0a81005/2064978132-chat.json"}

In that case you can see, that he moved the file sucessfully and then try it multiple times again, but the file is not available and then i fails

KyuubiDDragon commented 7 months ago

@Zibbp Do you have a fix for this issue?

Zibbp commented 7 months ago

Is this a live stream archive? If possible, can you post the full output from when the archive started to when errors start?

KyuubiDDragon commented 7 months ago

No, its not a live stream archive. its a normal archive from stores vods

FirestrikeX commented 3 months ago

Having an issue just like this as well. Out of 10 VODs, it moved all of them successfully but on 8 out of 10 of them it showed Video Move as errored so it wont every fully complete.

Zibbp commented 3 months ago

Having an issue just like this as well. Out of 10 VODs, it moved all of them successfully but on 8 out of 10 of them it showed Video Move as errored so it wont every fully complete.

Can you provide logs from the API container around the time the error happens? To confirm, you're running the latest version?

FirestrikeX commented 3 months ago

Having an issue just like this as well. Out of 10 VODs, it moved all of them successfully but on 8 out of 10 of them it showed Video Move as errored so it wont every fully complete.

Can you provide logs from the API container around the time the error happens? To confirm, you're running the latest version?

Of course! Thanks! I deleted 3 VODs that had the error and restarted all containers. Below is the log file from the api container from when it re-aquires those 3 VODs from the channel up to the point where the status is reported as error on all 3 vods in the Queue. https://gist.github.com/FirestrikeX/f16d20fb34e3589eadb0297b463d16bd

Zibbp commented 3 months ago

The formatting of the logs you uploaded got all messed up and it's difficult to read. Can you run docker logs ganymede-api >& /tmp/ganymede-api.txt and upload the resulting file here? You may need to zip the .txt file.

I was able to archive 2159565823 on my test instance fine. It's only these 3 VODs that are having issues?