Zibbp / ganymede

Twitch VOD and Live Stream archiving platform. Includes a rendered and real-time chat for each archive.
https://github.com/Zibbp/ganymede
GNU General Public License v3.0
452 stars, 24 forks

Workflows stuck active #453

Closed Blaiz0 closed 3 months ago

Blaiz0 commented 3 months ago

I have a few workflows that are stuck in the active state. The oldest one is from the 9th of May. I think they should be failed, terminated, or completed by now. I saw something about a workflow retention of 30 days in the Temporal logs?

What happened: The last 3 workflows are from when I first upgraded from 1.4 to 2.3. I added a few active streams to the watched channels list to test that everything worked, then removed them from the list while they were actively archiving, because I had added some 24/7 streams. I deleted the queue items, but the workflows remained. Editing watched channels while an archive is running might cause unexpected behaviour.

The first 2 workflows each have one perfectly archived live VOD, plus a 2nd folder with the same external video ID. The 2nd folder contains video clips under 30 seconds long, from the end of the stream. My guess is that this is because I have a "Live_check_interval" of 30 seconds in the config file.

How to fix: I want to clean up the active workflows, but I'm not sure how, now that it is integrated into the PostgreSQL database. There are databases called temporal and temporal_visibility. The temporal database has schemas called queue, queues, and more.

I'm not sure whether I should delete the whole temporal database, just some schemas, or only the contents of the schemas. I have a backup of the ganymede-db folder, but it's a bit scary when it's connected to the archival database, so I was hesitant to test it out.

Zibbp commented 3 months ago

Removed them from the watched channels list while they were actively archiving, because I added some 24/7 streams. Deleted the queue items. The workflows remained.

If you want to stop archiving a live stream, you need to click the stop button on the queue item to finish archiving. Once the archive is complete, you can delete its queue item.

Editing the watched channel while it's being archived, or even deleting it, is fine.

If you just want to clean up the active workflows, with no regard for keeping the files, restarting the ganymede-api container is the easiest way. You can also open the Temporal web UI and terminate the workflows there.

Blaiz0 commented 3 months ago

Restarting the ganymede-api container did not remove the workflows. Most of them have persisted through 3 releases and countless restarts, including docker compose down/up.

In the Temporal UI I got "missing csrf token in request header" when trying to terminate a workflow or request cancellation. I added the TEMPORAL_CORS_ORIGINS= env variable to the Temporal UI container, but I'm not entirely sure what URL/port it should point to.

I was eventually able to terminate the workflows using the web UI on localhost:8233. For some reason my local IP for the same machine, 192.168.10.150:8233, did not work, but localhost:8233 did. Browsers are getting difficult these days :)

Zibbp commented 3 months ago

Glad you were able to terminate those workflows. Temporal has some built-in security to only allow write access on localhost by default. Setting the TEMPORAL_CORS_ORIGINS env var to your server's IP should work:

      - TEMPORAL_CORS_ORIGINS=<server ip>
      - TEMPORAL_CSRF_COOKIE_INSECURE=true
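For context, here is a minimal sketch of where these two variables might sit in a compose file. The service name, image tag, TEMPORAL_ADDRESS value, and container port are assumptions for illustration and are not taken from the Ganymede compose file. The host port 8233 and the IP come from earlier in this thread. Writing the origin as a full http:// URL is also an assumption, so adjust it to whatever address your browser actually uses.

```yaml
# Hypothetical docker compose fragment. Service name, image tag, and the
# container-side port are assumptions, not copied from the Ganymede repo.
services:
  temporal-ui:
    image: temporalio/ui:latest
    environment:
      # Assumed hostname and port of the Temporal server container.
      - TEMPORAL_ADDRESS=temporal:7233
      # Address the browser uses to reach the UI (IP and port from this thread).
      # The full http:// origin form is an assumption.
      - TEMPORAL_CORS_ORIGINS=http://192.168.10.150:8233
      # Presumably lets the CSRF cookie work over plain HTTP (no TLS).
      - TEMPORAL_CSRF_COOKIE_INSECURE=true
    ports:
      # Host port 8233 as used in this thread; container port 8080 is assumed.
      - "8233:8080"
```

After changing the environment, the container has to be recreated (for example with docker compose up -d) for the new values to take effect.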

Do you recall what the stuck workflows were? They should all eventually time out and die off, unless they somehow got stuck in the database.

Blaiz0 commented 3 months ago

Do you recall what the stuck workflows were?

I'm not sure what you are asking. Here is a screenshot from yesterday: [Screenshot 2024-06-18 230150]

They were all live archives. 4 of the workflows were from the 9th to the 12th of May, when I first migrated to a new PC and to Ganymede with workflows. I changed the watched channel to false while it was archiving, and tried to cancel mid-archive in some way. That's about all I remember.

The other two were from the 23rd and 24th of May; I did not touch them or try to cancel them. There are two VODs for each date: first one VOD as you would expect, then another one with the same video ID that only has a few seconds from the end of the stream.

I have it all cleaned up now, and I suspect they came from "failed" archives. I will report back if it happens again in the future!