Chocobozzz / PeerTube

ActivityPub-federated video streaming platform using P2P directly in your web browser
https://joinpeertube.org/
GNU Affero General Public License v3.0

'move-to-object-storage' failing repeatedly after update to v6.x #6149

Closed: arladmin closed this issue 8 months ago

arladmin commented 8 months ago

Describe the current behavior

After upgrading to v6.x, I've started to get the EXTERNAL STORAGE MOVE FAILED error for many (not all) of the uploaded videos.

Until now, I've uploaded thousands of videos without ever seeing this issue.

But now, several videos are getting stuck in the EXTERNAL STORAGE MOVE FAILED state. I deleted the affected videos and uploaded them again (because re-transcoding didn't help), but that didn't work either.

Example error log:

Job: 1684
Type: move-to-object-storage
Processed on 12/31/23, 11:07:26.112 AM
Finished on 12/31/23, 11:07:33.898 AM

{
  "videoUUID": "dd21158c-0f7b-4db1-8207-caa18abdc86b",
  "isNewVideo": true,
  "previousVideoState": 2
}

NoSuchBucket: UnknownError
    at throwDefaultError (/app/code/server/node_modules/@smithy/smithy-client/dist-cjs/default-error-handler.js:8:22)
    at /app/code/server/node_modules/@smithy/smithy-client/dist-cjs/default-error-handler.js:18:39
    at de_PutObjectCommandError (/app/code/server/node_modules/@aws-sdk/client-s3/dist-cjs/protocols/Aws_restXml.js:5721:12)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at runNextTicks (node:internal/process/task_queues:64:3)
    at listOnTimeout (node:internal/timers:538:9)
    at process.processTimers (node:internal/timers:512:7)
    at async /app/code/server/node_modules/@smithy/middleware-serde/dist-cjs/deserializerMiddleware.js:7:24
    at async /app/code/server/node_modules/@aws-sdk/middleware-signing/dist-cjs/awsAuthMiddleware.js:14:20
    at async /app/code/server/node_modules/@smithy/middleware-retry/dist-cjs/retryMiddleware.js:27:46
    at async /app/code/server/node_modules/@aws-sdk/middleware-flexible-checksums/dist-cjs/flexibleChecksumsMiddleware.js:57:20
    at async /app/code/server/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/region-redirect-endpoint-middleware.js:14:24
    at async /app/code/server/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/region-redirect-middleware.js:9:20
    at async /app/code/server/node_modules/@aws-sdk/middleware-logger/dist-cjs/loggerMiddleware.js:7:26
    at async Promise.all (index 0)
    at async Upload.__uploadUsingPut (/app/code/server/node_modules/@aws-sdk/lib-storage/dist-cjs/Upload.js:66:26)
    at async Upload.__doConcurrentUpload (/app/code/server/node_modules/@aws-sdk/lib-storage/dist-cjs/Upload.js:128:28)
    at async Promise.all (index 0)
    at async Upload.__doMultipartUpload (/app/code/server/node_modules/@aws-sdk/lib-storage/dist-cjs/Upload.js:211:9)
    at async Upload.done (/app/code/server/node_modules/@aws-sdk/lib-storage/dist-cjs/Upload.js:39:16)
    at async uploadToStorage (file:///app/code/server/dist/core/lib/object-storage/shared/object-storage-helpers.js:152:23)
    at async moveWebVideoFiles (file:///app/code/server/dist/core/lib/job-queue/handlers/move-to-object-storage.js:39:25)
    at async moveToJob (file:///app/code/server/dist/core/lib/job-queue/handlers/shared/move-video.js:19:13)
    at async Object.processMoveToObjectStorage [as move-to-object-storage] (file:///app/code/server/dist/core/lib/job-queue/handlers/move-to-object-storage.js:16:5)
    at async Object.wrapPromiseFun (file:///app/code/server/dist/core/lib/plugins/hooks.js:8:24)
    at async Worker.processJob (/app/code/server/node_modules/bullmq/dist/cjs/classes/worker.js:350:28)
    at async Worker.retryIfFailed (/app/code/server/node_modules/bullmq/dist/cjs/classes/worker.js:535:24)

Additionally, videos do not play at all while they are in this state.

Running the script https://docs.joinpeertube.org/maintain/tools#move-video-files-from-filesystem-to-object-storage doesn't help either.


Why is this happening?



dhk2 commented 8 months ago

I'm seeing the same thing intermittently. What is particularly strange is that the video is shown as being in object storage, and when re-transcoding, PeerTube first downloads the video from object storage to transcode it. If remote storage is enabled, the video then stalls in a "to move" state. If remote storage is disabled in production.yaml, PeerTube downloads the video from object storage, transcodes it, then successfully stores it locally, and playback works properly.

I'm using SeaweedFS for the S3-compatible object store.

arladmin commented 8 months ago

I believe this issue requires some sort of urgent acknowledgement and/or workaround/fix.

@Chocobozzz ?

Chocobozzz commented 8 months ago

@dhk2 please create a dedicated issue with peertube logs!

@arladmin I'm sorry but Framasoft doesn't offer urgent support, just standard support :) What is your s3 provider?

arladmin commented 8 months ago

> @arladmin I'm sorry but Framasoft doesn't offer urgent support, just standard support :)

@Chocobozzz My understanding so far is that the v6.0.2 release of PeerTube itself is broken, not my deployment of it. Hence the request for support: it concerns a broken public release, not a private request.

> What is your s3 provider?

DigitalOcean Spaces

Chocobozzz commented 8 months ago

Do you have logs from your S3 provider that may help us to understand the issue? Do you also have more details in PeerTube logs?

arladmin commented 8 months ago

> Do you have logs from your S3 provider that may help us to understand the issue? Do you also have more details in PeerTube logs?

@Chocobozzz Which logs, particularly, are needed?

I could try to re-trigger the faults and grab the subsequent logs.

arladmin commented 8 months ago

> Do you have logs from your S3 provider that may help us to understand the issue? Do you also have more details in PeerTube logs?

@Chocobozzz

Here are the PeerTube logs for the period in which I imported 5 videos into my instance; only one of them succeeded. The other 4 are left in limbo with the EXTERNAL STORAGE MOVE FAILED status.

https://privatebin.io/?08b42021fdc83b30#C4kDGngUQkdk4AeXskv9kWS8yA2zJ6RzYWtz8xRewnZ

arladmin commented 8 months ago

> https://privatebin.io/?08b42021fdc83b30#C4kDGngUQkdk4AeXskv9kWS8yA2zJ6RzYWtz8xRewnZ

@Chocobozzz

Are these logs sufficient?

Any workaround for now?

arladmin commented 8 months ago

@Chocobozzz

The same video files are repeatedly erroring out.

I deleted, re-imported, and re-transcoded them several times.

The video files range in size from 436 MB to 2.3 GB.

I had to roll back my instance to v5.2.1, as the 6.x version is broken and thus useless.

After rolling back, the same video files are processed properly, without any issue and without any change to the instance config/ecosystem.

Chocobozzz commented 8 months ago

Can you run yarn upgrade --latest @aws-sdk/client-s3 @aws-sdk/lib-storage @aws-sdk/s3-request-presigner @smithy/node-http-handler in the peertube-latest/ directory to see if it fixes the issue? (You will need to restart PeerTube afterwards.)

arladmin commented 8 months ago

> Can you run yarn upgrade --latest @aws-sdk/client-s3 @aws-sdk/lib-storage @aws-sdk/s3-request-presigner @smithy/node-http-handler in the peertube-latest/ directory to see if it fixes the issue? (You will need to restart PeerTube afterwards.)

@Chocobozzz

I cannot (not allowed in Cloudron).

See: https://forum.cloudron.io/post/80387

arladmin commented 8 months ago

> Can you run yarn upgrade --latest @aws-sdk/client-s3 @aws-sdk/lib-storage @aws-sdk/s3-request-presigner @smithy/node-http-handler in the peertube-latest/ directory to see if it fixes the issue? (You will need to restart PeerTube afterwards.)

@Chocobozzz

With assistance from a Cloudron maintainer, I have been able to test the lib update you suggested in a test environment.

I imported 5 of the same videos into the test instance (with v6.0.2).

image

The 5th video:

image

rustyechelle commented 8 months ago

Hi

What is the correct way to fix a video in this state? I cannot re-upload it since it comes from a PeerTube stream. As a last resort, I can try to upload from the -fragmented.mp4 I can find in the object storage.

The object storage contains a -fragmented.mp4 and a .m3u8, whereas other videos also contain a -master.m3u8 and a -segments-sha256.json.

FYI, here is the error in the move-to-storage job:

ServiceUnavailable: Service is unable to handle request.
    at throwDefaultError (/var/www/peertube/versions/peertube-v6.0.2/node_modules/@smithy/smithy-client/dist-cjs/default-error-handler.js:8:22)
    at /var/www/peertube/versions/peertube-v6.0.2/node_modules/@smithy/smithy-client/dist-cjs/default-error-handler.js:18:39
    at de_UploadPartCommandError (/var/www/peertube/versions/peertube-v6.0.2/node_modules/@aws-sdk/client-s3/dist-cjs/protocols/Aws_restXml.js:5971:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /var/www/peertube/versions/peertube-v6.0.2/node_modules/@smithy/middleware-serde/dist-cjs/deserializerMiddleware.js:7:24
    at async /var/www/peertube/versions/peertube-v6.0.2/node_modules/@aws-sdk/middleware-signing/dist-cjs/awsAuthMiddleware.js:14:20
    at async /var/www/peertube/versions/peertube-v6.0.2/node_modules/@smithy/middleware-retry/dist-cjs/retryMiddleware.js:27:46
    at async /var/www/peertube/versions/peertube-v6.0.2/node_modules/@aws-sdk/middleware-flexible-checksums/dist-cjs/flexibleChecksumsMiddleware.js:57:20
    at async /var/www/peertube/versions/peertube-v6.0.2/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/region-redirect-endpoint-middleware.js:14:24
    at async /var/www/peertube/versions/peertube-v6.0.2/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/region-redirect-middleware.js:9:20
    at async /var/www/peertube/versions/peertube-v6.0.2/node_modules/@aws-sdk/middleware-logger/dist-cjs/loggerMiddleware.js:7:26
    at async Upload.__doConcurrentUpload (/var/www/peertube/versions/peertube-v6.0.2/node_modules/@aws-sdk/lib-storage/dist-cjs/Upload.js:161:36)
    at async Promise.all (index 0)
    at async Upload.__doMultipartUpload (/var/www/peertube/versions/peertube-v6.0.2/node_modules/@aws-sdk/lib-storage/dist-cjs/Upload.js:211:9)
    at async Upload.done (/var/www/peertube/versions/peertube-v6.0.2/node_modules/@aws-sdk/lib-storage/dist-cjs/Upload.js:39:16)
    at async uploadToStorage (file:///var/www/peertube/versions/peertube-v6.0.2/dist/core/lib/object-storage/shared/object-storage-helpers.js:152:23)
    at async moveHLSFiles (file:///var/www/peertube/versions/peertube-v6.0.2/dist/core/lib/job-queue/handlers/move-to-object-storage.js:52:29)
    at async moveToJob (file:///var/www/peertube/versions/peertube-v6.0.2/dist/core/lib/job-queue/handlers/shared/move-video.js:23:13)
    at async Object.processMoveToObjectStorage [as move-to-object-storage] (file:///var/www/peertube/versions/peertube-v6.0.2/dist/core/lib/job-queue/handlers/move-to-object-storage.js:16:5)
    at async Object.wrapPromiseFun (file:///var/www/peertube/versions/peertube-v6.0.2/dist/core/lib/plugins/hooks.js:8:24)
    at async Worker.processJob (/var/www/peertube/versions/peertube-v6.0.2/node_modules/bullmq/dist/cjs/classes/worker.js:350:28)
    at async Worker.retryIfFailed (/var/www/peertube/versions/peertube-v6.0.2/node_modules/bullmq/dist/cjs/classes/worker.js:535:24)

Chocobozzz commented 8 months ago

@rustyechelle Please open a dedicated issue, stating your S3 provider. Please also try https://github.com/Chocobozzz/PeerTube/issues/6149#issuecomment-1876710960 to see if it fixes your issue.

@arladmin Please provide peertube logs and failed job output to know what happened

arladmin commented 8 months ago

> @arladmin Please provide peertube logs and failed job output to know what happened

@Chocobozzz

Logs have already been provided above: https://github.com/Chocobozzz/PeerTube/issues/6149#issuecomment-1876212423

Chocobozzz commented 8 months ago

> Logs have already been provided above:

I mean logs after the lib update, to see if the error is still the same :)

arladmin commented 8 months ago

> > Logs have already been provided above:
>
> I mean logs after the lib update, to see if the error is still the same :)

@Chocobozzz


image


image


Logs:

https://privatebin.io/?43e97c5224aef55c#8jyeEYXVV9JM3QKirfdrDYQW8KWqffMF9n9JAbZjXaqr

Chocobozzz commented 8 months ago

Thanks, can you contact DigitalOcean support? The logs you provided contain the AWS lib HTTP request, including all headers. I think they may understand what is wrong with the request. Maybe they don't support host-style requests?

arladmin commented 8 months ago

> Thanks, can you contact DigitalOcean support? The logs you provided contain the AWS lib HTTP request, including all headers. I think they may understand what is wrong with the request. Maybe they don't support host-style requests?

@Chocobozzz

Which statements from the logs do I need to forward to them?

Chocobozzz commented 8 months ago

For example line 8095

arladmin commented 8 months ago

> For example line 8095

@Chocobozzz

Thanks for highlighting this.

I'm a bit confused now, though, because the bucket (web-videos) that the file is being written to (as mentioned in the log) doesn't exist at all. It isn't even supposed to. The buckets are clearly configured in my production.yaml, like so:


storage:
  tmp: '/app/data/storage/tmp/' 
  avatars: '/app/data/storage/avatars/'
  streaming_playlists: '/app/data/storage/streaming-playlists/'
  redundancy: '/app/data/storage/redundancy/'
  logs: '/app/data/storage/logs/'
  previews: '/app/data/storage/previews/'
  thumbnails: '/app/data/storage/thumbnails/'
  torrents: '/app/data/storage/torrents/'
  captions: '/app/data/storage/captions/'
  cache: '/app/data/storage/cache/'
  plugins: '/app/data/storage/plugins/'
  client_overrides: '/app/data/storage/client-overrides/'
  bin: /app/data/storage/bin/
  well_known: /app/data/storage/well_known/
  tmp_persistent: /app/data/storage/tmp_persistent/
  # Use two different buckets for Web videos and HLS videos on AWS S3
  storyboards: /app/data/storage/storyboards/
  web_videos: /app/data/storage/web-videos/
object_storage:
  enabled: true
  # Example AWS endpoint in the us-east-1 region
  endpoint: '<region>.digitaloceanspaces.com'
  # Needs to be set to the bucket region when using AWS S3
  region: '<region>'
  videos:
    bucket_name: '<MY_S3_BUCKET>'
    prefix: 'direct/'
  streaming_playlists:
    bucket_name: '<MY_S3_BUCKET>'
    prefix: 'playlist/'
  AWS_ACCESS_KEY_ID: '<MY_ACCESS_KEY_ID>'
  AWS_SECRET_ACCESS_KEY: '<MY_ACCESS_KEY_SECRET>'
  credentials:
    aws_access_key_id: '<MY_ACCESS_KEY_ID>'
    aws_secret_access_key: '<MY_ACCESS_KEY_SECRET>'
    access_key_id: '<MY_ACCESS_KEY_ID>'
    secret_access_key: '<MY_ACCESS_KEY_SECRET>'
  max_upload_part: '1GB'

Chocobozzz commented 8 months ago

I think you found the root cause of the issue. I guess you missed one instruction in the v6 important notes.

In particular:

object_storage.videos must be renamed to object_storage.web_videos. The value of object_storage.web_videos.bucket_name doesn't need to be changed: https://github.com/Chocobozzz/PeerTube/blob/develop/config/production.yaml.example#L223
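
As an illustration of that rename, here is a minimal sketch of how the object_storage section posted earlier in this thread might look after the change. It assumes nothing else needs to change: the placeholder values are copied from the configuration above, and the credentials and max_upload_part entries are left out only for brevity. With the old videos key still in place, PeerTube presumably fell back to a default bucket named web-videos, which would match the NoSuchBucket error in the job log.

```yaml
object_storage:
  enabled: true
  endpoint: '<region>.digitaloceanspaces.com'
  region: '<region>'

  # Named `videos` in v5.x; renamed to `web_videos` in v6.
  # The bucket_name and prefix values themselves stay the same.
  web_videos:
    bucket_name: '<MY_S3_BUCKET>'
    prefix: 'direct/'

  # Unchanged between v5.x and v6.
  streaming_playlists:
    bucket_name: '<MY_S3_BUCKET>'
    prefix: 'playlist/'
```

PeerTube has to be restarted after editing production.yaml, and videos already stuck in the failed state would likely need the move job to be retried.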

gramakri commented 8 months ago

Cloudron package maintainer here. It looks like our packaging changelog failed to highlight the note @Chocobozzz mentioned. I will push a new package highlighting the changes.