awslabs / mountpoint-s3-csi-driver

Built on Mountpoint for Amazon S3, the Mountpoint CSI driver presents an Amazon S3 bucket as a storage volume accessible by containers in your Kubernetes cluster.
Apache License 2.0

Error writing trailer of /ComfyUI/output/AnimateDiff_00014.mp4: Invalid argument #278

Closed talhadar90 closed 3 weeks ago

talhadar90 commented 1 month ago

/kind bug

NOTE: If this is a filesystem related bug, please take a look at the Mountpoint repo to submit a bug report

What happened? I deployed ComfyUI on AWS using this guide https://aws.amazon.com/blogs/architecture/deploy-stable-diffusion-comfyui-on-aws-elastically-and-efficiently/

What you expected to happen? I built a ComfyUI Docker image and started running the workflow. It generated images and saved them to my PVC. However, when I generate a video, the video is produced but is not written to the PVC, and I get this error: Error writing trailer of /ComfyUI/output/AnimateDiff_00014.mp4: Invalid argument

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?: PNG, JPEG, and GIF outputs work; only MP4 fails. When I run the video workflow without the PVC, the MP4 is saved inside the container without any error.

Environment

unexge commented 1 month ago

Hey @talhadar90, thanks for reporting the issue. Version v1.0.0 is fairly old, and there have been improvements in both Mountpoint and the CSI Driver since then. Could you please try again with the latest CSI Driver version (v1.9.0)? Also, could you please enable debug logging by adding debug to mountOptions of your PVC, and share the Mountpoint logs with us?

talhadar90 commented 1 month ago

Okay, after updating to v1.9.0 I'm still getting this error:

got prompt
Starting download for URL: https://storyblocker-videos.s3.amazonaws.com/RPReplay_Final1727850251+2.MP4
Attempting to download file to: /ComfyUI/input/RPReplay_Final1727850251+2.MP4
File download completed and verified
File download completed successfully
[VideoHelperSuite] - WARNING - Output images were not of valid resolution and have had padding applied
Error writing trailer of /ComfyUI/output/AnimateDiff_00001.mp4: Invalid argument
Prompt executed in 12.10 seconds

NAME                READY   STATUS    RESTARTS   AGE
s3-csi-node-kjtkc   3/3     Running   0          12m
s3-csi-node-nwbrc   3/3     Running   0          12m
s3-csi-node-rfv9q   3/3     Running   0          12m
sh-4.2$ sudo journalctl --unit $UNIT
-- Logs begin at Fri 2024-10-18 10:32:39 UTC, end at Fri 2024-10-18 12:32:25 UTC. --
Oct 18 12:00:13 ip-10-2-123-17.ec2.internal systemd[1]: Starting Mountpoint for Amazon S3 CSI driver FUSE daemon...
Oct 18 12:00:13 ip-10-2-123-17.ec2.internal systemd[1]: Started Mountpoint for Amazon S3 CSI driver FUSE daemon.
Oct 18 12:03:29 ip-10-2-123-17.ec2.internal mount-s3[87438]: [WARN] ioctl{req=52 ino=38 fh=2 cmd=21505}: mountpoint_s3::fuse: ioctl failed: operation not supported by Mountpoint
Oct 18 12:03:29 ip-10-2-123-17.ec2.internal mount-s3[87438]: [WARN] getxattr{req=54 ino=38 name="security.capability"}: mountpoint_s3::fuse: getxattr failed: operation not supported by Mountpoint
Oct 18 12:03:32 ip-10-2-123-17.ec2.internal mount-s3[87438]: [WARN] ioctl{req=118 ino=40 fh=3 cmd=21505}: mountpoint_s3::fuse: ioctl failed: operation not supported by Mountpoint
Oct 18 12:04:16 ip-10-2-123-17.ec2.internal mount-s3[87438]: [WARN] ioctl{req=158 ino=41 fh=5 cmd=21505}: mountpoint_s3::fuse: ioctl failed: operation not supported by Mountpoint
Oct 18 12:04:17 ip-10-2-123-17.ec2.internal mount-s3[87438]: [WARN] ioctl{req=222 ino=43 fh=6 cmd=21505}: mountpoint_s3::fuse: ioctl failed: operation not supported by Mountpoint
Oct 18 12:06:17 ip-10-2-123-17.ec2.internal mount-s3[87438]: [WARN] ioctl{req=268 ino=44 fh=9 cmd=21505}: mountpoint_s3::fuse: ioctl failed: operation not supported by Mountpoint
Oct 18 12:06:24 ip-10-2-123-17.ec2.internal mount-s3[87438]: [WARN] write{req=468 ino=45 fh=10 offset=40 length=4 pid=92460 name=AnimateDiff_00001.mp4}: mountpoint_s3::fuse: write failed: upload error: out of order write is NOT supported by Mountpoint, aborting the upload; expected offset 2172274 but got 40
Oct 18 12:06:24 ip-10-2-123-17.ec2.internal mount-s3[87438]: [WARN] flush{req=470 ino=45 fh=10 pid=92460 name=AnimateDiff_00001.mp4}: mountpoint_s3::fuse: flush failed: upload already aborted for key "AnimateDiff_00001.mp4"
Oct 18 12:06:25 ip-10-2-123-17.ec2.internal mount-s3[87438]: [WARN] write{req=490 ino=46 fh=11 offset=40 length=4 pid=92582 name=AnimateDiff_00001.mp4}: mountpoint_s3::fuse: write failed: upload error: out of order write is NOT supported by Mountpoint, aborting the upload; expected offset 48 but got 40
Oct 18 12:06:25 ip-10-2-123-17.ec2.internal mount-s3[87438]: [WARN] flush{req=492 ino=46 fh=11 pid=92582 name=AnimateDiff_00001.mp4}: mountpoint_s3::fuse: flush failed: upload already aborted for key "AnimateDiff_00001.mp4"

muddyfish commented 1 month ago

In your Mountpoint error logs, it says

mountpoint_s3::fuse: write failed: upload error: out of order write is NOT supported by Mountpoint, aborting the upload; expected offset 2172274 but got 40

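Mountpoint's documented semantics require writes to be sequential; out-of-order writes aren't supported. Here the writer had already written 2172274 bytes and then tried to write 4 bytes back at offset 40, so Mountpoint aborted the upload.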
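
To make the rule concrete, here is a toy model of an append-only upload. It is only an illustration of the behaviour seen in the log, not Mountpoint's actual implementation; the class name `SequentialUpload` and its error strings are made up for this sketch.

```python
class SequentialUpload:
    """Toy model: every write must start exactly where the previous one ended."""

    def __init__(self):
        self.next_offset = 0   # offset the next write is expected to start at
        self.aborted = False

    def write(self, offset, data):
        if self.aborted:
            raise OSError("upload already aborted")
        if offset != self.next_offset:
            # A single out-of-order write aborts the whole upload.
            self.aborted = True
            raise OSError(
                f"out of order write: expected offset {self.next_offset} but got {offset}"
            )
        self.next_offset += len(data)
        return len(data)


upload = SequentialUpload()
upload.write(0, b"\x00" * 2172274)   # writing the file front to back succeeds

try:
    upload.write(40, b"\x00" * 4)    # going back to offset 40 is rejected
except OSError as err:
    message = str(err)
    print(message)
```

The second write mirrors the failing request in the log (offset=40, length=4). A write like that is consistent with a video writer going back to fill in a field near the start of the file once the rest has been written, though the log alone does not confirm which field it is.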

I'd recommend seeing if your application supports flags/settings for using only sequential writes, or alternatively writing to temporary storage before copying/moving to the MP volume
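
A minimal sketch of the temporary-storage approach, assuming the system temp directory is on local disk and not on the Mountpoint volume. The helper `write_via_staging` and the stand-in writer `fake_muxer` are hypothetical names for illustration; in the real workflow the producer would be whatever writes the MP4.

```python
import os
import shutil
import tempfile


def write_via_staging(dest_path, produce):
    """Let produce(path) write a local temp file, then copy it to dest_path front to back."""
    fd, staging_path = tempfile.mkstemp(suffix=os.path.splitext(dest_path)[1])
    os.close(fd)
    try:
        produce(staging_path)  # free to seek and rewrite on local storage
        with open(staging_path, "rb") as src, open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # plain read/write loop, strictly sequential
    finally:
        os.remove(staging_path)


def fake_muxer(path):
    # Hypothetical stand-in for a video writer: writes the body,
    # then seeks back to patch a field near the start of the file.
    with open(path, "wb") as f:
        f.write(b"\x00" * 48)
        f.write(b"body")
        f.seek(40)
        f.write(b"SIZE")


# Demo against a scratch directory; on the cluster, dest_path would be
# a path on the Mountpoint-backed volume such as /ComfyUI/output/.
demo_dir = tempfile.mkdtemp()
out_path = os.path.join(demo_dir, "example.mp4")
write_via_staging(out_path, fake_muxer)
```

The copy uses `shutil.copyfileobj` on two open file objects rather than `shutil.move`, so the destination only ever sees sequential writes of the finished file and no attempt is made to copy file metadata.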

dannycjones commented 3 weeks ago

Hopefully Simon's recommendation to perform random writes on a temporary file system before moving to Mountpoint's file system helped. I'll resolve for now.