GoogleCloudPlatform / gcsfuse

A user-space file system for interacting with Google Cloud Storage
https://cloud.google.com/storage/docs/gcs-fuse
Apache License 2.0

Unable to copy files from mounted Cloud Storage Bucket due to "Reader returned too many bytes" #996

Closed 1crew closed 1 year ago

1crew commented 1 year ago

If the file size is above 14 MB, the copy command from a mounted Storage Bucket to another location on the host fails, e.g. sudo cp /my-gcsfuse-mounted-bucket/my_file ~/my_file

fuse_debug: 2023/03/08 01:13:19.773482 Op 0x00000343 connection.go:416] <- interrupt (fuseid 0x00000342)
gcs: 2023/03/08 01:13:19.773667 Req 0x1d: -> Read("testing-junit5-mockito-1.0-2ac943e.jar", [14876672, 14885696)) (51.392624ms): OK
debug_fs: 2023/03/08 01:13:19.773694 ReadFile(6, 14884864): fh.reader.ReadAt: Reader returned 3264 too many bytes
2023/03/08 01:13:19.773738 ReadFile: input/output error, fh.reader.ReadAt: Reader returned 3264 too many bytes
fuse_debug: 2023/03/08 01:13:19.773791 Op 0x00000342 connection.go:500] -> Error: "input/output error"
fuse: 2023/03/08 01:13:19.773809 *fuseops.ReadFileOp error: input/output error

Steps to reproduce the behavior: Bucket is mounted using sudo gcsfuse --foreground --debug_fuse --debug_gcs --debug_fs --debug_fuse_errors my-gcs-bucket /etc/testshare

Additional context: the issue is not present for smaller files (e.g. a 10 MB file copied successfully).

1crew commented 1 year ago

We deleted the files from the bucket and uploaded them again; afterwards, the copy command to another location on the local file system worked. If you can advise what could have been the cause, please let us know. Thank you!

raj-prince commented 1 year ago

Hi @1crew,

Thanks for reaching out!

Looking at the first line of the logs, fuse_debug: 2023/03/08 01:13:19.773482 Op 0x00000343 connection.go:416] <- interrupt (fuseid 0x00000342), it is confirmed that the GCSFuse operation with fuseid 0x00000342 received an interrupt from the kernel. This interrupt was most likely for the read operation.

You need to check why you got the interrupt. If you are able to reproduce this issue, try running the cp command under strace; that will help in finding the root cause of the interrupt.
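For example, a minimal strace invocation would look like the following (the source and destination paths are placeholders for your own mount point and target):

# Follow child processes (-f) and write the syscall trace to cp.strace;
# in the trace, look for which syscall was interrupted (e.g. a read
# cut short by a signal, returning EINTR).
strace -f -o cp.strace cp /etc/testshare/my_file ~/my_file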

Also, I don't think size is the issue here; a larger size just increases the chance of an interrupt. We have tested the read workflow with files in the GBs. I also tried to reproduce this, but couldn't succeed.

Please let us know if you find anything!

-Prince

raj-prince commented 1 year ago

In the meantime, I tried to generate an interrupt during the ReadFile call, but every time I got a different error, context canceled; check the trace below.

D0313 17:01:04.038189 fuse_debug: Op 0x000012b6        connection.go:416] <- ReadFile (inode 97, PID 3231619, handle 38, offset 209190912, 262144 bytes)
D0313 17:01:04.038418 gcs: Req             0x61: <- Read("gcsfuse_relitabilit_test_failed_logs_16_dec.txt", [209190912, 213385216))
D0313 17:01:04.631096 fuse_debug: Op 0x000012b7        connection.go:416] <- interrupt (fuseid 0x000012b6)
D0313 17:01:04.631586 gcs: Req             0x61: -> Read error: context canceled
D0313 17:01:04.631706 debug_fs: ReadFile(97, 209190912): fh.reader.ReadAt: readFull: context canceled
E0313 17:01:04.631800 ReadFile: interrupted system call, fh.reader.ReadAt: readFull: context canceled
D0313 17:01:04.631914 fuse_debug: Op 0x000012b6        connection.go:500] -> Error: "interrupted system call"

So I'm not sure the above issue is due to an interrupt. It might be due to some wrong metadata associated with the JAR file, which would explain why the issue was resolved after re-uploading the file.
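If it happens again, one quick check is to inspect the object's stored metadata and checksums with gsutil stat (bucket and object names below are taken from the logs above and may need adjusting):

# Print the object's stored metadata: size (Content-Length), Content-Type,
# Content-Encoding, and the CRC32C/MD5 hashes recorded by GCS.
gsutil stat gs://my-gcs-bucket/testing-junit5-mockito-1.0-2ac943e.jar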

We would love to know if you are able to reproduce this behavior again.

Thanks, Prince.

ognemirivanov commented 1 year ago

Hello @raj-prince,

The issue in our case was with the transfer of JAR files from the GitHub Actions runner to our GCP bucket.

Initially, we were using the google-github-actions/upload-cloud-storage@v1 action to transfer the JAR file from the GitHub Actions runner directory to the bucket. However, upon completion of the upload, we found that the checksums of the JAR files were different. The same JAR file uploaded manually was 15.9 MB, but the one uploaded with the GitHub Action was only 14.7 MB.
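A comparison along these lines surfaces the mismatch (the local JAR path and bucket name are placeholders):

# Compute CRC32C/MD5 hashes (in hex) of the local JAR on the runner...
gsutil hash -h build/libs/my_file.jar
# ...and compare them against the hashes GCS recorded for the uploaded object.
gsutil stat gs://my-gcs-bucket/my_file.jar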

To resolve this issue, we changed our approach to uploading. Instead of using the Google action, we used the gsutil command to upload the JAR file to the bucket, and modified our GitHub Actions workflow accordingly.
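A minimal sketch of such an upload step, as it would run inside the workflow job (the JAR path and bucket name are placeholders, and gsutil authentication is assumed to be configured earlier in the job):

# Upload the built JAR with gsutil instead of the upload-cloud-storage action;
# gsutil cp verifies the object's checksum after upload by default.
gsutil cp build/libs/my_file.jar gs://my-gcs-bucket/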

After this change, we successfully deployed the JAR files without any errors or any difference in size. Thank you for your attention to this matter.

raj-prince commented 1 year ago

Got it. Thanks a lot @ognemirivanov for the clarification!!

Closing this issue now.

-Prince