awslabs / mountpoint-s3

A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.
Apache License 2.0

Inconsistent file behavior with mountpoint-s3 #1038

Open akhilesh-delphix opened 1 week ago

akhilesh-delphix commented 1 week ago

/kind bug

NOTE: If this is a filesystem related bug, please take a look at the Mountpoint repo to submit a bug report

What happened?

We have started using https://github.com/awslabs/mountpoint-s3-csi-driver in an EKS cluster to mount an S3 bucket as a volume, and we are facing some issues while handling files.

  1. If one thread deletes a file from the mount point, a second thread still sees that file. I had to add a delay of 1 second in the second thread; after that, it also saw the file as deleted (i.e. the file was no longer shown to this thread). Please note that both threads belong to the same application and access the same mount point. Is there a better way of achieving this? (I am running with the --allow-delete flag.) I have similar functionality in other places in my application.
  2. There are two processes in my application: process 1 writes a file to the bucket through the mount point, and process 2 reads files from the bucket using the SDK. If process 2 tries to read the file right after (30 ms or so) process 1 finishes writing it (i.e. after calling fileWriter.Close()), it SOMETIMES reports that the file is not available. Based on my analysis so far, if there is a longer delay before process 2 starts, the file is available, whereas if process 2 starts right after process 1 (30-40 ms), the file is not available.

Mount Options:

    accessModes:
      - ReadWriteMany # supported options: ReadWriteMany / ReadOnlyMany
    mountOptions:
      - uid=65436
      - gid=50
      - allow-other
      - allow-delete

What you expected to happen?

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?:

Environment

dannycjones commented 1 week ago

Thanks for opening an issue, @akhilesh-delphix. I note that you are not using any kind of caching, so you should not be impacted by any relaxed consistency guarantees from that.

If one thread deletes a file from the mount point, a second thread still sees that file. I had to add a delay of 1 second in the second thread; after that, it also saw the file as deleted (i.e. the file was no longer shown to this thread). Please note that both threads belong to the same application and access the same mount point. Is there a better way of achieving this? (I am running with the --allow-delete flag.) I have similar functionality in other places in my application.

If the two threads are performing these actions concurrently, one thread may see the file while a delete (a.k.a. unlink) is in progress. If the two actions are actually running serially, then this behavior would be surprising, since a filesystem delete/unlink should not return until it has been persisted both to the local view and in Amazon S3. Really, we'd need timestamped logs from the application and from Mountpoint to understand what's happening and when. I'd also be interested to understand what you mean by the second thread "seeing the file": is this from listing the directory, or from trying to open or stat the file?

To avoid seeing this behavior, there needs to be a synchronization point between the two threads so that the second thread will always see the new state in S3.
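As a rough illustration of such a synchronization point, here is a minimal Java sketch in which the second thread waits on a CountDownLatch until the first thread's delete call has returned. The class and method names (DeleteThenCheck, fileVisibleAfterDelete) are made up for this example, and the demo runs against a temporary file unless a path on the mount is passed as an argument; it is not taken from the reporter's application.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;

// Hypothetical example class; not part of the reporter's code or of Mountpoint.
final class DeleteThenCheck {

    // Returns true if the checking thread still saw the file after the
    // deleting thread's delete call had returned (or failed).
    static boolean fileVisibleAfterDelete(Path file) throws InterruptedException {
        CountDownLatch deleteFinished = new CountDownLatch(1);
        boolean[] stillVisible = new boolean[1];

        Thread deleter = new Thread(() -> {
            try {
                // Returns only once the underlying unlink call has returned.
                Files.deleteIfExists(file);
            } catch (IOException e) {
                throw new RuntimeException(e);
            } finally {
                // Signal the checker even on failure so it never hangs.
                deleteFinished.countDown();
            }
        });

        Thread checker = new Thread(() -> {
            try {
                // Synchronization point: do not look at the file until the
                // deleter has signalled that its call is finished.
                deleteFinished.await();
                stillVisible[0] = Files.exists(file);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        checker.start();
        deleter.start();
        deleter.join();
        checker.join(); // join() makes the checker's write visible here
        return stillVisible[0];
    }

    public static void main(String[] args) throws Exception {
        // Assumption: args[0], if given, is an existing file on the mount.
        Path file = args.length > 0
                ? Path.of(args[0])
                : Files.createTempFile("sync-demo", ".txt");
        System.out.println("visible after delete: " + fileVisibleAfterDelete(file));
    }
}
```

If this still reports the file as visible on the mount, that would suggest the two actions really are serialized and the timestamped logs mentioned above become the next thing to look at.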

There are two processes in my application: process 1 writes a file to the bucket through the mount point, and process 2 reads files from the bucket using the SDK. If process 2 tries to read the file right after (30 ms or so) process 1 finishes writing it (i.e. after calling fileWriter.Close()), it SOMETIMES reports that the file is not available. Based on my analysis so far, if there is a longer delay before process 2 starts, the file is available, whereas if process 2 starts right after process 1 (30-40 ms), the file is not available.

There are some differing behaviors based on how the file was opened, documented in our Semantics Documentation.

If the application has written some data and has not duplicated the file descriptor, closing the file should be synchronous and block the system call. I suspect you are in this situation and so should not be experiencing the non-blocking behavior. I'm assuming you're using Java's FileWriter class, which I presume will also block its close method on the system call; however, this would be something worth verifying. You may also be able to use a tool like strace to review which system calls are being made and when.
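One way to check this from the application side, short of strace, is to timestamp the close call itself. The following is a minimal sketch, assuming the writer is a plain java.io.FileWriter as guessed above; the class and method names (TimedClose, writeAndTimeClose) are hypothetical, and the target path defaults to a temporary directory so the example runs anywhere.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

// Hypothetical example class; not part of the reporter's code or of Mountpoint.
final class TimedClose {

    // Writes content to path and returns how long close() took, in nanoseconds.
    static long writeAndTimeClose(Path path, String content) throws IOException {
        FileWriter writer = new FileWriter(path.toFile());
        try {
            writer.write(content);
        } catch (IOException e) {
            writer.close();
            throw e;
        }
        System.out.println(Instant.now() + " calling close()");
        long start = System.nanoTime();
        writer.close(); // flushes buffered characters, then closes the descriptor
        long elapsed = System.nanoTime() - start;
        System.out.println(Instant.now() + " close() returned");
        return elapsed;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0], if given, is a new file path on the mount.
        Path target = args.length > 0
                ? Path.of(args[0])
                : Files.createTempDirectory("timed-close-demo").resolve("out.txt");
        long nanos = writeAndTimeClose(target, "example payload\n");
        System.out.println("close() took " + (nanos / 1_000_000.0) + " ms");
    }
}
```

The printed timestamps can then be lined up against the Mountpoint logs and against the time at which process 2 issues its SDK read, to see whether the read really starts after close() has returned.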


Apologies for the rather complex answers, please do ask if any clarifications are required.

akhilesh-delphix commented 3 days ago

Hi @dannycjones, thanks a lot for this detailed response.

I think we should discuss one issue at a time.

Issue one (which is solved by adding a 1-second gap between the two threads; please note that the second thread only executes once the first one completes, yet I still had to add 1 second of delay):

Here is the scenario. Inside the bucket I have the following file structure:

  • main_dir/1/src/file1.txt
  • main_dir/1/src/file2.txt

  1. Thread 1 deletes directory '1' recursively, including these files, and completes successfully.
  2. <If I add a delay here, it works.>
  3. Thread 2 uses a Java FileWriter to write data into the file 'main_dir/1/src/file1.txt' (please note that this is the same file name that thread 1 deleted), but it fails with java.io.FileNotFoundException while trying to open the file.

Here is the stack trace (apologies for sharing a Java stack trace here):

    at java.base/java.io.FileOutputStream.open0(Native Method)
    at java.base/java.io.FileOutputStream.open(FileOutputStream.java:293)
    at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:235)
    at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:123)
    at java.base/java.io.FileWriter.<init>(FileWriter.java:66)
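
For reference, the scenario described above could be reduced to something like the following minimal Java sketch, which may help when collecting timestamped logs. It is not the actual application code: the class name DeleteThenRewrite and its methods are hypothetical, the steps run serially in one thread, and it assumes the parent directories are recreated before the write (a FileWriter does not create them). By default it runs against a temporary directory; pass a directory on the mount as the first argument to try it there.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical reproduction; not the reporter's actual code.
final class DeleteThenRewrite {

    // Deletes dir and everything under it, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            // deleteIfExists, in case a directory is already gone once its
            // last entry has been removed.
            Files.deleteIfExists(p);
        }
    }

    // Creates the two files, deletes directory '1', then rewrites file1.txt.
    // Returns the content read back from file1.txt.
    static String run(Path mainDir) throws IOException {
        Path src = mainDir.resolve("1").resolve("src");
        Path file1 = src.resolve("file1.txt");
        Files.createDirectories(src);
        Files.writeString(file1, "old contents");
        Files.writeString(src.resolve("file2.txt"), "old contents");

        deleteRecursively(mainDir.resolve("1"));

        // Assumption: the application recreates the parent directories;
        // without this, the open fails on any file system.
        Files.createDirectories(src);
        try (FileWriter writer = new FileWriter(file1.toFile())) {
            writer.write("new contents");
        }
        return Files.readString(file1);
    }

    public static void main(String[] args) throws IOException {
        Path mainDir = args.length > 0
                ? Path.of(args[0])
                : Files.createTempDirectory("main_dir");
        System.out.println("read back: " + run(mainDir));
    }
}
```

Whether the real application recreates 'main_dir/1/src' before opening the file is worth confirming, since a missing parent directory also surfaces as java.io.FileNotFoundException from FileOutputStream.open.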

Please note that our code base works fine when we use NFS to store and manage files.

Would it be possible to connect over Slack, Teams, or another channel on this topic (in case you think the discussion is too specific)?