gaul / s3proxy

Access other storage backends via the S3 API
Apache License 2.0

filesystem blobstore does not atomically replace objects #490

Open davidp1404 opened 1 year ago

davidp1404 commented 1 year ago

Hello, I am using the latest s3proxy image with the filesystem backend in our development environment. In our use case we have concurrent access to blobs, and when we modify blobs s3proxy intermittently reports "NoSuchKey when calling the GetObject operation". This is related to the old issue https://issues.apache.org/jira/browse/JCLOUDS-835, which seems to have been solved years ago, but the current behavior seems to break the S3 commitment that "a mutating operation like write or overwrite should succeed and expose the new object or fail and retain the old object." Is there a reason the filesystem backend does not implement atomic behavior? The issue can be reproduced with this code:

$ cat qa-reader.py
import boto3
import os
import socket
import time
import urllib3
urllib3.disable_warnings()
s3_url = os.environ['s3_url']
pid = os.getpid()
hostname = socket.gethostname()
client = boto3.client(
    's3',
    endpoint_url=s3_url,
    aws_access_key_id="demouser",
    aws_secret_access_key="demouser123",
    verify=False
)
while True:
    obj = client.get_object(Bucket='qa-test', Key='qa-file.txt')
    msg = obj['Body'].read().decode('utf-8')
    print(msg,end='\r')

$ cat qa-writer.py
import boto3
import os
import socket
import time

s3_url = os.environ['s3_url']
pid = os.getpid()
hostname = socket.gethostname()
client = boto3.client(
    's3',
    endpoint_url=s3_url,
    aws_access_key_id="demouser",
    aws_secret_access_key="demouser123",
    verify=False
)
while True:
    msg = f"Updated by {hostname} (pid={pid})"
    client.put_object(Bucket='qa-test', Key='qa-file.txt', Body=str(msg))
    print("File updated +",end='\r')
    time.sleep(200/1000)
    print("File updated #",end='\r')

$ s3_url='http://127.0.0.1:9102' python qa-writer.py &
$ s3_url='http://127.0.0.1:9102' python qa-reader.py
Traceback (most recent call last):
  File "qa-reader.py", line 19, in <module>
    obj = client.get_object(Bucket='qa-test', Key='qa-file.txt')
  File "/home/k8sadmin/s3-qatest/.venv/lib/python3.8/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/k8sadmin/s3-qatest/.venv/lib/python3.8/site-packages/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.

Thanks in advance!

gaul commented 1 year ago

Confirmed that this is a problem. This is a regression from apache/jclouds@ab25fc7259ad620a4daa14c12a37cef498320ad5 that I suspect was introduced to work around strange Windows behavior. I am happy to revert these lines from FilesystemStorageStrategyImpl.putBlob:

if (outputFile.exists()) {
   delete(outputFile);
}

Note that we should keep the exception handling. The filesystem blobstore also needs many improvements; in this case, calling Files.delete(outputFile.toPath()) has more concise error propagation. Can you submit a PR?
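
For readers following along, the general technique for replacing a file without a window in which it is missing can be sketched as below. This is a Python illustration only, not the jclouds code: the helper name atomic_write and the ".tmp-" prefix are hypothetical, and it assumes the temporary file and the destination live on the same filesystem so that the rename is a single step.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) via write-then-rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the destination so the final
    # rename does not cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace overwrites an existing destination; there is no
        # separate delete step during which a reader could miss the file.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, remove the temporary file; the old content stays.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this pattern a concurrent reader opens either the old file or the new one, which is the behavior the quoted S3 commitment describes. The delete-then-write sequence shown above instead leaves a gap in which the key does not exist.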

davidp1404 commented 1 year ago

Hello Andrew, sorry, but Java is not a language I feel comfortable with, so I'd appreciate it if you could release a corrected version of s3proxy that we could use. Thanks in advance.

gaul commented 1 year ago

I committed a partial fix to jclouds where the object will not disappear when being replaced. However, there is a related issue where the object can change while being fetched that requires further work. The symptoms are a mismatch between the expected and actual Content-Length.
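
Until that is resolved, a client could detect the Content-Length symptom itself. The following is a minimal sketch, not part of s3proxy or boto3: the names read_checked and LengthMismatch are hypothetical, and it assumes the body is a file-like object whose full content fits in memory.

```python
import io


class LengthMismatch(Exception):
    """Raised when a body does not match its advertised Content-Length."""


def read_checked(body, expected_length):
    """Read a file-like `body` fully and verify its size in bytes."""
    data = body.read()
    if len(data) != expected_length:
        raise LengthMismatch(
            f"expected {expected_length} bytes, got {len(data)}"
        )
    return data


if __name__ == "__main__":
    # Stand-in for a response body; a real caller would pass the stream
    # and the length reported by the server.
    payload = io.BytesIO(b"Updated by host (pid=1)")
    print(read_checked(payload, 23))
```

In the reader script from the report, this would presumably be called as read_checked(obj['Body'], obj['ContentLength']), retrying the GetObject call when LengthMismatch is raised.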

davidp1404 commented 1 year ago

Thanks very much @gaul, looking forward to the new release of s3proxy including this fix.

gaul commented 1 year ago

Still thinking about this in apache/jclouds#165.