gaul / s3proxy

Access other storage backends via the S3 API
Apache License 2.0

ListObjects performs unnecessary recursive listing on bucket with filesystem provider #575

Closed: juangburgos closed this issue 10 months ago

juangburgos commented 10 months ago

Awesome software, just one detail:

When there is a ListObjects request with a / delimiter, the filesystem provider should only list the immediate children of the requested prefix (its files and first-level folders). Instead, it recursively lists the entire subtree of the bucket (filesystem folder), which is unnecessary and very expensive for deep trees with lots of files.
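For reference, a /-delimited listing returns only the keys directly under the prefix as Contents and collapses everything deeper into one CommonPrefixes entry per first-level folder. A minimal Java sketch of that grouping over a flat list of keys (DelimiterListing is a hypothetical name, not jclouds or S3Proxy code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration of S3 delimiter semantics over a flat key list.
// This is not jclouds or S3Proxy code.
final class DelimiterListing {
    final List<String> contents = new ArrayList<>();
    final SortedSet<String> commonPrefixes = new TreeSet<>();

    static DelimiterListing list(List<String> keys, String prefix, String delimiter) {
        DelimiterListing result = new DelimiterListing();
        for (String key : keys) {
            if (!key.startsWith(prefix)) {
                continue; // outside the requested prefix
            }
            String rest = key.substring(prefix.length());
            int cut = delimiter.isEmpty() ? -1 : rest.indexOf(delimiter);
            if (cut < 0) {
                result.contents.add(key); // no delimiter after the prefix: a direct child
            } else {
                // Collapse everything up to and including the first delimiter
                // into a single common prefix, e.g. "some-folder/".
                result.commonPrefixes.add(prefix + rest.substring(0, cut + delimiter.length()));
            }
        }
        return result;
    }
}
```

With the keys some-file.txt, some-folder/a.txt and some-folder/deep/b.txt, an empty prefix and / as the delimiter, contents is [some-file.txt] and commonPrefixes is [some-folder/]. Note that this version still visits every key under the prefix; that is acceptable for a flat key index, but on a filesystem the directory structure already provides the grouping, so walking the whole subtree is pure overhead.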

Basically, the logs show one entry per file in the subtree, recursively!

[s3proxy] D [timestamp] S3Proxy-Jetty-40 o.j.b.config.LocalBlobStore:56 |::] Opening blob in container: [file]

Of course, any S3 client times out when the subtree is large.
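A filesystem-backed store can answer a /-delimited request with a single directory read: files in the requested folder become Contents, and subfolders become CommonPrefixes without being opened. A minimal sketch using java.nio.file, assuming a hypothetical ShallowFsListing class and a prefix that is either empty or a folder key ending in / (this is not how the jclouds filesystem provider is structured):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not jclouds' filesystem provider: answer a "/"-delimited
// listing by reading one directory instead of walking the whole subtree.
final class ShallowFsListing {
    final List<String> contents = new ArrayList<>();
    final List<String> commonPrefixes = new ArrayList<>();

    // prefixDir is "" for the bucket root, or a folder key ending in "/", e.g. "some-folder/".
    static ShallowFsListing list(Path containerRoot, String prefixDir) throws IOException {
        Path dir = prefixDir.isEmpty() ? containerRoot : containerRoot.resolve(prefixDir);
        ShallowFsListing result = new ShallowFsListing();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                String key = prefixDir + child.getFileName();
                if (Files.isDirectory(child)) {
                    // Report the folder as a common prefix; never descend into it.
                    result.commonPrefixes.add(key + "/");
                } else {
                    result.contents.add(key);
                }
            }
        }
        // DirectoryStream order is unspecified; S3 lists keys in sorted order.
        Collections.sort(result.contents);
        Collections.sort(result.commonPrefixes);
        return result;
    }
}
```

The cost of each request is then proportional to the number of entries in one folder, not to the size of the subtree below it.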

gaul commented 10 months ago

Duplicate of #473. You can work around this by setting jclouds.version to 2.6.0-SNAPSHOT.

juangburgos commented 10 months ago

Sorry, I did not know it was a dupe. I'm not knowledgeable in Java: how/where would I change this parameter? I guess in s3proxy.conf? I will try tonight. Thanks!

gaul commented 10 months ago

You need to edit pom.xml and compile S3Proxy with mvn package.

juangburgos commented 10 months ago

Thanks, that worked. The tests are failing, though, so I had to build with:

mvn package -DskipTests

juangburgos commented 10 months ago

Now the problem is on Windows: the returned subfolder prefixes end with a trailing \ instead of /, for example:

<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>converted</Name>
  <Prefix/>
  <KeyCount>2</KeyCount>
  <MaxKeys>1000</MaxKeys>
  <ContinuationToken/>
  <StartAfter/>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>some-file.txt</Key>
    <LastModified>2020-05-01T23:37:35Z</LastModified>
    <Size>693</Size>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
  <CommonPrefixes>
    <Prefix>some-folder\</Prefix>
  </CommonPrefixes>
</ListBucketResult>

An S3 client then uses that prefix to query the subfolder, and everything breaks from there on.
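One plausible cause, which I have not confirmed in the provider's source, is that the key or prefix is built from the platform path string: on Windows, Path.toString() joins name elements with \. A minimal sketch of deriving keys that always use /, assuming a hypothetical PortableKeys helper (not an existing jclouds or S3Proxy API):

```java
import java.nio.file.Path;
import java.util.StringJoiner;

// Hypothetical helper: build S3 keys from filesystem paths using "/" on every OS,
// instead of relying on Path.toString(), which uses "\" on Windows.
final class PortableKeys {
    static String toKey(Path containerRoot, Path file) {
        Path relative = containerRoot.relativize(file);
        StringJoiner key = new StringJoiner("/");
        // Iterating a Path yields its name elements, independent of the separator.
        for (Path element : relative) {
            key.add(element.toString());
        }
        return key.toString();
    }

    // A folder reported in CommonPrefixes must end with the "/" delimiter.
    static String toCommonPrefix(Path containerRoot, Path dir) {
        return toKey(containerRoot, dir) + "/";
    }
}
```

For a folder some-folder under the bucket root, toCommonPrefix returns some-folder/ on both Windows and Linux, which is the form an S3 client expects to send back as the next Prefix.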

juangburgos commented 10 months ago

I have managed to make it work by adding the following to my reverse proxy configuration:

AddOutputFilterByType SUBSTITUTE application/xml
Substitute "s|\|/|n"

This replaces every \ with / in XML responses, and now it is usable 👍
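The Substitute rule uses | as its delimiter and the n flag, so the pattern is the literal string \ rather than a regex. It rewrites every backslash in any response served as application/xml, not only the CommonPrefixes entries, which could include downloads of objects whose content type is application/xml. For comparison, a narrower rewrite that only touches <Prefix> values inside <CommonPrefixes> is sketched in Java here; PrefixRewrite is a hypothetical name, and this is neither what mod_substitute does nor an S3Proxy feature:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison of two ways to patch ListBucketResult bodies.
final class PrefixRewrite {
    // Same effect as the Substitute rule: every backslash in the body becomes "/".
    static String blanket(String xml) {
        return xml.replace('\\', '/');
    }

    // Narrower alternative: only rewrite <Prefix> values nested in <CommonPrefixes>.
    private static final Pattern COMMON_PREFIX =
            Pattern.compile("(<CommonPrefixes>\\s*<Prefix>)([^<]*)(</Prefix>)");

    static String targeted(String xml) {
        Matcher m = COMMON_PREFIX.matcher(xml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String fixed = m.group(2).replace('\\', '/');
            // quoteReplacement keeps "$" and "\" in the text from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + fixed + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Either way this patches the symptom in transit; returning / from the provider itself would make the proxy rule unnecessary.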