Closed solracsf closed 2 years ago
Just my opinion here...
I think this is out of scope for s3backer.
Besides, this is something that could easily be scripted, e.g.:
#!/bin/bash
# Mount every subdir of $1 via s3backer
find "${1}" -mindepth 1 -maxdepth 1 -type d -print | while read -r DIR; do
    PREFIX=$(basename "${DIR}")
    s3backer -F /etc/s3backer.conf mybucket/"${PREFIX}" "${DIR}"
done
done
Yes, but this requires several mounts. Imagine 20+ mounts, each one requiring its own resources and needing to be managed; that could be a lot. Hence my suggestion that a single instance manage all of it, the way some PHP libraries handle prefixes (or even multi-bucket distribution) programmatically.
OK it sounds like two different things are being conflated here...
Correct me if I'm wrong: The main problem you are trying to solve is to get around the limit of 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second.
OK, so one way to do that is to spread s3backer's data over multiple prefixes in the same bucket.
One way to do that is this "prefix automation" idea which you are suggesting.
Fine.. but there are probably much simpler ways.
Let's go back to the original problem: the limits on requests per second.
We already have a similar "spreading" mechanism in place to reduce contention, namely the `--blockHashPrefix` flag.
Today the `--blockHashPrefix` flag prepends a "random" 8-digit hex prefix like `3e09fab2-` to the block number, so the full object name ends up being something like `3e09fab2-00000001`.
If, instead of using a dash to separate the hash and the block number, it used a slash, then this would spread the blocks across a bunch of different prefixes. E.g., the full object name would be something like `3e09fab2/00000001`.
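To make the naming difference concrete, here is a small sketch using the example values from above (how s3backer actually computes the hash is not shown here; these are just illustrative strings):

```shell
#!/bin/bash
# Sketch of the two naming schemes (example values from the discussion,
# not s3backer's actual hash computation).
HASH="3e09fab2"
BLOCK="00000001"
echo "dash separator:  ${HASH}-${BLOCK}"    # one flat name; every block shares one prefix
echo "slash separator: ${HASH}/${BLOCK}"    # key falls under its own hash-derived prefix
```

With the slash, S3's per-prefix request quotas apply to each hash value separately instead of to the whole set of blocks.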
This would be an incompatible change and so would require a new flag.
But if you'd like to just test this, try checking out the `master` branch and then applying this patch:
diff --git a/http_io.c b/http_io.c
index fe96e23..344497e 100644
--- a/http_io.c
+++ b/http_io.c
@@ -96,7 +96,7 @@
+ S3B_BLOCK_NUM_DIGITS + 2)
/* Separator string used when "--blockHashPrefix" is in effect */
-#define BLOCK_HASH_PREFIX_SEPARATOR "-"
+#define BLOCK_HASH_PREFIX_SEPARATOR "/"
/* Bucket listing API constants */
#define LIST_PARAM_MARKER "marker"
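Since the patch above changes a single `#define`, the same edit could also be made with `sed`. A self-contained sketch (it operates on a scratch excerpt rather than a real s3backer checkout, so the file path here is hypothetical):

```shell
#!/bin/bash
# Recreate the relevant excerpt of http_io.c in a scratch file.
cat > /tmp/http_io_excerpt.c <<'EOF'
/* Separator string used when "--blockHashPrefix" is in effect */
#define BLOCK_HASH_PREFIX_SEPARATOR "-"
EOF

# Swap the dash separator for a slash, same as the patch does.
sed -i 's|#define BLOCK_HASH_PREFIX_SEPARATOR "-"|#define BLOCK_HASH_PREFIX_SEPARATOR "/"|' /tmp/http_io_excerpt.c
grep BLOCK_HASH_PREFIX_SEPARATOR /tmp/http_io_excerpt.c
```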
Awesome! I'll give it a try, but `Delimiter` and `Prefix` should be passed along with the query; is that the case?
Because there are folders and prefixes, but only prefixes allow the limit bypass.
See https://aws.amazon.com/premiumsupport/knowledge-center/s3-prefix-nested-folders-difference/
Note: The folder structure might not indicate any partitioned prefixes that support request rates.
Actually, I believe the patch produces folders (I'm not 100% sure on this, just "reading" the S3 interface...).
The `Delimiter` and `Prefix` parameters are only used for listing queries. I think it should work.
"Folders" are just an illusion. S3 only stores objects in a flat namespace. But the Amazon web console treats the `/` character in an object name as if it were special and creates "Folders" out of thin air. And S3 now also apparently applies traffic limits based on which `/`-based subtree the object is in.
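A local sketch of what a delimiter listing does: given flat key names, grouping on `/` yields the "common prefixes" that the console draws as folders. No S3 call is made here; the keys are just example strings modeled on the names in this thread:

```shell
#!/bin/bash
# Flat object keys, exactly as S3 stores them (example names only):
KEYS='3e09fab2/00000001
3e09fab2/00000002
77aa01cc/00000003'

# Grouping on the "/" delimiter reproduces the "CommonPrefixes" that a
# delimiter listing returns; the "folders" exist only in the listing.
echo "${KEYS}" | cut -d/ -f1 | sort -u
```

The output is two prefixes (`3e09fab2` and `77aa01cc`) even though the bucket itself only holds three flat keys.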
Ok, so, in your opinion, the patch at https://github.com/archiecobbs/s3backer/issues/166#issuecomment-1025231802 just works out of the box? Screenshot above has been produced after the patch was installed, in a dedicated bucket.
I would say yes it is supposed to just work out of the box, based on my understanding of how Amazon is applying their traffic limits.
But it's still an untested theory at this point...
When you read this, you can understand why this seems confusing 😆 https://stackoverflow.com/questions/52443839/s3-what-exactly-is-a-prefix-and-what-ratelimits-apply
As one knows, splitting files between prefixes is good practice: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
This is already possible in s3backer thanks to the `--prefix` option, or (recommended) by specifying the bucket as `mybucket/foo/bar`. But this has to be a manual process, and one must create several mounts (correct me if I'm wrong), one per prefix. Instead, my proposal is some kind of prefix automation:
1. By default, keep the same behavior as described above.
2. Introduce a new option like `--autoFirstLevelPrefix` that would activate the following: treat every folder at the 1st level of the filesystem mount (suppose the mount is `/mnt`) as a prefix; so `/mnt/photos`, `/mnt/movies`, `/mnt/docs` would be passed in the requests to S3 as prefixes: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-prefixes.html
I can't really evaluate the work needed to accomplish this, or whether it sounds odd, but on large filesystems this (or a better) prefix automation would definitely be a very good option for optimizing AWS S3 performance.
The reason is one can send 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in an Amazon S3 bucket. There are no limits to the number of prefixes that one can have in any bucket. https://aws.amazon.com/premiumsupport/knowledge-center/s3-request-limit-avoid-throttling/
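Since those quotas apply per prefix, the aggregate request budget scales linearly with the number of prefixes. A back-of-envelope sketch (the prefix count is an arbitrary example, not anything s3backer guarantees):

```shell
#!/bin/bash
# Per-prefix S3 quotas from the AWS documentation cited above.
PUT_PER_PREFIX=3500   # PUT/COPY/POST/DELETE requests per second
GET_PER_PREFIX=5500   # GET/HEAD requests per second
PREFIXES=16           # arbitrary example count

echo "aggregate PUT/COPY/POST/DELETE per second: $((PREFIXES * PUT_PER_PREFIX))"
echo "aggregate GET/HEAD per second: $((PREFIXES * GET_PER_PREFIX))"
```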