Open Cerfoglg opened 9 years ago
@Cerfoglg Thank you for the detailed description.
As I recall, we discussed having different buckets for "runs", "models", etc. Why is it different in your description? Have you identified some issues with the solution we discussed?
I've looked into it a bit more, and buckets don't ultimately matter as much as I thought, so doing this makes organisation easier in the end, without loss in performance. The hashing is what does the trick to speed up lookup.
Ok, if it does not impact performance, I would stick with different buckets for a better data arrangement (separating the different types of data we need to access from different services).
Let us have, instead of a single benchmarks bucket, different buckets to separate our data. For starters, we will certainly have a runs bucket and a models bucket, for instance.
@Cerfoglg can you please update the issue description according to the current state?
@VincenzoFerme Updated with current minio key structure
Minio stores files using a key/value structure, organised inside a series of partitions (buckets). It's important to define which buckets are going to be created on Minio, and the format of the keys pointing to our files.
Buckets should be named following the DNS-compliant naming rules described here: http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html. For our benchmarks, we are going to use a series of high-level buckets for our different data, like "benchmarks".
While Minio and S3 file storage is not a conventional file system, it is general practice for keys to follow the same format as paths in a conventional file system, for example "folder/subfolder/foobar.txt" referring to a file "foobar.txt".
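To make the naming constraint concrete, here is a minimal sketch of a bucket name check. It covers only the DNS-style rules as I understand them from the linked page (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit, not shaped like an IP address); the function name `is_valid_bucket_name` is hypothetical and not part of Minio or any client library.

```python
import ipaddress
import re

# Assumed subset of the DNS-style bucket naming rules; not an official validator.
_BUCKET_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the assumed DNS-style checks."""
    # Length 3-63, allowed characters, alphanumeric first and last character.
    if not _BUCKET_PATTERN.fullmatch(name):
        return False
    # Each dot-separated label must be non-empty and must not touch a hyphen.
    if ".." in name or ".-" in name or "-." in name:
        return False
    # Names formatted like an IPv4 address are rejected.
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True
    return False
```

Under these assumptions, names such as "runs", "models" and "benchmarks" pass, while names with uppercase letters or fewer than three characters do not.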
For our Minio file storage, we want keys inside the "runs" bucket to be represented with this format:
Where we indicate in the key:
The hash value at the start of the string is used to speed up key lookup. By adding a prefix in the form of a hash value computed from the rest of the key (for example, with a modulo operation), we create more unique prefixes and thus reduce the number of characters that need to be compared when performing the lookup. More information here: https://aws.amazon.com/blogs/aws/amazon-s3-performance-tips-tricks-seattle-hiring-event/
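As a rough illustration of the hash prefix idea, the sketch below prepends a short hex digest to a key. Everything specific here is an assumption for the example, not the agreed key format: the function name `prefixed_key`, the use of MD5, the four-character prefix length, the "/" separator, and the sample key.

```python
import hashlib


def prefixed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hash of `key` so that keys spread over many prefixes.

    MD5 is used only as a cheap, deterministic way to derive the prefix,
    not for security. Prefix length and separator are assumptions.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


# Hypothetical key, used only to show the shape of the result.
example = prefixed_key("some-experiment/some-trial/output.log")
```

Because the prefix is derived from the key itself, any service that knows the rest of the key can recompute the full key without an extra lookup.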