bbengfort / fluidfs

A highly consistent distributed filesystem built with FUSE
http://www.fluidfs.com
MIT License

Create blob store mechanism. #16

Closed: bbengfort closed this issue 7 years ago

bbengfort commented 7 years ago

Store blobs in a directory structure such that blobs are not in a single directory but rather in multiple directories based on the prefix of their hash.

bbengfort commented 7 years ago

Based on the following:

It looks like ext4 allows an unlimited number of files per directory. However, ls, find, readdir, etc. read 32k of directory entries at a time, so I'm going to treat that as the upper limit on the number of files in a directory.
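To check an existing store against this limit, here is a minimal sketch in Python (the language is an assumption; the issue does not name one). It walks a blob root and reports every directory holding more than the limit. `oversized_dirs` and `MAX_ENTRIES_PER_DIR` are hypothetical names, and reading "32k" as 32,768 entries is also an assumption.

```python
import os

# Hypothetical soft limit: "32k" read as 32 * 1024 directory entries.
MAX_ENTRIES_PER_DIR = 32 * 1024


def oversized_dirs(root, limit=MAX_ENTRIES_PER_DIR):
    """Return (path, entry_count) for each directory under root with more than `limit` entries."""
    result = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Both subdirectories and files count as directory entries.
        count = len(dirnames) + len(filenames)
        if count > limit:
            result.append((dirpath, count))
    return result
```

Running `oversized_dirs(root)` after bulk-loading blobs gives a quick empirical check of how a given sharding scheme distributes files.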

Blob hashes are configurable with the following algorithms: md5, sha1, sha224, sha256, and murmur. Signature lengths for both hex and b64 encoding are as follows:

sha224: 56 hex chars - 7.0 blocks of 8 chars
sha256: 64 hex chars - 8.0 blocks of 8 chars
sha1: 40 hex chars - 5.0 blocks of 8 chars
murmur: 32 hex chars - 4.0 blocks of 8 chars
md5: 32 hex chars - 4.0 blocks of 8 chars

sha224: 40 b64 chars - 5.0 blocks of 8 chars
sha256: 44 b64 chars - 5.5 blocks of 8 chars
sha1: 28 b64 chars - 3.5 blocks of 8 chars
murmur: 24 b64 chars - 3.0 blocks of 8 chars
md5: 24 b64 chars - 3.0 blocks of 8 chars
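The table can be reproduced from each algorithm's digest size: hex uses 2 characters per byte, and Base64 uses 4 characters per 3 bytes, rounded up when padded. The sketch below assumes "murmur" means 128-bit MurmurHash3 (the 32 hex characters above imply a 16-byte digest); Python's `hashlib` has no murmur, so that size is hard-coded. Note that the Base64 figures in the table match *padded* encoding; without `=` padding the strings are slightly shorter (for example 43 characters for SHA256 instead of 44).

```python
import base64
import hashlib
import math

# Digest sizes in bytes. "murmur" is assumed to be 128-bit MurmurHash3.
DIGEST_BYTES = {
    "sha224": hashlib.sha224().digest_size,  # 28
    "sha256": hashlib.sha256().digest_size,  # 32
    "sha1": hashlib.sha1().digest_size,      # 20
    "murmur": 16,
    "md5": hashlib.md5().digest_size,        # 16
}

BLOCK = 8  # characters per subdirectory name


def signature_lengths(nbytes):
    """Return (hex, padded Base64, unpadded Base64) lengths for an nbytes digest."""
    hex_len = 2 * nbytes
    b64_padded = 4 * math.ceil(nbytes / 3)
    # Encode a dummy digest of the right size and strip '=' to get the unpadded length.
    b64_unpadded = len(base64.b64encode(bytes(nbytes)).rstrip(b"="))
    return hex_len, b64_padded, b64_unpadded


for name, n in DIGEST_BYTES.items():
    h, p, u = signature_lengths(n)
    print(f"{name}: {h} hex ({h / BLOCK} blocks), "
          f"{p} b64 padded ({p / BLOCK} blocks), {u} b64 unpadded")
```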

If we use subdirectory names of 8 characters, the number of “blocks” is the depth of the storage tree.

My inclination is to default to SHA256 with Base64 encoding (without padding), which gives a tree depth of 5, to minimize blob name collisions while keeping the likelihood of more than 32k files in any one directory low.
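A minimal sketch of this default, assuming Python and a plain local directory as the store root. Two details are assumptions rather than part of the proposal. First, it uses the URL-safe Base64 alphabet (RFC 4648, `-` and `_`), because the standard alphabet contains `/`, which cannot appear inside a path component. Second, every 8-character block but the last becomes a directory and the final short block is the file name. With 43 unpadded characters that gives five 8-character directories and a 3-character file name, matching the depth of 5. `blob_key`, `blob_path`, and `put_blob` are hypothetical names.

```python
import base64
import hashlib
import os

BLOCK = 8  # characters per directory level, as proposed above


def blob_key(data: bytes) -> str:
    """SHA256 digest, URL-safe Base64 without '=' padding (43 characters)."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def blob_path(root: str, key: str, block: int = BLOCK) -> str:
    """Split the key into fixed-width blocks; the last (shorter) block is the file name."""
    parts = [key[i:i + block] for i in range(0, len(key), block)]
    return os.path.join(root, *parts)


def put_blob(root: str, data: bytes) -> str:
    """Write data under its content-derived path (if not already present) and return its key."""
    key = blob_key(data)
    path = blob_path(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Content-addressed: identical data maps to the same path, so skip rewrites.
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(data)
    return key
```

Because the path is derived entirely from the key, the full key can be recovered by joining the path components back together, so the leaf file name does not need to repeat the whole hash.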

bbengfort commented 7 years ago

I also asked Kostas, who asked Amol, how Postgres stores pages; he said they follow a B-tree implementation, but that I should look at the source code for the details. I can do that if this scheme doesn't seem to fit.