coreos / torus

Torus Distributed Storage
https://coreos.com/blog/torus-distributed-storage-by-coreos.html
Apache License 2.0

storage/block_device: initial implementation #359

Closed cgonyeo closed 6 years ago

cgonyeo commented 7 years ago

This commit adds support for using a block device as the backing storage for torusd.

Two utilities have been added: mkfs.torus, which formats a block device for use by torus, and fsck.torus, which checks the consistency of a formatted block device.

There is a metadata section of size 512 bytes written to the beginning of the disk, and the rest of the disk is used to store data.

Each location on the disk in which data can be stored has two things: block headers of size 512 bytes, and the actual data, whose size is determined by the torus cluster's block size.

When given a block ref, the location of the data is determined by feeding the 24 bytes made up of the block ref's volume, inode, and index through sha256, and converting the last 8 bytes of the digest into a uint64.
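A minimal sketch of that lookup, for illustration only. The `blockRef` type, the big-endian encoding, and the final modulo over `numLocations` are assumptions; the description above only specifies sha256 over the three fields and taking the last 8 bytes of the digest as a uint64.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// blockRef is a hypothetical stand-in for a torus block ref:
// three 64-bit fields, 24 bytes in total.
type blockRef struct {
	Volume, INode, Index uint64
}

// defaultLocation hashes the encoded ref with SHA-256 and interprets the
// last 8 bytes of the digest as a uint64. Byte order and the reduction
// modulo numLocations are assumptions, not taken from the PR.
// numLocations must be non-zero.
func defaultLocation(ref blockRef, numLocations uint64) uint64 {
	var buf [24]byte
	binary.BigEndian.PutUint64(buf[0:8], ref.Volume)
	binary.BigEndian.PutUint64(buf[8:16], ref.INode)
	binary.BigEndian.PutUint64(buf[16:24], ref.Index)

	sum := sha256.Sum256(buf[:]) // 32-byte digest
	return binary.BigEndian.Uint64(sum[24:]) % numLocations
}

func main() {
	ref := blockRef{Volume: 1, INode: 2, Index: 3}
	fmt.Println(defaultLocation(ref, 1024))
}
```

The same ref always maps to the same location, which is what lets a read find the headers a write used.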

The block headers are simply a series of 32-byte chunks, each containing the three fields from a block ref and a location field. The location field is either 0, signifying that the data for this block ref is at this location, or a non-zero integer, which is the location of a different set of block headers to go to.
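One way to picture a single header entry, as a sketch rather than the actual on-disk format. It assumes each of the four fields is a big-endian uint64, so an entry is 32 bytes and a 512-byte header area holds 16 entries; `headerEntry`, `marshal`, and `unmarshalEntry` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	headerSize = 512 // bytes of block headers per location, per the description
	entrySize  = 32  // assumed: four 8-byte fields per entry
)

// headerEntry is a hypothetical in-memory form of one header entry.
type headerEntry struct {
	Volume, INode, Index uint64
	Location             uint64 // 0: data is here; non-zero: go to that location
}

// marshal encodes the entry into a fixed 32-byte array.
func (e headerEntry) marshal() [entrySize]byte {
	var b [entrySize]byte
	binary.BigEndian.PutUint64(b[0:8], e.Volume)
	binary.BigEndian.PutUint64(b[8:16], e.INode)
	binary.BigEndian.PutUint64(b[16:24], e.Index)
	binary.BigEndian.PutUint64(b[24:32], e.Location)
	return b
}

// unmarshalEntry decodes the first 32 bytes of b into an entry.
func unmarshalEntry(b []byte) (headerEntry, error) {
	if len(b) < entrySize {
		return headerEntry{}, errors.New("header entry too short")
	}
	return headerEntry{
		Volume:   binary.BigEndian.Uint64(b[0:8]),
		INode:    binary.BigEndian.Uint64(b[8:16]),
		Index:    binary.BigEndian.Uint64(b[16:24]),
		Location: binary.BigEndian.Uint64(b[24:32]),
	}, nil
}

func main() {
	e := headerEntry{Volume: 1, INode: 2, Index: 3, Location: 0}
	raw := e.marshal()
	back, err := unmarshalEntry(raw[:])
	fmt.Println(back == e, err, headerSize/entrySize)
}
```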

When writing a block, if the default location is already in use by a preexisting block (so, a hash collision), linear probing is used to find the next free location.
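Linear probing in general looks roughly like the following sketch. It works on an in-memory slice instead of a disk, and the `slot` type, the wraparound at the end of the table, and the `errFull` error are assumptions made for illustration; it does not model the location field in the block headers.

```go
package main

import (
	"errors"
	"fmt"
)

// slot is a hypothetical stand-in for one data location.
type slot struct {
	used bool
	key  uint64 // identifies the block stored here
}

var errFull = errors.New("no free location")

// probe scans forward from start, wrapping around the end of the table,
// and returns the first location that is free or already holds key.
// start is expected to satisfy 0 <= start < len(table).
func probe(table []slot, start int, key uint64) (int, error) {
	n := len(table)
	for i := 0; i < n; i++ {
		loc := (start + i) % n
		if !table[loc].used || table[loc].key == key {
			return loc, nil
		}
	}
	return 0, errFull
}

func main() {
	table := make([]slot, 4)

	// First block lands on its default location.
	loc, _ := probe(table, 1, 10)
	table[loc] = slot{used: true, key: 10}

	// A second block with the same default location collides
	// and is placed in the next free location.
	loc, _ = probe(table, 1, 20)
	table[loc] = slot{used: true, key: 20}

	fmt.Println(table)
}
```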

Fixes https://github.com/coreos/torus/issues/171.

cgonyeo commented 7 years ago

This is no longer a WIP. I'm happy with the current state of the tests, and they all pass. Unfortunately it looks like Travis CI doesn't support loopback devices, so my tests don't succeed when run there. Anyone have any ideas on that one? Maybe just disable my tests in CI for now?

cgonyeo commented 7 years ago

I've updated the tests I added to require the integration build tag to run, to prevent them from failing in Travis CI.

cgonyeo commented 7 years ago

Review bump

lpabon commented 7 years ago

@dgonyeo I'll review, plus... THANKS FOR THE UNIT TESTS!! Finally!

lpabon commented 7 years ago

@dgonyeo Quick question before I start the review. It says on your comment:

There is a metadata section of size 512 bytes written to the beginning of the
disk, and the rest of the disk is used to store data.

This would create very unaligned disk chunks, because everything would be off by 512 bytes. You want to store the metadata in the first chunk, not in the first LBA. For example, a 4k write to LBA 0 from the caller would not be aligned to the chunk if the chunk is 4k in size. If the chunk is 512k in size and the caller writes the last 4k in that chunk, 4k - 512 bytes would go into this chunk, and 512 bytes would go into the next.

cgonyeo commented 7 years ago

Seems to me like the easiest fix would be to make that starting section 4k then, right? I'll tweak this PR to do that.
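The arithmetic behind this exchange can be checked with a small sketch. The `split` helper is hypothetical and assumes a write spans at most two chunks; it only models a fixed metadata prefix in front of equally sized chunks.

```go
package main

import "fmt"

// split reports how many bytes of an n-byte write at logical offset off
// land in the physical chunk it starts in, and how many spill into the
// next one, when meta bytes of metadata precede the data on disk.
// Assumes n <= chunkSize, so a write touches at most two chunks.
func split(off, n, meta, chunkSize uint64) (first, spill uint64) {
	phys := off + meta                 // physical offset on the device
	room := chunkSize - phys%chunkSize // bytes left in the starting chunk
	if n <= room {
		return n, 0
	}
	return room, n - room
}

func main() {
	const (
		kib   = 1024
		chunk = 512 * kib
		write = 4 * kib
	)
	lastOff := uint64(chunk - write) // last 4k of the first logical chunk

	// 512-byte metadata: the write straddles two physical chunks.
	fmt.Println(split(lastOff, write, 512, chunk))

	// 4k metadata: the write fits entirely within one physical chunk.
	fmt.Println(split(lastOff, write, 4*kib, chunk))
}
```

With a 512-byte prefix the split is 3584 bytes and 512 bytes, matching the 4k - 512 figure above; with a 4k prefix nothing spills.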

cgonyeo commented 6 years ago

Cleaning up my old PRs