Rice-Comp413-2016 / RDFS

The Rice Comp413 class (2016-2017) implementation of HDFS. (This will eventually be put under an open source license, which one TBD).

Update data persistence layer to use raw disk I/O #46

Closed jfking closed 7 years ago

jfking commented 8 years ago

DataNodes currently store HDFS Blocks (big B) on the local filesystem as individual files. While this makes storing and retrieving Blocks simple, the local filesystem splits each Block into a number of smaller local blocks (small b) for ease of storage on the disk. That helps normal filesystems, which hold many small files of different sizes, but it does not help us. We want to be able to write contiguous Blocks directly to disk without interference from a local filesystem.

This will entail keeping a list of free Block-sized chunks on disk, as well as replacing the map from Block to filename with a thread-safe map from Block to location on disk. We will not need to coalesce free chunks, since the Blocks will all be the same size and quite large.
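
To make the proposal concrete, here is a minimal sketch of the two pieces described above: a free list of fixed-size slots and a lock-protected map from Block ID to disk location. It is written in Python for brevity and is not the RDFS implementation. The names `RawBlockStore`, `write_block`, `read_block`, `delete_block`, and `BLOCK_SIZE` are hypothetical, the tiny slot size is for illustration only, and a temporary file stands in for a raw device.

```python
import os
import tempfile
import threading

BLOCK_SIZE = 4096  # hypothetical slot size; real HDFS Blocks are far larger


class RawBlockStore:
    """Illustrative store: fixed-size slots in one file or device, no per-Block files."""

    def __init__(self, path, num_slots, block_size=BLOCK_SIZE):
        self._fd = os.open(path, os.O_RDWR)
        self._block_size = block_size
        # One coarse lock guards the free list, the map, and the I/O.
        # A real implementation would likely use finer-grained locking.
        self._lock = threading.Lock()
        # Free list of slot offsets. Every slot has the same size, so a freed
        # slot is reusable as-is and no coalescing is ever required.
        self._free = [i * block_size for i in range(num_slots - 1, -1, -1)]
        # Map from Block ID to (offset, length) on disk.
        self._location = {}

    def write_block(self, block_id, data):
        if len(data) > self._block_size:
            raise ValueError("data does not fit in one slot")
        with self._lock:
            if block_id in self._location:
                raise KeyError("Block already stored")
            if not self._free:
                raise OSError("no free slots")
            offset = self._free.pop()
            os.lseek(self._fd, offset, os.SEEK_SET)
            os.write(self._fd, data)
            self._location[block_id] = (offset, len(data))
            return offset

    def read_block(self, block_id):
        with self._lock:
            offset, length = self._location[block_id]
            os.lseek(self._fd, offset, os.SEEK_SET)
            return os.read(self._fd, length)

    def delete_block(self, block_id):
        with self._lock:
            offset, _ = self._location.pop(block_id)
            self._free.append(offset)  # slot goes straight back on the free list

    def close(self):
        os.close(self._fd)


# Usage: a temporary file stands in for the raw disk.
backing = tempfile.NamedTemporaryFile(delete=False)
backing.close()
store = RawBlockStore(backing.name, num_slots=2)
first_offset = store.write_block(1, b"alpha")
second_offset = store.write_block(2, b"beta")
store.delete_block(1)
reused_offset = store.write_block(3, b"gamma")  # takes the slot Block 1 vacated
third_block = store.read_block(3)
second_block = store.read_block(2)
store.close()
os.unlink(backing.name)
```

Because every slot is the same size, allocation is a pop from the free list and deallocation is a push, which is why no coalescing step appears anywhere in the sketch.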

Separate issues will be made for in-memory caches of blocks and for block locality considerations.

pelmers commented 7 years ago

done in #63