DataNodes currently store HDFS Blocks (big b) on the local filesystem as individual files. While this makes storing and retrieving Blocks simple, it results in each Block being split into a number of smaller local blocks (small b) for ease of storage on the disk. That is helpful for general-purpose filesystems holding small files of varying sizes, but it is not helpful for us. We want to be able to write contiguous Blocks directly to disk, without interference from a local filesystem.
This will entail keeping a list of free Block-sized chunks on disk, as well as replacing the map from Block to filename with a thread-safe map from Block to location on disk. We will not need to coalesce free chunks, since the Blocks will all be the same size and quite large.
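As a rough illustration of the bookkeeping described above, here is a minimal sketch and not a proposed implementation. The class name BlockSlotAllocator, its methods, and the use of a long block ID as the key are all hypothetical; the sketch assumes the raw disk region is divided into equal-sized slots and tracks only byte offsets, without performing any I/O.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Hypothetical sketch: tracks which fixed-size slots of a raw disk
 * region are free and where each Block lives. Not actual HDFS code.
 */
final class BlockSlotAllocator {
    // Free list of byte offsets; every slot has the same size, so no coalescing is needed.
    private final ConcurrentLinkedQueue<Long> freeOffsets = new ConcurrentLinkedQueue<>();
    // Thread-safe map from Block ID to byte offset on disk (replaces Block -> filename).
    private final ConcurrentHashMap<Long, Long> blockToOffset = new ConcurrentHashMap<>();

    BlockSlotAllocator(int slotCount, long slotSize) {
        for (int i = 0; i < slotCount; i++) {
            freeOffsets.add(i * slotSize);
        }
    }

    /** Returns the byte offset assigned to blockId, or -1 if no slot is free. */
    long allocate(long blockId) {
        Long existing = blockToOffset.get(blockId);
        if (existing != null) {
            return existing;
        }
        Long offset = freeOffsets.poll();
        if (offset == null) {
            return -1L;
        }
        // Another thread may have allocated the same Block in the meantime.
        Long raced = blockToOffset.putIfAbsent(blockId, offset);
        if (raced != null) {
            freeOffsets.add(offset); // hand the unused slot back
            return raced;
        }
        return offset;
    }

    /** Returns the byte offset of blockId, or -1 if it is not stored. */
    long locate(long blockId) {
        Long offset = blockToOffset.get(blockId);
        return offset == null ? -1L : offset;
    }

    /** Releases the slot held by blockId; returns false if it was not stored. */
    boolean free(long blockId) {
        Long offset = blockToOffset.remove(blockId);
        if (offset == null) {
            return false;
        }
        freeOffsets.add(offset);
        return true;
    }
}
```

Because every slot is the same size, freeing a Block only needs to push its offset back onto the free list; any freed slot can hold any future Block.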
Separate issues will be made for in-memory caches of blocks and for block locality considerations.