Rice-Comp413-2016 / RDFS

The Rice Comp413 class (2016-2017) implementation of HDFS. (This will eventually be put under an open source license; which one is TBD.)

Choosing target DataNodes #60

Open nichhk opened 7 years ago

nichhk commented 7 years ago

The NameNode needs to choose target DataNodes for a block using 3 criteria:

  1. current DataNode locations for the block, if it is not new
  2. the free disk space on a DataNode (#58)
  3. the number of transmissions on a DataNode (#59)

nichhk commented 7 years ago

@kyler-m: choose a random subset of valid DataNodes.
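
A rough sketch of what "a random subset of valid DataNodes" could look like. This is Python for illustration only, not code from this repo; `choose_random_targets`, `valid_datanodes`, and `count` are hypothetical names, and it assumes the caller has already filtered out invalid DataNodes.

```python
import random


def choose_random_targets(valid_datanodes, count, rng=random):
    """Return up to `count` distinct DataNodes picked uniformly at random.

    `valid_datanodes` is assumed to be pre-filtered by the caller
    (hypothetical helper, not part of the actual NameNode code).
    """
    candidates = list(valid_datanodes)
    # Never ask for more nodes than are available.
    count = min(count, len(candidates))
    # random.sample draws without replacement, so no node repeats.
    return rng.sample(candidates, count)


if __name__ == "__main__":
    # The result varies from run to run.
    print(choose_random_targets(["dn1", "dn2", "dn3", "dn4"], 2))
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible in tests.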

prb2 commented 7 years ago

Implemented find_datanode_for_block to use free_bytes and xmits in: 54f8df373fd31c92a97ca3dfc954cad4537e864c

Live DNs that do not already have a replica of the requested block and have enough free space to hold it are pushed onto a priority queue ordered by number of transmits, fewest first. Then the requested number of DataNodes (replication_factor of them) is popped from the priority queue and returned to the caller.
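
For readers who want the gist without opening the commit, here is a minimal Python sketch of the selection described above. It is not the code from the commit: the `DataNode` class, its field names, and `choose_targets` are assumptions made for illustration, and breaking ties by list position is a choice of this sketch.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class DataNode:
    # Hypothetical record of what the NameNode knows about a DataNode.
    name: str
    alive: bool
    free_bytes: int
    xmits: int
    blocks: set = field(default_factory=set)


def choose_targets(datanodes, block_id, block_size, replication_factor):
    """Pick up to `replication_factor` DataNodes with the fewest transmits."""
    heap = []
    for index, dn in enumerate(datanodes):
        if not dn.alive:
            continue  # skip dead DataNodes
        if block_id in dn.blocks:
            continue  # already holds a replica of this block
        if dn.free_bytes < block_size:
            continue  # not enough free space for the block
        # heapq is a min-heap, so the lowest xmits pops first;
        # `index` breaks ties without comparing DataNode objects.
        heapq.heappush(heap, (dn.xmits, index, dn.name))

    targets = []
    while heap and len(targets) < replication_factor:
        _, _, name = heapq.heappop(heap)
        targets.append(name)
    return targets


if __name__ == "__main__":
    nodes = [
        DataNode("dn_a", True, 10_000, 5),
        DataNode("dn_b", True, 10_000, 3),
        DataNode("dn_c", False, 10_000, 0),            # dead
        DataNode("dn_d", True, 10_000, 1, {"blk_1"}),  # already has the block
        DataNode("dn_e", True, 10, 0),                 # too little space
    ]
    print(choose_targets(nodes, "blk_1", 1_000, 1))
```

If fewer DataNodes qualify than requested, this sketch returns the shorter list instead of failing; whether the real implementation does the same is not stated here.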

Some clean up work remains.

prb2 commented 7 years ago

You can see in the image that we find two DNs that:

  1. are alive
  2. do not already contain the block which needs to be replicated
  3. have enough free space.

The first one we see has 5 transmits and the next one has 3. Since we minimize transmits, the one with 3 transmits is selected and returned.

[Image: screen shot 2016-11-17 at 11 03 51 pm]