DataGov-SamagraX / deploy

My learnings on setting up a cluster using docker-compose

Setting up HDFS #3

Open ChakshuGautam opened 2 years ago

ChakshuGautam commented 2 years ago
  1. HDFS is set up in a highly available fashion, with at least two namenodes (one active, one standby) and at least three datanodes. The replication factor is set to three. The sample data will be the NYC taxi data set. The HA cluster will be tested by
    • Crashing the active namenode and verifying that the standby namenode takes over as active (see the failover sketch after this list).
    • Crashing one of the datanodes to verify that all the data is still replicated three times.
    • There should be a script to add datanodes. Using the script, a datanode will get added, and we will increase the replication factor to 4, 5, and 6, verifying each step through the web UI (see the scaling and replication sketches below).
    • Testing that we still have all the data when the replication factor is reduced to 1.
    • Benchmarking the server for ingestion at replication factors 1 through 6 and sharing the results in a table (see the benchmarking sketch below).
    • When testing robustness, the following failure modes will be tested
      • Data Disk Failure, Heartbeats and Re-Replication
      • Cluster Rebalancing
      • Data Integrity
      • Metadata Disk Failure
      • Snapshots
    • File Deletion testing
      • Testing that a deleted file lands in the HDFS trash (the .Trash directory under the user's home directory).
      • Setting a policy to auto-delete the trash 5 minutes after deletion and testing that the space is reclaimed after that time (see the trash sketch below).
    • Ability to set up the whole thing with a single click or by running a single command on DigitalOcean.
    • Ability to change the above configs and redeploy without any loss of data.

      NOTE: Data integrity will be tested using SQL queries every time a change is made.
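
A minimal sketch of the failover test, assuming docker-compose service names namenode1/namenode2 and HA NameNode IDs nn1/nn2 (these names are assumptions, not taken from this repo), with ZooKeeper-based automatic failover enabled:

```bash
# Check which namenode is currently active (nn1/nn2 must match
# the NameNode IDs configured in hdfs-site.xml).
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Crash the active namenode's container (service name is an assumption).
docker-compose kill namenode1

# With automatic failover (ZKFC + ZooKeeper), the standby should be
# promoted; verify the new state.
hdfs haadmin -getServiceState nn2   # expected: active
```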
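Adding a datanode could be as simple as scaling the compose service, assuming a single `datanode` service definition whose containers can be replicated (again, the service name is an assumption):

```bash
# Go from 3 to 4 datanode containers.
docker-compose up -d --scale datanode=4

# Confirm the new datanode registered with the namenode;
# `dfsadmin -report` prints one "Name:" line per live datanode.
hdfs dfsadmin -report | grep -c "^Name:"
```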
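A sketch of the replication-factor tests; the /data/nyc-taxi path is a placeholder for wherever the sample data ends up:

```bash
# Raise the replication factor to 4 (-w waits for re-replication
# to actually finish before returning).
hdfs dfs -setrep -w 4 /data/nyc-taxi

# Inspect block placement and check for under-replicated blocks.
hdfs fsck /data/nyc-taxi -files -blocks -locations

# Drop to a single replica and confirm the data is still all there.
hdfs dfs -setrep -w 1 /data/nyc-taxi
hdfs dfs -count /data/nyc-taxi
```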
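For the ingestion benchmark, the stock TestDFSIO tool could be run once per replication factor; note it needs MapReduce/YARN available in the cluster, and the exact jar path varies by Hadoop version:

```bash
# Write 10 files of 128 MB each; repeat with dfs.replication set
# to 1..6 and copy the reported throughput into the table.
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -write -nrFiles 10 -fileSize 128MB

# Clean up the benchmark output between runs.
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
    TestDFSIO -clean
```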
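For the trash test, the retention window is fs.trash.interval (in minutes) in core-site.xml. A sketch of the 5-minute policy and its verification, where sample.csv is a placeholder file name:

```bash
# core-site.xml fragment enabling a 5-minute trash interval:
#   <property>
#     <name>fs.trash.interval</name>
#     <value>5</value>
#   </property>

# Delete a file and confirm it was moved into the trash rather
# than removed outright (the path under .Trash mirrors the original).
hdfs dfs -rm /data/nyc-taxi/sample.csv
hdfs dfs -ls /user/$(whoami)/.Trash/Current/data/nyc-taxi

# After the interval elapses, verify the trash was emptied and
# the space reclaimed.
hdfs dfs -ls /user/$(whoami)/.Trash
hdfs dfsadmin -report | grep "DFS Used"
```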