Request Type
Bug
Work Environment
Problem Description
I wanted to use HDFS to store the attachment data. I have a cluster of 2 servers for TheHive. I configured the HDFS cluster following https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html and set the values "root: hdfs://serverprod-01:8020" and "root: hdfs://serverprod-02:8020".
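For reference, here is a minimal sketch of the storage section of application.conf on the first node; only the root key is taken from my actual file, the surrounding key names are my assumption:

    # application.conf (sketch) -- pointing at a single namenode
    storage {
      provider = hdfs                          # assumed key name
      hdfs {
        root = "hdfs://serverprod-01:8020"     # hard-coded namenode:port
      }
    }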
Everything works fine, but what if the namenode Java process on serverprod-01 (namenode1) crashes? TheHive will still try to access the storage at "hdfs://serverprod-01:8020", which will no longer work, and the availability of the storage will be broken.
Then I configured HDFS High Availability, following https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html or https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html.
With HA we no longer have a namenode:port in our core-site.xml file; instead we have a nameservice, hdfs://thehive.
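For context, the HA client configuration ends up looking roughly like this; the property names are the standard ones from the QJM guide, and the nameservice/namenode IDs (thehive, nn1, nn2) are from my setup:

    <!-- core-site.xml -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://thehive</value>
    </property>

    <!-- hdfs-site.xml: how clients resolve the nameservice -->
    <property>
      <name>dfs.nameservices</name>
      <value>thehive</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.thehive</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.thehive.nn1</name>
      <value>serverprod-01:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.thehive.nn2</name>
      <value>serverprod-02:8020</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.thehive</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>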
And when we configure that nameservice URI in application.conf and restart TheHive, it does not work: TheHive is not able to handle the nameservice.
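This is what I tried, the same sketch as above with the nameservice URI instead of a namenode:port (again, only root is from my actual file):

    # application.conf (sketch) -- pointing at the nameservice
    storage {
      provider = hdfs
      hdfs {
        root = "hdfs://thehive"    # nameservice URI, no namenode:port
      }
    }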
People work with TheHive 24/7, and there are also other integrations where Cases are created automatically, so we need High Availability of the system for all the data. TheHive should therefore be able to handle a nameservice from Hadoop in order to keep this High Availability of the file system.
Thanks for your time and your answers.
Steps to Reproduce
Explained in the Problem Description.
Possible Solutions
For Spark (which is, I believe, also a Scala program) I found this solution: https://mungeol-heo.blogspot.com/2016/12/accessing-remote-ha-enabled-hdfs.html and this one: https://itecnote.com/tecnote/apache-spark-how-to-access-hdfs-by-uri-consisting-of-h-a-namenodes-in-spark-which-is-outer-hadoop-cluster/. Nothing for TheHive.
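The idea in both links is to set the HA client properties programmatically on the Hadoop Configuration object instead of relying on core-site.xml being on the classpath. A minimal Scala sketch of that approach (this is not TheHive code, just the plain Hadoop client API; nameservice and host names match my setup):

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Teach the client how to resolve the "thehive" nameservice to its
    // two namenodes and how to fail over between them.
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://thehive")
    conf.set("dfs.nameservices", "thehive")
    conf.set("dfs.ha.namenodes.thehive", "nn1,nn2")
    conf.set("dfs.namenode.rpc-address.thehive.nn1", "serverprod-01:8020")
    conf.set("dfs.namenode.rpc-address.thehive.nn2", "serverprod-02:8020")
    conf.set("dfs.client.failover.proxy.provider.thehive",
      "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")

    // The URI names the nameservice, not a single namenode, so the client
    // keeps working when the active namenode crashes.
    val fs = FileSystem.get(new URI("hdfs://thehive"), conf)
    println(fs.exists(new Path("/")))

If TheHive exposed a way to pass extra Hadoop configuration properties (or simply loaded hdfs-site.xml from a configurable path), the same mechanism would give it transparent namenode failover.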