TheHive-Project / TheHive

TheHive: a Scalable, Open Source and Free Security Incident Response Platform
https://thehive-project.org
GNU Affero General Public License v3.0

[Bug] Hadoop HDFS HA not compatible with TheHive #2444

Open · Keroseno101 opened this issue 1 year ago

Keroseno101 commented 1 year ago

Request Type

Bug

Work Environment

OS version (server): SUSE Linux Enterprise
OS version (client): Windows 11, ...
Virtualized env.: False
Dedicated RAM: 32 GB
vCPU: 8
TheHive version / git hash: 4.1.22.1
Package type: Binary
Database: Cassandra
Index type: Elasticsearch
Attachments storage: HDFS

Problem Description

I wanted to use HDFS to store attachment data. TheHive runs on a cluster of two servers. I set up the HDFS cluster following https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html and configured the values "root: hdfs://serverprod-01:8020" and "root: hdfs://serverprod-02:8020".

Everything works fine, but what if the Java process on serverprod-01 (namenode1) crashes? serverprod-01 will keep trying to reach the storage at "hdfs://serverprod-01:8020", which will fail, and the attachment storage will become unavailable.

Then I configured HDFS High Availability, following https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html or https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html.

In this setup, core-site.xml no longer contains a namenode:port address; instead it refers to a nameservice, hdfs://thehive.
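For context: an HA nameservice such as hdfs://thehive is not a hostname. A Hadoop client maps it to real NameNodes using properties described in the HDFS HA guides linked above (`dfs.nameservices`, `dfs.ha.namenodes.<ns>`, `dfs.namenode.rpc-address.<ns>.<id>`, `dfs.client.failover.proxy.provider.<ns>`). The sketch below only generates those client-side properties in Hadoop's XML format. It assumes the nameservice is `thehive` and the NameNodes are serverprod-01 and serverprod-02 on port 8020; the NameNode IDs `nn1`/`nn2` are hypothetical. Normally `fs.defaultFS` goes in core-site.xml and the `dfs.*` keys in hdfs-site.xml; they are kept in one dict here for brevity.

```python
import xml.etree.ElementTree as ET


def ha_client_properties(nameservice, namenodes):
    """Build the client-side HDFS HA properties for one nameservice.

    namenodes maps a NameNode ID (e.g. "nn1") to its "host:port" RPC address.
    """
    props = {
        # Belongs in core-site.xml: the default FS is the logical nameservice.
        "fs.defaultFS": f"hdfs://{nameservice}",
        # The remaining keys belong in hdfs-site.xml.
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Java class the HDFS client uses to locate the active NameNode.
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    }
    for nn_id, address in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = address
    return props


def to_hadoop_xml(props):
    """Render properties in Hadoop's <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    props = ha_client_properties(
        "thehive",
        # Hypothetical NameNode IDs; hosts and port taken from this issue.
        {"nn1": "serverprod-01:8020", "nn2": "serverprod-02:8020"},
    )
    print(to_hadoop_xml(props))
```

Any HDFS client that is expected to open hdfs://thehive, TheHive included, needs these properties in the Hadoop configuration it actually loads; the sketch does not show how (or whether) TheHive 4.1.x lets you supply them.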

When we configure that value in application.conf and restart TheHive, it no longer works:

[screenshot of the result after restarting TheHive]

People work with TheHive 24/7, and other integrations create cases automatically, so all of the data needs to be highly available. TheHive should therefore be able to use a Hadoop nameservice, so that the file system stays highly available as well.

Thanks for your time and your answers.

Steps to Reproduce

Explained in the Problem Description.

Possible Solutions

For Spark (which I think is also a Scala program) I found these solutions: https://mungeol-heo.blogspot.com/2016/12/accessing-remote-ha-enabled-hdfs.html and https://itecnote.com/tecnote/apache-spark-how-to-access-hdfs-by-uri-consisting-of-h-a-namenodes-in-spark-which-is-outer-hadoop-cluster/. I found nothing equivalent for TheHive.
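The Spark workarounds linked above boil down to putting the HA properties into the Hadoop configuration the application hands to its HDFS client, so the client can map the logical name to real NameNodes. The sketch below is a simplified model of that mapping step, not Hadoop's own code (the real `ConfiguredFailoverProxyProvider` also handles retries and switching between active and standby). The function name `candidate_namenodes` is hypothetical. It shows why hdfs://thehive only works when those properties are present: without them, "thehive" is treated as a literal host, which DNS typically cannot resolve.

```python
from urllib.parse import urlparse


def _split_list(value):
    """Split a comma-separated Hadoop property value, ignoring blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def candidate_namenodes(uri, props):
    """Return the NameNode RPC addresses a client could try for an hdfs:// URI.

    A configured logical nameservice expands to every NameNode listed for it,
    in the configured order. Anything else is returned unchanged, i.e. used
    as a literal host[:port].
    """
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an hdfs:// URI: {uri}")
    authority = parsed.netloc

    if authority not in _split_list(props.get("dfs.nameservices", "")):
        # No HA mapping: the authority must be reachable as a hostname.
        return [authority]

    addresses = []
    for nn_id in _split_list(props.get(f"dfs.ha.namenodes.{authority}", "")):
        key = f"dfs.namenode.rpc-address.{authority}.{nn_id}"
        if key not in props:
            raise KeyError(f"missing property {key}")
        addresses.append(props[key])
    return addresses


if __name__ == "__main__":
    # Assumed HA properties; NameNode IDs nn1/nn2 are hypothetical.
    ha_props = {
        "dfs.nameservices": "thehive",
        "dfs.ha.namenodes.thehive": "nn1,nn2",
        "dfs.namenode.rpc-address.thehive.nn1": "serverprod-01:8020",
        "dfs.namenode.rpc-address.thehive.nn2": "serverprod-02:8020",
    }
    print(candidate_namenodes("hdfs://thehive/thehive", ha_props))
    print(candidate_namenodes("hdfs://thehive/thehive", {}))
```

With the HA properties loaded, the logical URI expands to both NameNodes; with an empty configuration the same URI collapses to the bare name "thehive", which is the situation a client without HA settings ends up in.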