TheHive-Project / TheHive

TheHive: a Scalable, Open Source and Free Security Incident Response Platform
https://thehive-project.org
GNU Affero General Public License v3.0

[Bug] Performance issue using search function #2312

Open Recovzz opened 2 years ago

Recovzz commented 2 years ago

Request Type

Bug

Work Environment

Question | Answer
OS version (server) | CentOS 7
Virtualized Env. | True
Dedicated RAM | 8 GB
TheHive version | 4.1.16
Package Type | RPM
Database | Cassandra
Index type | Lucene

Problem Description

I am seeing a performance problem with TheHive's search function: it is much slower on my test environment, which I upgraded to the latest version, TheHive 4.1.16, than on my production environment, which runs 3.5.1. When I search for a single case using a filter, TheHive 3.5.1 returns the result in 26.21 ms, while 4.1.16 takes almost 11.97 s to return the same case. The database is exactly the same, and both environments hold around 12,580 cases.

Are you aware of this problem? Are other users experiencing it?

PROD ENVIRONMENT THE HIVE 3.5.1 :

[Screenshot: InkedPROD_LI]

TEST ENVIRONMENT THE HIVE 4.1.16 :

[Screenshot: InkedTST_LI]

priamai commented 2 years ago

Wow, that is a massive performance hit. I don't have that many cases so far, and we use the Janus backend, not Cassandra. Can you show what type of filter queries you are running? I know a little about Cassandra, and the performance hit can depend on many factors, for example how the tables are organized, which indexes exist, and the type of search performed. Cassandra doesn't have a full SQL language, so my guess is that the search filter is not translated into a column-wise filter and may instead fetch most rows first and filter them afterwards. I will have to see the query type and dig into the source code.
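To make the guess above concrete, here is a minimal Python sketch of the difference between "fetch the rows, then filter" and an index lookup. It is only an illustration under stated assumptions: the `cases` records, the `number` field, and every function name are hypothetical, and none of this is TheHive's or Cassandra's actual code.

```python
from collections import defaultdict

# Hypothetical in-memory "case" records; not TheHive's real data model.
cases = [{"number": n, "title": f"Case {n}"} for n in range(1, 12581)]


def find_by_scan(records, number):
    """Fetch every row, then filter: the cost grows with the table size."""
    examined = 0
    hits = []
    for record in records:
        examined += 1
        if record["number"] == number:
            hits.append(record)
    return hits, examined


def build_index(records, field):
    """Build a value -> records mapping once, so lookups can skip the scan."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record)
    return index


def find_by_index(index, value):
    """Look the value up directly; only the matching rows are touched."""
    hits = index.get(value, [])
    return hits, len(hits)


number_index = build_index(cases, "number")
scan_hits, scan_examined = find_by_scan(cases, 12000)
index_hits, index_examined = find_by_index(number_index, 12000)
print(scan_examined, index_examined)  # prints "12580 1"
```

Both calls return the same case, but the scan examines all 12,580 records while the index lookup touches one. Whether TheHive 4 really falls back to a scan for this filter is exactly what the generated query would have to confirm.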

priamai commented 2 years ago

Can you connect to Cassandra and take a quick look at the tables that were created and their schema? Then we will need to figure out which query gets generated in order to assess its performance.

mamoedo commented 2 years ago

This is a known issue. See label:scope:performance and https://github.com/TheHive-Project/TheHive/issues/1428 https://github.com/TheHive-Project/TheHive/issues/2116

priamai commented 2 years ago

Yuck, so it is doing a full-scan query!

> This is a known issue. See label:scope:performance and #1428 #2116

I can see that #1428 was closed; did you test it personally? As for #2116, what have you done in the meantime?

mamoedo commented 2 years ago

> Yuck, so it is doing a full-scan query!
>
> > This is a known issue. See label:scope:performance and #1428 #2116
>
> I can see that #1428 was closed; did you test it personally? As for #2116, what have you done in the meantime?

I think I tested #1428, but there were still issues. For the rest of the performance issues I'm just praying, because my setup gets slower every day and is heading into unusable territory again. No matter how much CPU, RAM or disk you throw at the platform, it just doesn't help.

Recovzz commented 2 years ago

I see that a new issue has just been created: other people have the same problem as me with the search feature not working, see #2314.

Recovzz commented 2 years ago

I am surprised that not many people report this problem, because it makes the product difficult to use. Will this problem be prioritized? Is a fix planned? @To-om @nadouani

b3belov commented 2 years ago

I have the same issue. My database size:

A dashboard with alert statistics for the past 3 months (~15,000 alerts) takes up to 1.5 minutes to execute. All requests made during alert selection hang in the queue, which makes TheHive's interface unresponsive.

No matter how much CPU or RAM I add, it doesn't help. I even tried moving the database to an SSD, and that didn't help either.

I also have a setup with Scylla as an alternative to Cassandra. With that setup I managed to reduce the query time mentioned above to 40 seconds: not what I wanted to achieve, but it is something.

Recovzz commented 2 years ago

I tried using Scylla instead of Cassandra, but the performance is similar to what I described in my issue above. I'm still trying to solve the performance issues.

There is a big difference in performance between TheHive 3.5 and the latest version, 4.1.16. I am surprised that so few people report this problem, because it makes the latest version hard for me to use.

I would like to ask again: are you aware of this problem? @To-om @nadouani