DFIRKuiper / Kuiper

Digital Forensics Investigation Platform

Search broken on large shard/index #80

Closed · nyrm-f closed 1 year ago

nyrm-f commented 1 year ago

Hello,

I have a large case in Kuiper with 74 hosts, and I've noticed that as it grew I became unable to do any kind of filtering in Kuiper.

The shard health for this case is yellow, and its index is 49.2 GB.

I can view the data with no filter just fine:

[Screenshot: Screen Shot 2022-10-31 at 10 49 07 PM]

But when I apply a simple filter I get this:

[Screenshot: Screen Shot 2022-10-31 at 10 51 24 PM]

I installed an Elasticsearch health-checker plugin, and I get this error when I try to run the search from the plugin:

[Screenshot: Screen Shot 2022-10-31 at 10 53 08 PM]

Here are some logs from the Kuiper.log file:

"2022-11-01 05:56:26.730553","[DEBUG]","case_management.py.case_browse_artifacts_ajax[Lin.989]","case","Case[index2]: Query artifacts","{"sort": {"Data.@timestamp": {"order": "asc"}}, "query": {"query_string": {"query": "!(data_type:\"tag\") AND (admin)", "default_field": "catch_all"}}, "from": 0, "size": 30}"

"2022-11-01 05:56:26.731607","[DEBUG]","elkdb.py.query[Lin.211]","elasticsearch","Query to index [index2]","{"sort": {"Data.@timestamp": {"order": "asc"}}, "query": {"query_string": {"query": "!(data_type:\"tag\") AND (admin)", "default_field": "catch_all"}}, "track_total_hits": true, "from": 0, "size": 30}"

"2022-11-01 05:56:26.732678","[DEBUG]","case_management.py.browse_artifacts_list_ajax[Lin.891]","case","Case[index2]: Query artifacts list","{"query": {"query_string": {"query": "!(data_type:\"tag\") AND (admin)", "default_field": "catch_all"}}, "aggs": {"data_type": {"terms": {"field": "data_type.keyword", "order": {"_key": "asc"}, "size": 500}}}, "size": 0}"

"2022-11-01 05:56:26.733403","[DEBUG]","elkdb.py.query[Lin.211]","elasticsearch","Query to index [index2]","{"query": {"query_string": {"query": "!(data_type:\"tag\") AND (admin)", "default_field": "catch_all"}}, "track_total_hits": true, "aggs": {"data_type": {"terms": {"field": "data_type.keyword", "order": {"_key": "asc"}, "size": 500}}}, "size": 0}"

I've searched around for help with this issue. Could it be because Kuiper is set to put all the data into one shard/index?
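For anyone debugging the same thing, here is a minimal sketch (not part of Kuiper) that queries Elasticsearch directly: it shows the shard health and size of the case index, and replays the filtered query from the logs above, which usually surfaces a more specific error than the UI does. It assumes the default local instance on http://localhost:9200 and the index name index2 from the logs.

```python
# Sketch only: query Elasticsearch directly to see the index state and the
# raw error behind the failed filter. Host and index name are assumptions.
import requests

ES = "http://localhost:9200"

# Shard health and on-disk size of the case index
print(requests.get(f"{ES}/_cat/indices/index2?v&h=health,index,pri,rep,docs.count,store.size").text)

# Replay the filtered query exactly as Kuiper logged it
query = {
    "sort": {"Data.@timestamp": {"order": "asc"}},
    "query": {
        "query_string": {
            "query": '!(data_type:"tag") AND (admin)',
            "default_field": "catch_all",
        }
    },
    "from": 0,
    "size": 30,
}
resp = requests.post(f"{ES}/index2/_search", json=query)
print(resp.status_code)
print(resp.json())
```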

salehmuhaysin commented 1 year ago

Hi, Elasticsearch is designed for a limited number of fields per index (roughly 1,000 by default), because a large number of fields can impact memory: the whole mapping is kept in memory. That is fine when you use it as a SIEM with a limited set of fields (src_ip, dst_ip, etc.), but not for dynamic parsers in forensics, where each parser brings its own fields.

By default, Kuiper should raise the field limit automatically when it hits it: https://github.com/DFIRKuiper/Kuiper/blob/707ee3addbb8676b916e35dc847e4c2dd76858a9/kuiper/app/database/elkdb.py#L248
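For reference, a hedged sketch (not Kuiper's own code) of checking and manually raising that limit on a case index, mirroring what elkdb.py does automatically; the index name and the new limit value are only illustrative:

```python
# Sketch only: inspect and raise index.mapping.total_fields.limit by hand.
# Assumes Elasticsearch on localhost:9200; "index2" and 20000 are examples.
import requests

ES = "http://localhost:9200"
INDEX = "index2"

# Current value (an empty result means the Elasticsearch default of 1000)
print(requests.get(f"{ES}/{INDEX}/_settings/index.mapping.total_fields.limit").json())

# Raise the limit above the number of fields your parsers produce
resp = requests.put(
    f"{ES}/{INDEX}/_settings",
    json={"index.mapping.total_fields.limit": 20000},
)
print(resp.json())
```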

In some cases I've hit an issue where an index reached 40k+ fields and Elasticsearch consumed all the memory (64 GB machine, 32 GB JVM heap), but with 12k fields it worked fine. I'm not sure why it didn't increase the field limit in your case; could you check whether there are any errors in Kuiper.log?
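A quick sketch for both checks, assuming Kuiper.log is readable from the current directory and the case index is index2 (adjust the path and name to your deployment): it scans the log for error entries and gives a rough count of the fields currently mapped in the index.

```python
# Sketch only: find errors in Kuiper.log and roughly count mapped fields.
import requests

# 1) Error entries in Kuiper.log (path is an assumption; adjust as needed)
with open("Kuiper.log", errors="ignore") as log:
    for line in log:
        if "[ERROR]" in line:
            print(line.rstrip())

# 2) Rough field count from the index mapping: each mapped field, object,
#    and multi-field (e.g. .keyword) counts towards the total-fields limit
mapping = requests.get("http://localhost:9200/index2/_mapping").json()

def count_fields(node):
    if not isinstance(node, dict):
        return 0
    total = 0
    for key, value in node.items():
        if key == "properties" and isinstance(value, dict):
            for field in value.values():
                total += 1 + count_fields(field)
        elif key == "fields" and isinstance(value, dict):
            total += len(value)  # multi-fields such as .keyword
        else:
            total += count_fields(value)
    return total

print("approx. mapped fields:", count_fields(mapping))
```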

Regarding the size of the artifacts, I don't think that's related, since the limit concerns the number of fields, not the number of records. In general, though, if a case has 40M+ records I recommend splitting it into two cases (two indices) so you don't hit performance issues, or using a cluster; in my test
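To judge whether a case is big enough to be worth splitting, a one-line sketch against the _cat/count API shows the record count of the case index (the index name index2 is assumed from the logs above):

```python
# Sketch only: print the document (record) count of the case index.
import requests

print(requests.get("http://localhost:9200/_cat/count/index2?v").text)
```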