jhk70 commented 3 years ago

Continued performance issues after upgrade to 4.1.1

Request Type

Bug

Work Environment

Question	Answer
OS version (server)	Ubuntu
OS version (client)	18.04
TheHive version / git hash	4.1.1 (docker image 4.1.1-2)
Package Type	Docker
Browser type & version	Various

Problem Description

After upgrading from 4.0.5-1 to 4.1.0 and then 4.1.1:

Audit entries don't show in the application's "live stream" view.
I get the familiar "AuditSrv" error after a while.
The "Data Index Status" section of the "Platform Status" page does not load (i.e. the user session times out before it loads). This behaviour was consistent across 4.1.0 and 4.1.1. The Audit table has 1,265,475 entries.

Steps to Reproduce

Upgrade the hive as described here
Configure local lucene index.
Start server.
Use Server

Complementary information

Other observations / debug actions:

During initial indexing, there were a number of "org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend" errors. Removing MAX_HEAP_SIZE and HEAP_NEWSIZE settings on cassandra removed these.
During the initial period after the upgrade, there was evidence of memory exhaustion. More RAM was added to the host, and TheHive was given 16g via -e JAVA_OPTS='-Xms16g -Xmx16g'
With the "Platform Status" page unavailable, I have been able to trigger reindexing with curl: curl -k "https://<host>:9000/api/v1/admin/index/Case/reindex" -H 'Authorization: Bearer *authwibble*' I have re-run this for each index, and the logs show that the jobs complete successfully.
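For anyone repeating the per-index curl call above, here is a minimal Python sketch of the same request. The function names are hypothetical. It assumes a plain GET, as the curl command uses, and skips TLS verification like `-k`. Only the `Case` and `Audit` index names appear in this report, so any other names have to be supplied by the caller.

```python
import ssl
import urllib.request


def reindex_url(base_url: str, index_name: str) -> str:
    """Build the reindex URL in the same shape as the curl command."""
    return f"{base_url.rstrip('/')}/api/v1/admin/index/{index_name}/reindex"


def build_request(base_url: str, index_name: str, token: str) -> urllib.request.Request:
    """Prepare a GET request carrying the Bearer token (no body, like curl's default)."""
    return urllib.request.Request(
        reindex_url(base_url, index_name),
        headers={"Authorization": f"Bearer {token}"},
    )


def trigger_reindex(base_url: str, index_names: list[str], token: str) -> dict[str, int]:
    """Call the reindex endpoint once per index and return the HTTP status codes."""
    # Equivalent of `curl -k`: certificate checks are disabled (assumption:
    # the server uses a self-signed certificate, as the -k flag suggests).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    statuses = {}
    for name in index_names:
        request = build_request(base_url, name, token)
        with urllib.request.urlopen(request, context=context) as response:
            statuses[name] = response.status
    return statuses


# Example call (placeholders, not real values):
# trigger_reindex("https://<host>:9000", ["Case", "Audit"], "<api key>")
```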

Snippets from the Audit reindex logs:

Mar 25 21:39:52 hivehost01 docker[26287]: [info] o.t.s.m.Database [00000020|] Reindex job is running: 1265475 record(s) indexed
Mar 25 21:39:53 hivehost01 docker[26287]: [info] o.j.g.d.m.ManagementSystem [|] Index update job successful for [AuditRequestidMainaction]
Mar 25 21:39:53 hivehost01 docker[26287]: [info] o.t.s.m.Database [00000020|] Reindex job is finished

Mar 25 21:47:59 hivehost01 docker[26287]: [info] o.t.s.m.Database [00000020|] Reindex job is running: 0 record(s) indexed
Mar 25 21:48:00 hivehost01 docker[26287]: [info] o.t.s.m.Database [00000020|] Reindex job is running: 0 record(s) indexed
Mar 25 21:48:01 hivehost01 docker[26287]: [info] o.j.g.o.j.IndexRepairJob [|] Found index Audit
Mar 25 21:48:01 hivehost01 docker[26287]: [info] o.t.s.m.Database [00000020|] Reindex job is running: 0 record(s) indexed
Mar 25 21:48:02 hivehost01 docker[26287]: [info] o.j.g.d.m.ManagementSystem [|] Index update job successful for [Audit]
Mar 25 21:48:02 hivehost01 docker[26287]: [info] o.t.s.m.Database [00000020|] Reindex job is finished
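To compare jobs across a longer log, a small parser can pair each "Index update job successful" line with the last record count reported before it. This is a sketch with a hypothetical function name, and it assumes log lines arrive in the order shown above, with one job running at a time.

```python
import re

# Patterns copied from the message text in the log snippets above.
RECORDS = re.compile(r"Reindex job is running: (\d+) record\(s\) indexed")
SUCCESS = re.compile(r"Index update job successful for \[(\w+)\]")


def summarize_reindex(lines):
    """Return (index_name, last_reported_record_count) for each finished index update."""
    results = []
    last_count = None  # most recent count seen since the previous success line
    for line in lines:
        match = RECORDS.search(line)
        if match:
            last_count = int(match.group(1))
            continue
        match = SUCCESS.search(line)
        if match:
            results.append((match.group(1), last_count))
            last_count = None
    return results


sample = [
    "[info] o.t.s.m.Database [00000020|] Reindex job is running: 1265475 record(s) indexed",
    "[info] o.j.g.d.m.ManagementSystem [|] Index update job successful for [AuditRequestidMainaction]",
    "[info] o.t.s.m.Database [00000020|] Reindex job is running: 0 record(s) indexed",
    "[info] o.j.g.o.j.IndexRepairJob [|] Found index Audit",
    "[info] o.j.g.d.m.ManagementSystem [|] Index update job successful for [Audit]",
]
print(summarize_reindex(sample))
# prints [('AuditRequestidMainaction', 1265475), ('Audit', 0)]
```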

Our implementation had been "misusing" tags (per the 4.1.0 release blog) and had some long tags containing links to raw alerts etc. This was evidenced by a 6-second load time on /api/v1/query?name=list-tags. I have deleted these tags from the "Custom Tags" view. Is it possible that something in the Audit content is causing the current performance issue? Is it possible to truncate / compact the Audit table?
Probably unrelated, but I see this at server start: Mar 25 21:27:40 hivehost01 docker[26287]: [warn] c.d.d.c.RequestHandler [|] Query '[4 bound values] SELECT column1,value,writetime(value) AS writetime,ttl(value) AS ttl FROM thehive.graphindex WHERE key=:key AND column1>=:sliceStart AND column1<:sliceEnd LIMIT :maxRows;' generated server side warning(s): Read 947 live rows and 5788 tombstone cells for query SELECT * FROM thehive.graphindex WHERE key = 022689a05461e7 AND column1 >= 00 AND column1 < ff LIMIT 5000; token -8419547459570797906 (see tombstone_warn_threshold)
I have deleted and reconfigured the index multiple times. After a restart (and before indexing), the "Platform Status" page loads (all indexes = "ERROR"). After I click "Reindex" on Audit, the indexing completes and the same performance issue returns. I can then no longer refresh / view the Index Status section of the Platform Status page.

nadouani commented 3 years ago

@To-om I assigned this issue to 4.1.2 but it needs investigation. Feel free to move it out of this milestone if it requires more investigation