TheHive-Project / TheHive

TheHive: a Scalable, Open Source and Free Security Incident Response Platform
https://thehive-project.org
GNU Affero General Public License v3.0

Organisation does not exist anymore #2372

Closed: Keroseno101 closed this issue 1 year ago

Keroseno101 commented 2 years ago

Request Type

Bug

Work Environment

Question --> Answer
OS version (server) --> Linux SUSE
OS version (client) --> Windows 10 Pro
Virtualized Env. --> true
Dedicated RAM --> 32 GB
vCPU --> 8
TheHive version --> 4.1.18.1
Database --> Cassandra
Index type --> Elasticsearch
Attachments storage --> Local
TheHive Cluster --> No
Cassandra Cluster --> Yes (3 servers)

(In the future there will be 2 TheHive servers and 3 Cassandra servers.)

Problem Description

TheHive, Cassandra and Elasticsearch are working well. I updated yesterday; only /opt/thehive/bin and /opt/thehive/lib were overwritten.

I can log in without problems and see all the Cases and Alerts. I can also create a new Case, but I cannot see the Organisation.

We all log in with AD Credentials.

When I log in with admin@thehive.local, I see only the admin ORG, with no users inside. [screenshot]

I can log into my ORG, but the ORG itself is not there, which is very strange. So I tried changing my profile photo to see what the log says: [screenshot]

Very strange. I went to /opt/thehive/data and changed the permissions: [screenshot]

After restarting the service, I clicked on Organisation again (not found) and tried to change my profile photo again (now without problems). [screenshot]

I changed /etc/thehive/logback.xml to <logger name="org.thp" level="DEBUG"/> to see more, but there was no useful information there:

[screenshot]

What is happening? How can the ORG be gone?

Thanks in advance for the support.

I am afraid people will not be able to use TheHive if I update to 4.1.19, because of the fix for the bug "An user may exist without being member of any organisation", and I am going on holiday next week.

Steps to Reproduce

  1. Download TheHive 4.1.18
  2. Upgrade TheHive 4.1.8 to TheHive 4.1.18 by overwriting /opt/thehive/lib and /opt/thehive/bin
  3. Change permissions: /opt/thehive/lib to 644, owner root:thehive
  4. Change permissions: /opt/thehive/data to 666, owner thehive:thehive

Possible Solutions

I tried changing permissions, and I checked the Cassandra database, which looks good (it runs as a cluster of 3 machines).

I tried to create a new ORG with the same name as the old one: [screenshot]

Complementary information

To-om commented 2 years ago

You have 2 unrelated problems:

Keroseno101 commented 1 year ago

The problem was fixed, but I forgot to post it here.

After trying many things and almost going crazy, I found the solution: rebuilding the index.

This information was not easy to find, but I finally found it here:

https://blog.strangebee.com/thehive-4-1-16-is-out/

Just copy these lines to the end of /etc/thehive/application.conf and restart TheHive to start the reindex.

-----------------------------------------------------------------

## Temporary configuration to solve the immense terms indexing issue
## This will be used as a first step of TheHive database initialization

# This is used to truncate titles if you have immense title issues
db.janusgraph.immenseTermProcessing.title = "truncate(1024)"

# This is used to fix text observables with big values
db.janusgraph.immenseTermProcessing.data = "observableHashToIndex"

# This is required to rebuild the index
db.janusgraph.forceDropAndRebuildIndex: true

-----------------------------------------------------------------
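Because these settings are explicitly temporary, it can help to script adding them and later taking them out again. Below is a minimal Python sketch that works on the file contents as a string. It assumes the lines are wrapped in two marker comments (`# BEGIN/END temporary reindex settings`), which were invented for this sketch so the block can be found again. The function names are likewise hypothetical. The sketch does not restart TheHive.

```python
# Hypothetical marker comments used only by this sketch to find the block again.
BEGIN = "# BEGIN temporary reindex settings"
END = "# END temporary reindex settings"

# The three settings quoted from the StrangeBee blog post above.
TEMP_SETTINGS = [
    'db.janusgraph.immenseTermProcessing.title = "truncate(1024)"',
    'db.janusgraph.immenseTermProcessing.data = "observableHashToIndex"',
    "db.janusgraph.forceDropAndRebuildIndex: true",
]


def with_temp_settings(conf_text: str) -> str:
    """Return conf_text with the marked block appended at the end (idempotent)."""
    if BEGIN in conf_text:
        return conf_text  # already present: do not duplicate the keys
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + "\n".join([BEGIN, *TEMP_SETTINGS, END]) + "\n"


def without_temp_settings(conf_text: str) -> str:
    """Return conf_text with the markers and every line between them removed."""
    kept, inside = [], False
    for line in conf_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped == BEGIN:
            inside = True
        elif stripped == END:
            inside = False
        elif not inside:
            kept.append(line)
    return "".join(kept)
```

Usage, assuming `from pathlib import Path` and write access to the file:

- Before restarting TheHive, run `p = Path("/etc/thehive/application.conf"); p.write_text(with_temp_settings(p.read_text()))`.
- Once the log shows the reindex has finished, apply `without_temp_settings` the same way.

Presumably `forceDropAndRebuildIndex` would otherwise trigger another drop and rebuild on the next restart.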

TheMatrix97 commented 1 year ago

Hi! Sorry for reopening this issue, but we are facing the same error in our TheHive deployment. I tried running TheHive with the configuration @Keroseno101 pointed out, but it still shows the same 404 error. I suspect there might be some kind of corrupted data in JanusGraph, which is backed by Cassandra. Is there any way to check the integrity of Cassandra? I have no idea how to query a graph database such as JanusGraph.

Thanks!

Keroseno101 commented 1 year ago


Hi, I was not completely clear in my comment. Let me go through what you should do step by step.

  1. Check that the Elasticsearch service is running on all your instances.
  2. Check that the version ( curl -X GET http://hostname:9200 -u elastic ) is not 8.x.x; you can only use Elasticsearch 7.x.x (7.17.8, for example).
  3. Check the health of the cluster ( curl http://hostname:9200/_cluster/health?pretty -u elastic )
  4. Check the nodes: look at the logs in /var/log/elasticsearch/nameofyourcluster.log, then run curl http://hostname:9200/_cat/nodes?v -u elastic
  5. Check the Cassandra database: nodetool status (all nodes should be UN, i.e. Up/Normal)
  6. Go to /etc/thehive/application.conf and add "db.janusgraph.forceDropAndRebuildIndex: true" at the end of the file.
  7. Restart the service and watch the live logs: systemctl restart thehive && tail -f /var/log/thehive/application.log
  8. After "Creating Database Schema", you should see something like "drop the index globalXX", followed by several DISABLE steps. After that, you should only see "reindex" messages.
  9. Once the data has been reindexed, you should see your old ORG again.

Still not working? Then give me:

TheMatrix97 commented 1 year ago

Hi @Keroseno101, thanks for your fast response. To add some information: the whole deployment runs in Kubernetes, so I'm pretty sure some nasty Elasticsearch restart might have left an inconsistency in the index.

Again, thank you very much for helping me out with this :)

Keroseno101 commented 1 year ago
"You're right, there is something wrong during the rebuild index phase; it runs into an issue while trying to remove the indexes.... The index global cannot be removed (java.lang.UnsupportedOperationException:"

Hi,

The cluster health looks OK. GREEN is always better, but it looks like it is working.

The "index globalX cannot be removed" message is also normal. The database looks OK too.

After this reindex you should be able to log in with a normal user (or with the admin user and see your ORG).

The problem is with the index, so let's go one step further....

TheMatrix97 commented 1 year ago

Well... Just after the reindexing finished, I noticed my TheHive instance had gone back to defaults: the admin password has been reset to the default, and now no Organisation is shown... It seems the data is gone or corrupted.

Keroseno101 commented 1 year ago


Your data is in Cassandra (more than 500 MB), so it is not gone, and if it were corrupted you could not start TheHive. The data is there, but the index is not doing its job. Try following the last steps again, but first delete all the Elasticsearch data directly with rm -rf /var/lib/elasticsearch/* and then follow the whole process. I spent maybe 10 days going crazy over this and I fixed it; you will fix it too.

TheMatrix97 commented 1 year ago

Yes, you are right... The data is still there, phew.... I just logged in as a different user and the dashboard changed... xD. So the data is stored in Cassandra, and Elasticsearch is just used as an index? Where is the data persisted long term? I couldn't find any information about the exact roles of ES and Cassandra in TheHive.

Keroseno101 commented 1 year ago

My pleasure to explain. TheHive uses Cassandra as its database: all the information you write in TheHive goes directly into your Cassandra database.

Cassandra works with keyspaces. You set the name of the keyspace in /etc/thehive/application.conf (by default: thehive), and Cassandra creates the database in a keyspace with that name, in this example /var/lib/cassandra/data/thehive/. That is where the tables with all the information live (don't touch them :-) ). Remember to run chown -R cassandra:cassandra /var/lib/cassandra

Elasticsearch works just as an index: it is important, but not critical. You can delete all the index data, uninstall Elasticsearch, or whatever you want, and your data will still be safe in Cassandra. Remember also to run chown -R elasticsearch:elasticsearch /var/lib/elasticsearch/

TheHive (through JanusGraph) asks Elasticsearch which records match a query, then reads those records from Cassandra, and that is why everything goes fast.
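The split between the two stores can be illustrated with a deliberately simplified Python toy model. This is not TheHive's or JanusGraph's actual code. A primary store (standing in for Cassandra) holds the records. A rebuildable index (standing in for Elasticsearch) only maps search terms to record IDs. The class and method names are hypothetical.

```python
from collections import defaultdict


class ToyGraphStore:
    """Toy model: records live in `primary`; `index` is derived data only."""

    def __init__(self):
        self.primary = {}              # record id -> record (the "Cassandra" role)
        self.index = defaultdict(set)  # (field, value) -> record ids (the "Elasticsearch" role)

    def write(self, rec_id, record):
        self.primary[rec_id] = record
        for field, value in record.items():
            self.index[(field, value)].add(rec_id)

    def search(self, field, value):
        # 1) ask the index which ids match, 2) read those ids from primary storage.
        ids = self.index.get((field, value), set())
        return [self.primary[i] for i in sorted(ids)]

    def drop_index(self):
        self.index.clear()  # comparable to wiping the Elasticsearch data

    def rebuild_index(self):
        # Recompute every index entry from the records in primary storage.
        self.drop_index()
        for rec_id, record in self.primary.items():
            for field, value in record.items():
                self.index[(field, value)].add(rec_id)
```

Between `drop_index()` and `rebuild_index()`, a record is still in primary storage but `search` finds nothing. That state resembles the "organisation not found" symptom in this thread, and reindexing roughly corresponds to `rebuild_index()`.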

Is your instance working again with all the information you had before? Did it work with the last solution (rm -rf /var/lib/elasticsearch/) or already with the one before? This is just for the people who come here in the future :-)

TheMatrix97 commented 1 year ago


Hi! I just removed all indexes related to TheHive (globalX); I had 5 different ones, global1 to global5. Then I restarted TheHive with the option db.janusgraph.forceDropAndRebuildIndex: true. The process seems to start correctly and the job finishes, leaving a single global6 index, but during the process it throws an error:

[info] o.j.g.o.j.IndexRepairJob - Found index global6
[info] o.t.s.m.Database - Reindex job 20241192 is running
[info] o.t.s.m.Database - Reindex job 20241192 is running
[error] o.j.g.d.m.ManagementLogger - Evicted [11@0a0203767-thehive-01] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardjanusgraphtx[0x42e1f3a4], standardjanusgraphtx[0x745813fe], standardjanusgraphtx[0x7067992a], standardjanusgraphtx[0x7ea0788e]]
[error] o.j.g.d.m.ManagementLogger - Evicted [9@0a0203767-thehive-01] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardjanusgraphtx[0x42e1f3a4], standardjanusgraphtx[0x745813fe], standardjanusgraphtx[0x7067992a], standardjanusgraphtx[0x7ea0788e]]

After the rebuild finishes, which takes quite a while... the organisation is still missing.... It appears in the UI, but the API returns a 404 error.

$ curl -I -X "GET" http://localhost/api/organisation/myorg
404

I've reached a dead end... Any other ideas?

Thanks!

Keroseno101 commented 1 year ago


Hi, this error can happen, but it does not mean anything important. If you see your ORG, Cases and Users in the web UI, then your API call is wrong.

I would suggest using the right port (9000) and an API key from an org user for API calls.

With this command you should get the names of all your ORGs:

curl -H "Authorization: Bearer APIKEY" -H "Content-Type: application/json" -X GET "http://hostname:9000/api/organisation"
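The same authenticated call can be scripted with Python's standard library. Below is a minimal sketch that only builds the request (same URL, headers and port as the `curl` command above) without sending it. `thehive_request` is a hypothetical helper name, and `hostname` and `APIKEY` remain placeholders.

```python
import urllib.request


def thehive_request(base_url, path, api_key, body=None):
    """Build (but do not send) an authenticated TheHive API request.

    base_url (e.g. "http://hostname:9000") and api_key are placeholders.
    A JSON string passed as `body` turns the request into a POST.
    """
    data = None if body is None else body.encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="GET" if body is None else "POST",
    )
```

To actually send it, use `with urllib.request.urlopen(req) as resp: print(resp.status, resp.read().decode())`. Note that `urlopen` raises `urllib.error.HTTPError` for a 404 rather than returning it.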

TheMatrix97 commented 1 year ago

Hi @Keroseno101, I can confirm the organisation seems to exist now: the admin page shows it and I can create new users. However, I'm quite sure I found a bug unrelated to indexes.

Right now I have organisation "A". This organisation existed once upon a time, and "user1" was its org-admin. Eventually Cassandra ran out of disk space, so TheHive crashed; we were able to increase the volume size and recover the application. However, the users were apparently lost...

If I run the query to list the users of a given organisation, I only get "user2", which was created after the Cassandra crash:

curl -H "Authorization: Bearer APIKEY" -H "Content-Type: application/json" -X POST http://localhost:9000/api/v0/query

Body:

{
  "query": [
    {
      "_name": "getOrganisation",
      "idOrName": "A"
    },
    {
      "_name": "users"
    },
    {
      "_name": "sort",
      "_fields": [
        {
          "login": "asc"
        }
      ]
    },
    {
      "_name": "page",
      "from": 0,
      "to": 15,
      "organisation": "A"
    }
  ]
}
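When such queries are scripted, building the body in code avoids hand-editing JSON. Below is a minimal Python sketch that reproduces the body above as a dict and extracts the logins from a response. The function names are hypothetical. The `"organisation"` key inside the `page` step is kept only because it appears in the original body, not because it is known to be required. Nothing is sent.

```python
import json


def org_users_query(org, start=0, end=15):
    """Return the query body posted above, parameterised by organisation and page range."""
    return {
        "query": [
            {"_name": "getOrganisation", "idOrName": org},
            {"_name": "users"},
            {"_name": "sort", "_fields": [{"login": "asc"}]},
            {"_name": "page", "from": start, "to": end, "organisation": org},
        ]
    }


def logins(response_json):
    """Extract the `login` of every user in the list returned by the query."""
    return [user["login"] for user in json.loads(response_json)]
```

`json.dumps(org_users_query("A"))` gives a string that can be passed as the POST body, for example through the request builder sketched for the organisation listing.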

It returns only user2:

[
    {
        "_id": "~333099454",
        "id": "user2",
        "createdBy": "admin@thehive.local",
        "updatedBy": "admin@thehive.local",
        "createdAt": 1679485199426,
        "updatedAt": 1679485211963,
        "_type": "user",
        "login": "user2",
        "name": "user2",
        "roles": [
            "admin",
            "write",
            "read",
            "alert"
        ],
        "organisation": "A",
        "hasKey": true,
        "status": "Ok"
    }
]

But now, if I try to create "user1" again....

$ curl -X POST http://localhost:9000/api/v1/user

Body:

{
  "login" : "user1",
  "name" : "user1",
  "organisation": "A",
  "profile": "org-admin",
  "email": "user1",
  "password": "supersecret"
}

It returns 201 with the information of user1 (from before the Cassandra crash), but it shows the user is assigned to organisation "no org":

{
    "_id": "~24632",
    "_createdBy": "admin@thehive.local",
    "_updatedBy": "admin@thehive.local",
    "_createdAt": 1678881396871, # BEFORE CASSANDRA CRASH
    "_updatedAt": 1679484293863,
    "login": "user1",
    "name": "user1",
    "hasKey": true,
    "hasPassword": true,
    "hasMFA": false,
    "locked": false,
    "profile": "org-admin",
    "permissions": [
        "manageShare",
        "manageAnalyse",
        "manageTask",
        "manageCaseTemplate",
        "manageCase",
        "manageUser",
        "manageProcedure",
        "managePage",
        "manageObservable",
        "manageTag",
        "manageConfig",
        "manageAlert",
        "accessTheHiveFS",
        "manageAction"
    ],
    "organisation": "no org",
    "organisations": [],
    "extraData": {}
}

So, although it returns a 201 status code, "user1" is missing from organisation A.
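Since a 201 status alone does not prove the user ended up in the requested organisation, a script creating users could also check the response body. Below is a minimal Python sketch that assumes only the `login` and `organisation` fields seen in the response above. `user_org_mismatch` is a hypothetical name.

```python
import json


def user_org_mismatch(create_response_json, expected_org):
    """Return None if the created user is attached to expected_org, else a short message.

    Catches the symptom reported here: 201 Created, but "organisation": "no org".
    """
    user = json.loads(create_response_json)
    actual = user.get("organisation")
    if actual == expected_org:
        return None
    return f"{user.get('login')} was created but is attached to {actual!r}, not {expected_org!r}"
```

The check reads only the body, so it can run right after the POST regardless of the status code.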

I'm well aware this is unrelated to indexes and Elasticsearch, so I'm opening a separate issue...

Again, thank you very much for your help!