Organisation does not exist anymore

TheHive-Project / TheHive

TheHive: a Scalable, Open Source and Free Security Incident Response Platform

https://thehive-project.org

GNU Affero General Public License v3.0

Organisation does not exist anymore #2372

Closed Keroseno101 closed 1 year ago

Keroseno101 commented 2 years ago

Request Type

Bug

Work Environment

Question	Answer
OS version (server)	Linux SUSE
OS version (client)	Windows 10 Pro
Virtualized Env.	true
Dedicated RAM	32 GB
vCPU	8
TheHive version	4.1.18.1
Database	Cassandra
Index type	Elasticsearch
Attachments storage	Local
TheHive Cluster	No
Cassandra Cluster	Yes (3 servers)

(in the future will be 2 TheHive Servers and 3 Cassandra Servers)

Problem Description

TheHive, Cassandra and Elasticsearch are working well. I updated yesterday; only /opt/thehive/bin and /opt/thehive/lib were overwritten.

I log in without problems and I can see all the Cases and Alerts, I can also create a new Case, but I can not see the Organisation.

We all log in with AD Credentials.

When I log in with admin@thehive.local I see just the ORG Admin, without users inside.

I can log in to my ORG, but no ORG is shown. This is very strange, so I tried to change my profile photo to see what the log says:

Also very strange. I went to /opt/thehive/data and changed the permissions:

After restarting the service, I clicked on Organisation again (not found) and tried to change my profile photo again (this time without problems).

I changed /etc/thehive/logback.xml to <logger name="org.thp" level="DEBUG"/> to see more, but there is no useful info there:

What is happening? How is the ORG gone?

Thanks in advance for the support.

I am afraid people will not be able to use TheHive if I update to 4.1.19, because of the fix for the bug "An user may exist without being member of any organisation", and I am going on holiday next week.

Steps to Reproduce

Download thehive 4.1.18
Update TheHive 4.1.8 with TheHive 4.1.18, overwriting /opt/thehive/lib and /opt/thehive/bin
Change permissions: /opt/thehive/lib to 644 - root:thehive
Change permissions: /opt/thehive/data to 666 - thehive:thehive

Possible Solutions

I tried changing permissions and checked the Cassandra database, which looks good (it is in a cluster of 3 machines).

I tried to create a new ORG with the same name as the other one:

Complementary information

To-om commented 2 years ago

You have 2 unrelated problems:

The configured location to store attachments (/opt/thehive/data) must be writable by the thehive user. I don't know why that was not the case, but you fixed it.
The organisation is not visible (but it exists). This is very strange because the admin user should be able to see all organisations. Maybe it is an issue with the index. Can you try to reindex the data?

Keroseno101 commented 1 year ago

The problem was fixed, but I forgot to write it here.

After trying many things and almost going crazy, I found the solution: rebuilding the index.

This information was not easy to find, but I finally found it here:

https://blog.strangebee.com/thehive-4-1-16-is-out/

Just copy these lines to the end of /etc/thehive/application.conf and restart TheHive to start the reindex.

# -----------------------------------------------------------------
# Temporary configuration to solve the immense terms indexing issue
# This will be used as a first step of TheHive database initialization
# This is used to truncate titles if you have immense titles issues
db.janusgraph.immenseTermProcessing.title = "truncate(1024)"
# This is used to fix text observables with big values
db.janusgraph.immenseTermProcessing.data = "observableHashToIndex"
# This is required to rebuild the index
db.janusgraph.forceDropAndRebuildIndex: true
# -----------------------------------------------------------------

TheMatrix97 commented 1 year ago

Hi! Sorry for reopening this issue, but we are facing the same error in our TheHive deployment. I tried to run TheHive with the configuration @Keroseno101 pointed out, but it still shows the same 404 error. I suspect there might be some kind of corrupted data within JanusGraph, backed by Cassandra. Is there any method to check the integrity of Cassandra? I don't have any clue about how to query a graph DB such as JanusGraph.

Thanks!

Keroseno101 commented 1 year ago

Hi, I was not completely clear in my comment. I will go step by step through what you should do.

1. Check that the elasticsearch service is running on all your instances.
2. Check that the version ( curl -X GET http://hostname:9200 -u elastic ) is not 8.x.x; you can only use Elasticsearch 7.x.x (7.17.8, for example).
3. Check the health of the cluster ( curl "http://hostname:9200/_cluster/health?pretty" -u elastic )
4. Check the nodes: check the log /var/log/elasticsearch/nameofyourcluster.log and then run curl "http://hostname:9200/_cat/nodes?pretty" -u elastic
5. Check the Cassandra database: nodetool status (all nodes should be UN)
6. Go to /etc/thehive/application.conf and add "db.janusgraph.forceDropAndRebuildIndex: true" at the end of the file.
7. Restart the service and watch the live logs: systemctl restart thehive && tail -f /var/log/thehive/application.log
8. After "Creating Database Schema" you should see something like "drop the index globalXX", and TheHive starts to DISABLE different things. After that you should just see "reindex
9. After the data is reindexed, you should see your old ORG.

Still not working? Send me:

screenshot of step 3
screenshot of step 8 (log)
screenshot of "ls -la /var/lib/elasticsearch/"
screenshot of errors from the thehive-reindex log, maybe?

TheMatrix97 commented 1 year ago

Hi @Keroseno101, thanks for your fast response. Just to add some information: I have the whole deployment running in Kubernetes, so I'm pretty sure some nasty Elasticsearch restart might have left some inconsistency in the index.

Steps 1, 2 and 3: I'm currently running Elasticsearch 7.17.8. Health (the cluster status is yellow):

{"cluster_name":"elasticsearch","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":35,"active_shards":35,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":34,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":50.72463768115942}
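To make the yellow status above easier to read, here is a minimal Python sketch that summarises a `_cluster/health` response. The function name `summarise_health` is hypothetical, and it only looks at fields that appear in the JSON above. In Elasticsearch, yellow means all primary shards are assigned but some replicas are not; on a cluster with a single data node, replicas have no second node to go to.

```python
import json


def summarise_health(raw: str) -> str:
    """Return a one-line reading of an Elasticsearch _cluster/health response."""
    health = json.loads(raw)
    status = health["status"]
    unassigned = health["unassigned_shards"]
    data_nodes = health["number_of_data_nodes"]
    if status == "green":
        return "green: all primary and replica shards are assigned"
    if status == "yellow":
        # Replicas are never placed on the same node as their primary.
        hint = " (single data node, replicas cannot be placed)" if data_nodes == 1 else ""
        return f"yellow: primaries assigned, {unassigned} shard(s) unassigned{hint}"
    return f"{status}: at least one primary shard is unassigned"


# Trimmed copy of the response posted above.
sample = '{"status":"yellow","number_of_data_nodes":1,"unassigned_shards":34}'
print(summarise_health(sample))
```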

Step 5: Cassandra seems to be running fine:

root@cassandra-0:/# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  <ip>  521.57 MiB  256          100.0%            <id>  rack1
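On a multi-node cluster, the "all nodes should be UN" check can be scripted. The following is only a sketch: `nodes_not_up_normal` is a hypothetical helper, the sample addresses and host IDs are invented, and the parsing assumes the column layout shown in the output above.

```python
def nodes_not_up_normal(nodetool_output: str) -> list[str]:
    """Return the addresses of nodes whose state is not UN (Up/Normal)."""
    bad = []
    for line in nodetool_output.splitlines():
        parts = line.split()
        if len(parts) < 2 or len(parts[0]) != 2:
            continue
        code = parts[0]
        # Node rows start with a two-letter code: U/D followed by N/L/J/M.
        if code[0] in "UD" and code[1] in "NLJM" and code != "UN":
            bad.append(parts[1])
    return bad


# Invented sample with one healthy node and one node that is down.
sample = """Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load        Tokens  Owns (effective)  Host ID  Rack
UN  10.0.0.1  521.57 MiB  256     100.0%            aaaa     rack1
DN  10.0.0.2  498.10 MiB  256     100.0%            bbbb     rack1
"""
print(nodes_not_up_normal(sample))  # ['10.0.0.2']
```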

Steps 6 to 9: You are right, there is something wrong during the index rebuild phase. It runs into an issue while trying to remove the indexes:

The index global cannot be removed (java.lang.UnsupportedOperationException: External mixed indexes must be removed in the indexing system directly.)

[info] o.t.s.m.Database - Creating database schema
[info] o.t.s.m.Database - Disable index global4
[info] o.t.s.m.Database - Wait for the index global4 to become disabled
[info] o.j.g.d.m.ManagementSystem$UpdateStatusTrigger - Set status DISABLED on schema element global4 with property keys [date, organisationId, data, endDate, dueDate, sighted, _createdBy, source, type, objectType, number, predicate, caseId, action, attachmentId, contentType, pap, order, group, read, caseTemplate, dataType, lastSyncDate, tags, relatedId, size, resolutionStatus, name, hashes, assignee, sourceRef, startDate, impactStatus, status, ignoreSimilarity, flag, description, title, login, _label, organisationIds, requestId, _createdAt, _updatedAt, value, objectId, mainAction, severity, summary, _updatedBy, follow, message, colour, namespace, tlp, ioc, taskId]
[info] o.j.g.d.m.ManagementLogger - Received all acknowledgements for eviction [1]
[info] o.j.g.d.m.GraphIndexStatusWatcher - All 57 key(s) on index global4 have status(es) [DISABLED]
[warn] o.t.s.m.Database - The index global cannot be removed (java.lang.UnsupportedOperationException: External mixed indexes must be removed in the indexing system directly.)
[warn] o.t.s.m.Database - The index global1 cannot be removed (java.lang.UnsupportedOperationException: External mixed indexes must be removed in the indexing system directly.)
[warn] o.t.s.m.Database - The index global2 cannot be removed (java.lang.UnsupportedOperationException: External mixed indexes must be removed in the indexing system directly.)
[warn] o.t.s.m.Database - The index global3 cannot be removed (java.lang.UnsupportedOperationException: External mixed indexes must be removed in the indexing system directly.)
[warn] o.t.s.m.Database - The index global4 cannot be removed (java.lang.UnsupportedOperationException: External mixed indexes must be removed in the indexing system directly.)
[info] o.t.s.m.Database - Wait for the index global5 to become available
....
....
[info] o.t.s.m.Database - Reindex data for global5 (job: 378dacdf)
[info] o.t.s.m.Database - Reindex job 378dacdf is running
...
...
[info] o.j.g.d.m.ManagementSystem - Index update job successful for [global5]

It seems index global5 is being reindexed correctly, but the rest are totally ignored. Are you familiar with this error?
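Logs copied from a terminal or from a container often carry colour codes around the log level, which makes them harder to read and to grep. A small Python sketch for stripping them follows; the name `clean_log_line` is hypothetical, and the pattern is an assumption that covers three cases: a real escape character, a replacement character left behind by a copy, or no leading character at all.

```python
import re

# Matches colour codes such as ESC[37m or ESC[0m, including copies in which
# the escape character was lost or replaced by U+FFFD.
ANSI_RESIDUE = re.compile(r"[\x1b\ufffd]?\[\d+m")


def clean_log_line(line: str) -> str:
    """Remove colour-code residue from a single log line."""
    return ANSI_RESIDUE.sub("", line)


raw = "[\x1b[33mwarn\x1b[0m] o.t.s.m.Database - The index global cannot be removed"
print(clean_log_line(raw))
```

Bracketed values that are not colour codes, such as `[1]` or `[0x42e1f3a4]`, are left untouched because they do not end in `m`.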

Again, thank you very much for helping me out with this :)

Keroseno101 commented 1 year ago

Hi,

The health of the cluster is OK. GREEN is always better, but it looks like it is working.

"The index globalX cannot be removed" is also normal. The database also looks OK.

After this reindex you should be able to log in with a normal user (or with the admin user) and see your ORG.

The problem is with the index, so let's go a step further:

1. Stop TheHive.
2. Go to /var/lib/elasticsearch/
3. rm -rf nodes
4. systemctl restart elasticsearch
5. Check the Elasticsearch log.
6. Create new passwords: /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive
7. Enter the same password for all the users (use an easy password; this is not important right now).
8. Go to /etc/thehive/application.conf and change the password of the elasticsearch user.
9. Restart TheHive again with db.janusgraph.forceDropAndRebuildIndex activated.
10. After TheHive starts, log in to the web application with a user that you know, or with the admin user. Do you see the ORG now?

TheMatrix97 commented 1 year ago

Well... Just after the reindexing finished, I noticed my TheHive instance went back to default: the admin password has been reset to the default and no Organization is shown. It seems the data is gone or corrupted.

Keroseno101 commented 1 year ago

The data is in Cassandra, with more than 500 MB. The data is not gone, and if the data were corrupted you could not start TheHive. The data is there, but the index is not doing its job. Try following the last steps: delete all the Elasticsearch data directly with rm -rf /var/lib/elasticsearch/* and follow the whole process. I spent maybe 10 days going crazy over this topic and I fixed it; you will fix it too.

TheMatrix97 commented 1 year ago

Yes, you are right... Data is still there, phew.... I just logged in as a different user and the dashboard changed... xD. So, data is stored in Cassandra and Elasticsearch is just used as an index? Where is data persisted for the long term? I didn't find any information about the exact use of ES and Cassandra in TheHive.

Keroseno101 commented 1 year ago

A pleasure to explain. TheHive uses Cassandra as its database; all the information that you enter in TheHive goes directly to your Cassandra database.

Cassandra works with keyspaces. You set the name of the keyspace in /etc/thehive/application.conf (by default: thehive), and Cassandra stores the data for that keyspace in a directory with this name, in this example /var/lib/cassandra/data/thehive/. There you have the tables with all the information (don't touch them :-) ). Remember the chown -R cassandra:cassandra /var/lib/cassandra

Elasticsearch works only as an index. It is important but not critical. You can delete all the index data, uninstall Elasticsearch, or whatever you want; your data will still be safe in Cassandra. Remember also a chown -R elasticsearch:elasticsearch /var/lib/elasticsearch/

TheHive (through JanusGraph) asks Elasticsearch which records match a search, then reads those records from Cassandra, and everything goes fast because of that.

Is your instance working again with all the information that you had before? Did that happen with the last solution (rm -rf /var/lib/elasticsearch/), or did it already work with the solution before that? This is just for the people who come here in the future :-)

TheMatrix97 commented 1 year ago

Hi! I just removed all the indexes related to TheHive (globalX); I had 5 different indexes, global 1-5. Then I restarted TheHive with the option db.janusgraph.forceDropAndRebuildIndex: true. The process seems to start correctly and the job finishes, leaving a single global6 index, but it throws an error during the process:

[info] o.j.g.o.j.IndexRepairJob - Found index global6
[info] o.t.s.m.Database - Reindex job 20241192 is running
[info] o.t.s.m.Database - Reindex job 20241192 is running
[error] o.j.g.d.m.ManagementLogger - Evicted [11@0a0203767-thehive-01] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardjanusgraphtx[0x42e1f3a4], standardjanusgraphtx[0x745813fe], standardjanusgraphtx[0x7067992a], standardjanusgraphtx[0x7ea0788e]]
[error] o.j.g.d.m.ManagementLogger - Evicted [9@0a0203767-thehive-01] from cache but waiting too long for transactions to close. Stale transaction alert on: [standardjanusgraphtx[0x42e1f3a4], standardjanusgraphtx[0x745813fe], standardjanusgraphtx[0x7067992a], standardjanusgraphtx[0x7ea0788e]]

After the rebuild finishes, which takes quite a long time, the organisation is still missing. It appears in the UI, but the API returns a 404 error.

$ curl -I -X "GET" http://localhost/api/organisation/myorg
404

I reached a dead end... Any other ideas?

Thanks!

Keroseno101 commented 1 year ago

Hi, this error can happen, but it does not mean anything important. If you see your ORG, cases and users in the web UI, that means you are making a wrong API call.

I would suggest using the right port (9000) and an API key from an org user for API calls.

With this command you should get the names of all your ORGs:

curl -H "Authorization: Bearer APIKEY" -H "Content-Type: application/json" -X GET "http://hostname:9000/api/organisation"
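For scripted checks, the same call can be prepared in Python with the standard library. This is only a sketch: `build_org_request` is a hypothetical helper, `hostname` and `APIKEY` are the same placeholders as in the curl command, and the request is built but not sent.

```python
from urllib.request import Request


def build_org_request(base_url: str, api_key: str) -> Request:
    """Build (but do not send) the GET request from the curl command above."""
    return Request(
        url=f"{base_url.rstrip('/')}/api/organisation",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="GET",
    )


req = build_org_request("http://hostname:9000", "APIKEY")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; a 404 at that point would again point at the URL or the port rather than at the index.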

TheMatrix97 commented 1 year ago

Hi @Keroseno101, I can confirm the organisation seems to exist now: the admin page shows the organisation and I can create new users. However, I'm quite sure I found a bug unrelated to indexes.

Right now I have organisation "A". This organisation existed once upon a time, and "user1" was its org-admin. Eventually Cassandra's disk filled up and TheHive crashed. We were able to increase the volume size and recover the application, but users were apparently lost.

If I run the query to list the users of a given organisation, I only get "user2", which was created after the Cassandra crash:

curl -H "Authorization: Bearer APIKEY" -H "Content-Type: application/json" -X POST http://localhost:9000/api/v0/query

Body:

{
  "query": [
    {
      "_name": "getOrganisation",
      "idOrName": "A"
    },
    {
      "_name": "users"
    },
    {
      "_name": "sort",
      "_fields": [
        {
          "login": "asc"
        }
      ]
    },
    {
      "_name": "page",
      "from": 0,
      "to": 15,
      "organisation": "A"
    }
  ]
}
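When the same check has to be repeated for several organisations, the body above can be generated instead of edited by hand. A minimal Python sketch, assuming nothing beyond the fields shown in the body; the helper name `users_of_org_query` is hypothetical.

```python
import json


def users_of_org_query(org: str, start: int = 0, end: int = 15) -> dict:
    """Reproduce the query body above for an arbitrary organisation name."""
    return {
        "query": [
            {"_name": "getOrganisation", "idOrName": org},
            {"_name": "users"},
            {"_name": "sort", "_fields": [{"login": "asc"}]},
            {"_name": "page", "from": start, "to": end, "organisation": org},
        ]
    }


print(json.dumps(users_of_org_query("A"), indent=2))
```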

It returns user2 only:

[
    {
        "_id": "~333099454",
        "id": "user2",
        "createdBy": "admin@thehive.local",
        "updatedBy": "admin@thehive.local",
        "createdAt": 1679485199426,
        "updatedAt": 1679485211963,
        "_type": "user",
        "login": "user2",
        "name": "user2",
        "roles": [
            "admin",
            "write",
            "read",
            "alert"
        ],
        "organisation": "A",
        "hasKey": true,
        "status": "Ok"
    }
]

But now, if I try to create the "user1" again....

$ curl -X POST http://localhost:9000/api/v1/user

Body:

{
  "login" : "user1",
  "name" : "user1",
  "organisation": "A",
  "profile": "org-admin",
  "email": "user1",
  "password": "supersecret"
}

It returns 201 with the information of user1 (from before the Cassandra crash), but it indicates the user is assigned to organisation "no org":

{
    "_id": "~24632",
    "_createdBy": "admin@thehive.local",
    "_updatedBy": "admin@thehive.local",
    "_createdAt": 1678881396871, # BEFORE CASSANDRA CRASH
    "_updatedAt": 1679484293863,
    "login": "user1",
    "name": "user1",
    "hasKey": true,
    "hasPassword": true,
    "hasMFA": false,
    "locked": false,
    "profile": "org-admin",
    "permissions": [
        "manageShare",
        "manageAnalyse",
        "manageTask",
        "manageCaseTemplate",
        "manageCase",
        "manageUser",
        "manageProcedure",
        "managePage",
        "manageObservable",
        "manageTag",
        "manageConfig",
        "manageAlert",
        "accessTheHiveFS",
        "manageAction"
    ],
    "organisation": "no org",
    "organisations": [],
    "extraData": {}
}

So, although it returns a 201 status code, the user "user1" is missing from organisation A.
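A response like this can be flagged automatically. The sketch below assumes only what the response above shows, namely that a user without membership comes back with an empty "organisations" list; `is_orphaned` is a hypothetical name.

```python
def is_orphaned(user: dict) -> bool:
    """True when a user record lists no organisation membership."""
    return not user.get("organisations")


# Trimmed copy of the response above.
user1 = {"login": "user1", "organisation": "no org", "organisations": []}
print(is_orphaned(user1))  # True
```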

I'm well aware this is unrelated to indexes and Elasticsearch, so I'm creating a different issue.

Again, thank you very much for your help!