[Bug] Problem during migration from 3.3.1 to 3.4

dadokkio commented 3 years ago

I was helping to migrate production data from TheHive 3.3.1 to 3.4.

Initial state of thehive was the following:

Thehive: 3.3.1-1
Elastic4Play: 1.10.0
Play: 2.6.21
Elastic4s: 5.6.6
ElasticSearch: 5.6.9

Status of original elastic index:

>> curl 'localhost:9200/_cat/indices?v'
health   status   index        uuid   pri   rep   docs.count   docs.deleted   store.size   pri.store.size
yellow   open     the_hive_14  xxxx   5     1     49343        141            230.4mb      230.4mb

Following the migration guide, we modified application.conf, replacing search.host with the new search.uri. We tried installing TheHive 3.4 both from sources and from the .deb package.

On the first run TheHive asks to update the database, but after confirming the update it gets stuck and returns the following error:

[screenshot of the error, image not preserved]

A new empty index is created and the old one is closed:

>> curl 'localhost:9200/_cat/indices?v'
health   status   index        uuid   pri   rep   docs.count   docs.deleted   store.size   pri.store.size
         close    the_hive_14
red      open     the_hive_15  yyyy   5     1     0            0              230b          230b
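
The same listing can be requested as JSON (`_cat/indices?format=json`), which avoids guessing at the blank columns of a closed index. A minimal sketch, assuming Elasticsearch answers on localhost:9200 without authentication; `fetch_indices` and `problem_indices` are hypothetical helper names, not part of TheHive:

```python
import json
from urllib.request import urlopen


def fetch_indices(base_url="http://localhost:9200", timeout=10):
    """Return the _cat/indices listing as a list of dicts (one per index)."""
    with urlopen(f"{base_url}/_cat/indices?format=json", timeout=timeout) as resp:
        return json.load(resp)


def problem_indices(rows):
    """Return (index, reason) pairs for indices that are closed or red."""
    problems = []
    for row in rows:
        name = row.get("index", "?")
        if row.get("status") == "close":
            # A closed index reports no health, so check status first.
            problems.append((name, "closed"))
        elif row.get("health") == "red":
            problems.append((name, "red"))
    return problems
```

Calling `problem_indices(fetch_indices())` on the state shown above should flag the_hive_14 as closed and the_hive_15 as red.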

Going back to TheHive homepage brings up a lot of error popups.

What we tried:

Compared application.conf to the one in the deb package to check for differences [only domain and keyStore]
Deleted the new index and increased connectTimeout => same error
Cleaned application.conf, keeping only a minimal configuration => same error
Stopped the service and tried to run it manually: oddly, that fails with a different error (second screenshot). Since the machine has both Java 9 and Java 11, we also exported the Java path to be sure Java 9 is used.
Tried to replicate the migration in a Docker environment, but with the same settings TheHive Docker image doesn't start

What we didn't try:

Upgrading Elasticsearch, since TheHive 3.3.1 and TheHive 3.4 should both support Elasticsearch 5

Any help or suggestion?

nadouani commented 3 years ago

The second screenshot is showing No configuration setting found for search.uri. Are you sure the config is there?

nadouani commented 3 years ago

There is also a step by step migration doc: https://github.com/TheHive-Project/TheHiveDocs/blob/master/admin/upgrade_to_thehive_3_4_and_es_6_x.md

Useful options: https://github.com/StrangeBeeCorp/Notebooks

dadokkio commented 3 years ago

Yes, the configuration is there; I have no idea why the manual run doesn't work. We tried different conf files (including one with just search.uri) but we always got an error. We followed the steps in the guide and got as far as "Ensure everything is working.", but it's not.

Some quick questions: is it OK that the_hive_14 is in close status? If I want to retry the migration, is it OK to just remove the_hive_15 and retry from the GUI, or do we need to bring the_hive_14 back to yellow somehow? Is the size of the index too big? Could that be related to the timeout error?

Slightly off-topic: the repo only has version 3.5, so apt upgrade moves straight to 3.5 and skips 3.4. We had to find an alternative 3.4 deb package.

In any case thanks for the notebook! I'll give that a look.

nadouani commented 3 years ago

I don't know what the "closed" status means. I guess you need it open to be able to migrate from it.

"Is the size of the index too big? Could that be related to the timeout error?"

I don't think so

For the packages:

if you need TheHive 3.4.4, fetch it from the stable repository
if you need TheHive 3.5.1, fetch it from the release repository

dadokkio commented 3 years ago

OK, we reopened the index and deleted the_hive_15. After restarting TheHive with the proper configuration we see the same behavior: clicking the "Update Database" button gives "UserMgmtCtrl: java.net.SocketTimeoutException". This time the old index stays yellow:

>> curl 'localhost:9200/_cat/indices?v'
health   status   index        uuid   pri   rep   docs.count   docs.deleted   store.size   pri.store.size
yellow   open     the_hive_14  xxxx   5     1     49343        141            230.4mb      230.4mb
red      open     the_hive_15  yyyy   5     1
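
The reset described in this comment (reopen the old index, delete the half-created one) corresponds to two Elasticsearch REST calls, `POST /<index>/_open` and `DELETE /<index>`. A minimal sketch, assuming Elasticsearch on localhost:9200 without authentication and a verified snapshot taken beforehand, since the delete is irreversible; the helper names are hypothetical:

```python
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated node


def open_index_request(index, base_url=ES_URL):
    """Build the request that reopens a closed index."""
    return Request(f"{base_url}/{index}/_open", method="POST")


def delete_index_request(index, base_url=ES_URL):
    """Build the request that deletes an index (irreversible)."""
    return Request(f"{base_url}/{index}", method="DELETE")


def reset_migration(old_index="the_hive_14", new_index="the_hive_15", timeout=30):
    """Reopen the source index, then drop the partially created target index."""
    for req in (open_index_request(old_index), delete_index_request(new_index)):
        # urlopen raises HTTPError on 4xx/5xx, e.g. if new_index does not exist.
        with urlopen(req, timeout=timeout) as resp:
            print(req.get_method(), req.full_url, "->", resp.status)
```

Building the requests separately from sending them makes it possible to inspect exactly which URLs will be hit before running `reset_migration()` against production data.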

nadouani commented 3 years ago

Yellow means you have 1 node of ES which is OK.

After the "Update Database" button "UserMgmtCtrl: java.net.SocketTimeoutException".

Ok but what happened during the "Update Database"? any logs? What did timeout? Is it TheHive not able to reach ES? or something else?

LikaSvoykina commented 3 years ago

Yellow means you have 1 node of ES which is OK.

After the "Update Database" button "UserMgmtCtrl: java.net.SocketTimeoutException".

Ok but what happened during the "Update Database"? any logs? What did timeout? Is it TheHive not able to reach ES? or something else?

We don't have any logs. We tried adding keepalive = 1m and connectTimeout = 50000, but it doesn't help: same error. The log file /var/log/thehive/application.log is not being updated.

LikaSvoykina commented 3 years ago

"Is it TheHive not able to reach ES?" I think TheHive is able to reach it. How can we check?
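
One way to check reachability is to query Elasticsearch from the machine that runs TheHive, using the same address as search.uri. A minimal sketch with the standard library only, assuming no authentication or TLS client setup is needed; `check_es` and `http_get_json` are hypothetical helpers:

```python
import json
from urllib.request import urlopen


def http_get_json(url, timeout=5):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_es(uri, fetch=http_get_json):
    """Return a one-line summary of whether Elasticsearch answers at `uri`."""
    base = uri.rstrip("/")
    try:
        info = fetch(base + "/")                    # node name and version
        health = fetch(base + "/_cluster/health")   # green / yellow / red
    except (OSError, ValueError) as exc:
        # OSError covers refused connections and timeouts; ValueError bad JSON.
        return f"unreachable: {exc!r}"
    version = info.get("version", {}).get("number", "unknown")
    status = health.get("status", "unknown")
    return f"reachable: ES {version}, cluster status {status}"
```

For example, `print(check_es("http://127.0.0.1:9200"))` run on TheHive host, with the URL replaced by the value configured in search.uri. A "reachable" answer only shows that the HTTP port responds; it does not rule out slow or failing requests during the migration itself.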

dadokkio commented 3 years ago

Some updates:

we double-checked the snapshot to be sure we did it properly
we installed 3.4.4 from the official repo
we solved two small issues:
- somehow the permissions on application.log were wrong, so nothing was being logged anymore
- application.conf under opt was removed when 3.4 was installed from packages. I thought it was not necessary for a package installation, but without it TheHive won't start.
we cleaned up the the_hive_15 index, brought the_hive_14 back to yellow and started again from scratch
following issue https://github.com/TheHive-Project/TheHive/issues/1092 we also tried downgrading Elasticsearch to 6.7.2

We still get an error on the migrate POST request:

2021-04-06 11:16:31,818 [INFO] from org.elastic4play.services.MigrationSrv in application-akka.actor.default-dispatcher-18 - Initiate database migration from version 14 (indexWithMappingTypes)
2021-04-06 11:16:31,819 [INFO] from org.elastic4play.services.MigrationSrv in application-akka.actor.default-dispatcher-18 - Migrate database from version 14, add operations for version 15
2021-04-06 11:17:02,018 [ERROR] from org.elastic4play.services.MigrationSrv in application-akka.actor.default-dispatcher-15 - Migration fail
com.sksamuel.elastic4s.http.JavaClientExceptionWrapper: java.net.SocketTimeoutException
        at com.sksamuel.elastic4s.http.ElasticsearchJavaRestClient$$anon$1.onFailure(ElasticsearchJavaRestClient.scala:65)
        at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onDefinitiveFailure(RestClient.java:857)
        at org.elasticsearch.client.RestClient$1.retryIfPossible(RestClient.java:595)
        at org.elasticsearch.client.RestClient$1.failed(RestClient.java:573)
        at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:138)
        at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.failed(AbstractClientExchangeHandler.java:419)
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:375)
        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39)
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:263)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:492)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:213)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.net.SocketTimeoutException: null
        ... 11 common frames omitted
2021-04-06 11:17:02,019 [INFO] from org.elastic4play.ErrorHandler in application-akka.actor.default-dispatcher-15 - POST https://xx.xx.xx.xx:9443/api/maintenance/migrate returned 500
com.sksamuel.elastic4s.http.JavaClientExceptionWrapper: java.net.SocketTimeoutException
        at com.sksamuel.elastic4s.http.ElasticsearchJavaRestClient$$anon$1.onFailure(ElasticsearchJavaRestClient.scala:65)
        at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onDefinitiveFailure(RestClient.java:857)
        at org.elasticsearch.client.RestClient$1.retryIfPossible(RestClient.java:595)
        at org.elasticsearch.client.RestClient$1.failed(RestClient.java:573)
        at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:138)
        at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.failed(AbstractClientExchangeHandler.java:419)
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:375)
        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39)
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:263)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:492)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:213)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.net.SocketTimeoutException: null
        ... 11 common frames omitted
2021-04-06 11:17:02,022 [WARN] from akka.http.impl.engine.http2.Http2ServerDemux in application-akka.actor.default-dispatcher-2 - handleOutgoingEnded received unexpectedly in state Closed. This indicates a bug in Akka HTTP, please report it to the issue tracker.
2021-04-06 11:17:02,382 [ERROR] from org.elastic4play.database.DBConfiguration in application-akka.actor.default-dispatcher-18 - ElasticSearch request failure: POST:/the_hive_15/_search?
StringEntity({"query":{"match":{"relations":{"query":"user"}}},"size":0},Some(application/json))
 => ElasticError(search_phase_execution_exception,all shards failed,None,None,None,List(),None)
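
The timestamps in this log give a hint about which timeout fired. A small worked example, using only the two timestamps quoted above (`log_time` is a hypothetical helper):

```python
from datetime import datetime


def log_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


start = log_time("2021-04-06 11:16:31,819 [INFO] ... add operations for version 15")
fail = log_time("2021-04-06 11:17:02,018 [ERROR] ... Migration fail")

elapsed = (fail - start).total_seconds()
print(f"{elapsed:.3f} s")  # 30.199 s
```

The failure arrives almost exactly 30 seconds after the migration starts, and the stack trace goes through `sessionTimedOut`. That would be consistent with a 30-second socket (read) timeout on an already established connection rather than a failure to connect, in which case raising connectTimeout alone may not change the outcome. This reading is an assumption drawn from the timing, not something the log states.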

TheHive-Project / TheHive

[Bug] Problem during migration from 3.3.1 to 3.4 #1920
