OpenCTI-Platform / opencti

Open Cyber Threat Intelligence Platform
https://opencti.io

[6.1.X] Platform Crashing - JavaScript heap out of memory #7436

Open MaxwellDPS opened 1 week ago

MaxwellDPS commented 1 week ago

Description

"JavaScript heap out of memory" OOM crashes are plaguing the uptime and usability of 6.1.X. The crashes seem linked to data import.

Environment

  1. OS (where OpenCTI server runs): CentOS Stream 9 - Kubernetes
  2. OpenCTI version: 6.1.6
  3. OpenCTI client: Frontend
  4. Other environment details: scaled, clustered deployment

Reproducible Steps

Steps to create the smallest reproducible scenario:

None known. The crashes have been continuous since the 6.1.X upgrade, and seem to be far worse when data is being imported.

RabbitMQ is showing a backlog of ~1k messages in the push_sync queue that are going nowhere. (Connector queues are dropping)

Redis memory spikes at the times this is happening.

Expected Output

No platform crashes; all pods running with zero restarts:

NAME                                                              READY   STATUS      RESTARTS        AGE
opencti-opencti-api-f84c7f588-94rsq                               2/2     Running     0               15m
opencti-opencti-api-f84c7f588-hq5zx                               2/2     Running     0               15m
opencti-opencti-api-f84c7f588-lf25m                               2/2     Running     0               15m
opencti-opencti-api-f84c7f588-scb4v                               2/2     Running     0               15m
opencti-opencti-api-f84c7f588-zxgbq                               2/2     Running     0               15m
...
opencti-opencti-web-6d5656fc4f-7jg9p                              2/2     Running     0               15m
opencti-opencti-web-6d5656fc4f-7lng5                              2/2     Running     0               15m
opencti-opencti-web-6d5656fc4f-fvllc                              2/2     Running     0               15m
opencti-opencti-web-6d5656fc4f-qjsgr                              2/2     Running     0               15m

Actual Output

NAME                                                              READY   STATUS      RESTARTS        AGE
opencti-opencti-api-f84c7f588-94rsq                               2/2     Running     2 (5m20s ago)   15m
opencti-opencti-api-f84c7f588-hq5zx                               2/2     Running     2 (6m25s ago)   15m
opencti-opencti-api-f84c7f588-lf25m                               2/2     Running     1 (118s ago)    15m
opencti-opencti-api-f84c7f588-scb4v                               2/2     Running     0               15m
opencti-opencti-api-f84c7f588-zxgbq                               2/2     Running     1 (4m26s ago)   15m
...
opencti-opencti-web-6d5656fc4f-7jg9p                              2/2     Running     2 (7m23s ago)   14m
opencti-opencti-web-6d5656fc4f-7lng5                              2/2     Running     1 (3m2s ago)    15m
opencti-opencti-web-6d5656fc4f-fvllc                              2/2     Running     0               7m56s
opencti-opencti-web-6d5656fc4f-qjsgr                              2/2     Running     1 (66s ago)     15m

Additional information

Logs from startup through the crash:

{"category":"APP","environment":"production","level":"info","message":"[OPENCTI] Starting platform","source":"backend","timestamp":"2024-06-20T20:17:33.483Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI] Checking dependencies statuses","source":"backend","timestamp":"2024-06-20T20:17:33.485Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[SEARCH] Engine client not specified, trying to discover it with opensearch client","source":"backend","timestamp":"2024-06-20T20:17:33.494Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[SEARCH] Engine detected to elk","source":"backend","timestamp":"2024-06-20T20:17:33.568Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[SEARCH] elk (8.13.4) client selected / runtime sorting enabled / attachment processor enabled","source":"backend","timestamp":"2024-06-20T20:17:33.604Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[CHECK] Search engine is alive","source":"backend","timestamp":"2024-06-20T20:17:33.604Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[CHECK] File engine is alive","source":"backend","timestamp":"2024-06-20T20:17:33.637Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[CHECK] RabbitMQ engine is alive","source":"backend","timestamp":"2024-06-20T20:17:33.704Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[REDIS] Redis 'base' client ready","source":"backend","timestamp":"2024-06-20T20:17:33.732Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[REDIS] Clients initialized in single mode","source":"backend","timestamp":"2024-06-20T20:17:33.733Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[CHECK] Redis engine is alive","source":"backend","timestamp":"2024-06-20T20:17:33.733Z","version":"6.1.6"}
{"category":"APP","level":"warn","message":"SMTP seems down, email notification will may not work","source":"backend","timestamp":"2024-06-20T20:17:38.816Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[CHECK] Python3 is available","source":"backend","timestamp":"2024-06-20T20:17:38.846Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[REDIS] Redis 'subscriber' client ready","source":"backend","timestamp":"2024-06-20T20:17:38.854Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Cache manager pub sub listener initialized","source":"backend","timestamp":"2024-06-20T20:17:38.855Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[REDIS] Redis 'lock' client ready","source":"backend","timestamp":"2024-06-20T20:17:38.864Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[INIT] Starting platform initialization","source":"backend","timestamp":"2024-06-20T20:17:39.833Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[INIT] Existing platform detected, initialization...","source":"backend","timestamp":"2024-06-20T20:17:39.883Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[INIT] admin user initialized","source":"backend","timestamp":"2024-06-20T20:17:44.147Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[MIGRATION] Read 15 migrations from the database","source":"backend","timestamp":"2024-06-20T20:17:44.215Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[MIGRATION] Platform already up to date, nothing to migrate","source":"backend","timestamp":"2024-06-20T20:17:44.219Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[MIGRATION] Migration process completed","source":"backend","timestamp":"2024-06-20T20:17:44.219Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[MIGRATION] Platform version updated to 6.1.6","source":"backend","timestamp":"2024-06-20T20:17:44.254Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[INIT] Platform initialization done","source":"backend","timestamp":"2024-06-20T20:17:44.302Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI] API ready on port 8080","source":"backend","timestamp":"2024-06-20T20:17:45.398Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Expiration manager not started (disabled by configuration)","source":"backend","timestamp":"2024-06-20T20:17:45.398Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Connector manager not started (disabled by configuration)","source":"backend","timestamp":"2024-06-20T20:17:45.398Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Starting Import Csv built in connector manager","source":"backend","timestamp":"2024-06-20T20:17:45.398Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Retention manager not started (disabled by configuration)","source":"backend","timestamp":"2024-06-20T20:17:45.473Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Task manager not started (disabled by configuration)","source":"backend","timestamp":"2024-06-20T20:17:45.473Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Rule engine not started (disabled by configuration)","source":"backend","timestamp":"2024-06-20T20:17:45.473Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Sync manager not started (disabled by configuration)","source":"backend","timestamp":"2024-06-20T20:17:45.473Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Ingestion manager not started (disabled by configuration)","source":"backend","timestamp":"2024-06-20T20:17:45.473Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] History manager not started (disabled by configuration)","source":"backend","timestamp":"2024-06-20T20:17:45.473Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Notification manager not started (disabled by configuration)","source":"backend","timestamp":"2024-06-20T20:17:45.473Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Publisher manager not started (disabled by configuration)","source":"backend","timestamp":"2024-06-20T20:17:45.473Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Playbook manager not started (disabled by configuration)","source":"backend","timestamp":"2024-06-20T20:17:45.473Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Starting file index manager","source":"backend","timestamp":"2024-06-20T20:17:45.473Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Indicator decay manager not started (disabled by configuration)","source":"backend","timestamp":"2024-06-20T20:17:45.473Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Starting Garbage collection manager","source":"backend","timestamp":"2024-06-20T20:17:45.474Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Starting Telemetry manager","source":"backend","timestamp":"2024-06-20T20:17:45.474Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Starting cluster manager","source":"backend","timestamp":"2024-06-20T20:17:45.474Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Support Package pub sub listener initialized","source":"backend","timestamp":"2024-06-20T20:17:45.491Z","version":"6.1.6"}
{"category":"APP","connectorId":"d336676c-4ee5-4257-96ff-b2a86688d4af","level":"info","message":"[QUEUEING] Starting connector queue consuming","source":"backend","timestamp":"2024-06-20T20:17:45.529Z","version":"6.1.6"}
{"category":"APP","errors":[{"attributes":{"genre":"TECHNICAL","http_status":500,"referer":"https://<rem>/dashboard/threats/intrusion_sets/610ea783-fafb-49a3-b41f-3ab14d11cd35"},"message":"Http call interceptor fail","name":"UNKNOWN_ERROR","stack":"UNKNOWN_ERROR: Http call interceptor fail\n    at error (/opt/opencti/build/src/config/errors.js:8:10)\n    at UnknownError (/opt/opencti/build/src/config/errors.js:82:47)\n    at fn (/opt/opencti/build/src/http/httpPlatform.js:455:18)\n    at lle.handle_error (/opt/opencti/build/node_modules/express/lib/router/layer.js:71:5)\n    at trim_prefix (/opt/opencti/build/node_modules/express/lib/router/index.js:326:13)\n    at done (/opt/opencti/build/node_modules/express/lib/router/index.js:286:9)\n    at Function.process_params (/opt/opencti/build/node_modules/express/lib/router/index.js:346:12)\n    at next (/opt/opencti/build/node_modules/express/lib/router/index.js:280:10)\n    at lle.handle_error (/opt/opencti/build/node_modules/express/lib/router/layer.js:67:12)\n    at trim_prefix (/opt/opencti/build/node_modules/express/lib/router/index.js:326:13)\n    at done (/opt/opencti/build/node_modules/express/lib/router/index.js:286:9)\n    at Function.process_params (/opt/opencti/build/node_modules/express/lib/router/index.js:346:12)\n    at next (/opt/opencti/build/node_modules/express/lib/router/index.js:280:10)\n    at lle.handle_error (/opt/opencti/build/node_modules/express/lib/router/layer.js:67:12)\n    at trim_prefix (/opt/opencti/build/node_modules/express/lib/router/index.js:326:13)\n    at done (/opt/opencti/build/node_modules/express/lib/router/index.js:286:9)\n    at Function.process_params (/opt/opencti/build/node_modules/express/lib/router/index.js:346:12)\n    at next (/opt/opencti/build/node_modules/express/lib/router/index.js:280:10)\n    at lle.handle_error (/opt/opencti/build/node_modules/express/lib/router/layer.js:67:12)\n    at trim_prefix 
(/opt/opencti/build/node_modules/express/lib/router/index.js:326:13)"},{"message":"stream is not readable","name":"InternalServerError","stack":"InternalServerError: stream is not readable\n    at readStream (/opt/opencti/build/node_modules/raw-body/index.js:185:17)\n    at getBody (/opt/opencti/build/node_modules/raw-body/index.js:116:12)\n    at read (/opt/opencti/build/node_modules/body-parser/lib/read.js:79:3)\n    at fn (/opt/opencti/build/node_modules/body-parser/lib/types/json.js:138:5)\n    at lle.handle [as handle_request] (/opt/opencti/build/node_modules/express/lib/router/layer.js:95:5)\n    at trim_prefix (/opt/opencti/build/node_modules/express/lib/router/index.js:328:13)\n    at done (/opt/opencti/build/node_modules/express/lib/router/index.js:286:9)\n    at Function.process_params (/opt/opencti/build/node_modules/express/lib/router/index.js:346:12)\n    at next (/opt/opencti/build/node_modules/express/lib/router/index.js:280:10)\n    at cors (/opt/opencti/build/node_modules/cors/lib/index.js:188:7)\n    at cb (/opt/opencti/build/node_modules/cors/lib/index.js:224:17)\n    at originCallback (/opt/opencti/build/node_modules/cors/lib/index.js:214:15)\n    at cb (/opt/opencti/build/node_modules/cors/lib/index.js:219:13)\n    at optionsCallback (/opt/opencti/build/node_modules/cors/lib/index.js:199:9)\n    at fn (/opt/opencti/build/node_modules/cors/lib/index.js:204:7)\n    at lle.handle [as handle_request] (/opt/opencti/build/node_modules/express/lib/router/layer.js:95:5)\n    at trim_prefix (/opt/opencti/build/node_modules/express/lib/router/index.js:328:13)\n    at done (/opt/opencti/build/node_modules/express/lib/router/index.js:286:9)\n    at Function.process_params (/opt/opencti/build/node_modules/express/lib/router/index.js:346:12)\n    at next (/opt/opencti/build/node_modules/express/lib/router/index.js:280:10)\n    at Function.handle (/opt/opencti/build/node_modules/express/lib/router/index.js:175:3)\n    at router 
(/opt/opencti/build/node_modules/express/lib/router/index.js:47:12)"}],"level":"error","message":"Http call interceptor fail","source":"backend","timestamp":"2024-06-20T20:20:27.622Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[REDIS] Redis 'publisher' client ready","source":"backend","timestamp":"2024-06-20T20:22:16.517Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[FILE STORAGE] delete file import/global/PRC Malware Infrastructure 1-21 March.pdf in index","source":"backend","timestamp":"2024-06-20T20:23:25.040Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[TELEMETRY] File exporter activated","source":"backend","timestamp":"2024-06-20T20:28:46.210Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[TELEMETRY] Otlp exporter activated","source":"backend","timestamp":"2024-06-20T20:28:47.031Z","version":"6.1.6"}
{"category":"APP","level":"info","message":"[OPENCTI-MODULE] Running Telemetry manager infinite cron handler","source":"backend","timestamp":"2024-06-20T20:28:47.049Z","version":"6.1.6"}

<--- Last few GCs --->

[7:0x7f21a05eb690]   744891 ms: Mark-Compact 8445.5 (8557.9) -> 8445.4 (8557.1) MB, 3235.78 / 0.00 ms  (average mu = 0.465, current mu = 0.093) allocation failure; scavenge might not succeed
[7:0x7f21a05eb690]   749751 ms: Mark-Compact 8459.8 (8558.8) -> 8459.8 (8587.6) MB, 4840.97 / 0.00 ms  (average mu = 0.242, current mu = 0.004) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----
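Reading the GC trace above: in a `Mark-Compact 8459.8 (8558.8)` line the first figure is the used heap and the parenthesized figure is the committed heap, both in MB, so the process is pinned right at an ~8.5 GB old-space limit with almost no reclaimable memory (note the mutator utilization `mu` collapsing toward 0). The usual lever is Node's `--max-old-space-size` (e.g. via `NODE_OPTIONS`), though the limit here already appears raised well above the default. A quick sketch of pulling those figures out of such a line with `sed`:

```shell
# Extract used/committed heap (MB) from a V8 "Last few GCs" line
gc="[7:0x7f21a05eb690]   749751 ms: Mark-Compact 8459.8 (8558.8) -> 8459.8 (8587.6) MB"
used=$(echo "$gc" | sed -E 's/.*Mark-Compact ([0-9]+)\.[0-9]+ \(([0-9]+).*/\1/')
limit=$(echo "$gc" | sed -E 's/.*Mark-Compact ([0-9]+)\.[0-9]+ \(([0-9]+).*/\2/')
echo "used=${used}MB committed=${limit}MB"
```

When used stays within a few percent of committed across consecutive GCs, as here, raising the limit only delays the crash; the allocation source still has to be found.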
MaxwellDPS commented 1 week ago

Redis (-6h)

[screenshot]

RabbitMQ (-6h)

[screenshot]

richard-julien commented 1 week ago

Do you have any monitoring dashboard for the Node.js process? Memory, CPU, event loop lag, active handles? Thanks.

richard-julien commented 1 week ago

For example, here is the profile of one of our internal instances with a very large dataset and a large number of connectors running.

[screenshot]

[screenshot]

MaxwellDPS commented 6 days ago

Hey @richard-julien, I do not have this monitoring for Node. I have the Prometheus metrics on the API and workers, and that is it.

MaxwellDPS commented 6 days ago

Never mind, looks like I have them. Can I get the JSON for that dashboard?

MaxwellDPS commented 6 days ago

Hey Julien, here are the last 3 hours of Node metrics.

New symptoms

Containers:
  opencti-web:
    Ports:          8080/TCP, 14269/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 24 Jun 2024 09:15:06 -0700
    Last State:     Terminated
      Reason:       Error
      Exit Code:    134
      Started:      Mon, 24 Jun 2024 07:59:41 -0700
      Finished:     Mon, 24 Jun 2024 09:15:05 -0700

[screenshot]

richard-julien commented 6 days ago

Thanks for the dashboard. Looks like there are CPU spikes above 100% along with memory spikes above 2 GB. This kind of behavior can create various types of problems. We have worked a lot lately to fix this kind of situation, but it looks like we still have some work to do. It would be helpful to know what kind of data was being ingested during the spike, e.g. a report with millions of object_refs.

For the sync queue: if the queue is not processing, there may be an error that prevents the next message from being processed, so the platform continuously retries the same message. If you can pause your connectors so that only this queue is active, that would help isolate the errors.

MaxwellDPS commented 6 days ago

So on the import side: this is happening when the platform is idle, not just during import. No connector triggers it; just opening the WebUI can trigger it. (The web pods aren't even processing this data; all connectors and workers point to another set of pods.)

We also don't have any reports with >500k refs. There is no activity occurring during these events; I can have a queue of 0 and it still happens. It also happens so often that it is inhibiting our ability to use the platform.

For the sync queue, there are 1k messages and they are not climbing, so it doesn't seem to be a connector (or it must be TAXII).

richard-julien commented 6 days ago

Can you isolate a screen that generates this situation, check the browser's network tab to identify the calls made by that screen, and then try to isolate the query responsible? Thanks.

MaxwellDPS commented 6 days ago

Julien, it's happening when nothing is being opened. I'm not sure how else to word this: it's crashing at a dead idle.

Last crash was ~40 min ago. I wasn't doing anything on it, no connectors, no workers. It just died.

The only activity would be a single TAXII feed poll with ~4k SCOs at a 10-minute interval.
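For scale, a rough rate check on that one remaining source of activity: ~4k objects every 10 minutes works out to about 576k objects per day, a modest, steady ingest rate rather than a burst.

```shell
# Sustained ingest implied by one TAXII poll of ~4,000 SCOs every 10 minutes
per_poll=4000
polls_per_day=$(( 24 * 60 / 10 ))        # 144 polls/day
echo $(( per_poll * polls_per_day ))     # objects per day
```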

richard-julien commented 6 days ago

Maybe it comes from a manager. You can try disabling all the managers, then reactivate them one by one to isolate the one that produces the CPU/memory spike.

MaxwellDPS commented 6 days ago

All managers are disabled; this is the current config:

  CONNECTOR_MANAGER__ENABLED: "false"
  EXPIRATION_SCHEDULER__ENABLED: "false"
  HISTORY_MANAGER__ENABLED: "false"
  IMPORT_CSV_CONNECTOR__ENABLED: "true"
  IMPORT_CSV_CONNECTOR__VALIDATE_BEFORE_IMPORT: "true"
  INDICATOR_DECAY_MANAGER__ENABLED: "false"
  INGESTION_MANAGER__ENABLED: "false"
  ACTIVITY_MANAGER__ENABLED: "false"
  NOTIFICATION_MANAGER__ENABLED: "false"
  PLAYBOOK_MANAGER__ENABLED: "false"
  PUBLISHER_MANAGER__ENABLED: "false"
  RETENTION_MANAGER__ENABLED: "false"
  RULE_ENGINE__ENABLED: "false"
  SYNC_MANAGER__ENABLED: "false"
  TASK_SCHEDULER__ENABLED: "false"
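Worth noting: even with those flags, the startup log earlier in the thread still shows several built-in modules starting (CSV import connector, file index manager, garbage collection, telemetry, cluster manager), which are governed by settings not in this list. A quick way to see what a pod actually starts is to filter its captured startup log; sketched here with sample lines copied from the log above:

```shell
# Filter a captured startup log for modules that still start despite the config;
# the sample lines are copied from the startup log earlier in the thread.
log='{"message":"[OPENCTI-MODULE] Task manager not started (disabled by configuration)"}
{"message":"[OPENCTI-MODULE] Starting file index manager"}
{"message":"[OPENCTI-MODULE] Starting Garbage collection manager"}'
echo "$log" | grep '\[OPENCTI-MODULE\] Starting'
```

In a live cluster the same filter would be applied to `kubectl logs` output for one of the idle web pods.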

It just died.

Here is the current resource usage; it looked comparable at the time. (The opencti-opencti-web pods are the ones doing nothing.)

NAME                                                              CPU(cores)   MEMORY(bytes)   
opencti-elastic-es-leaders-0                                      799m         3621Mi          
opencti-elastic-es-leaders-1                                      146m         3652Mi          
opencti-elastic-es-leaders-2                                      414m         3805Mi          
opencti-elastic-es-data-0                                         6885m        13868Mi         
opencti-elastic-es-data-1                                         6308m        13807Mi         
opencti-elastic-es-data-2                                         6545m        13699Mi         
opencti-minio-5f64757877-dlzgn                                    2m           516Mi           
opencti-opencti-api-7fbbc6b4cd-2lkgg                              133m         561Mi           
opencti-opencti-api-7fbbc6b4cd-fqbqf                              11m          481Mi           
opencti-opencti-api-7fbbc6b4cd-trdtv                              563m         1265Mi          
...          
opencti-opencti-web-8559898bf5-8b6g4                              398m         736Mi           
opencti-opencti-web-8559898bf5-b9mqz                              442m         757Mi           
opencti-opencti-web-8559898bf5-ct2d7                              348m         758Mi           
opencti-opencti-worker-c9b55c8df-dcrx8                            9m           49Mi            
opencti-opencti-worker-c9b55c8df-ntbsl                            13m          49Mi            
opencti-opencti-worker-c9b55c8df-qtgp2                            11m          50Mi            
opencti-opencti-worker-c9b55c8df-srfms                            7m           49Mi            
opencti-opencti-worker-c9b55c8df-vr8k7                            12m          49Mi            
opencti-rabbitmq-server-0                                         9m           554Mi           
opencti-rabbitmq-server-1                                         13m          433Mi           
opencti-rabbitmq-server-2                                         19m          508Mi           
opencti-redis-master-0                                            56m          3637Mi  
richard-julien commented 6 days ago

Can you set the log level to DEBUG on the instance that should be purely idle and send the logs to me? Thanks.

MaxwellDPS commented 6 days ago

Will do, just changed the level. I'll email the logs when they are available.

MaxwellDPS commented 6 days ago

Current status of the push_sync queue:

Virtual host:       /
Name:               push_sync
Node:               rabbit@opencti-rabbitmq-server-1.opencti-rabbitmq-nodes (+2)
Type:               quorum (D, Args)
Consumers:          5 (consumer capacity 100%)
State:              running
Messages:           2,025 ready / 5 unacked / 0 in memory / 2,030 persistent / 2,030 total
Message bytes:      1.5 GiB ready / 1.5 GiB persistent / 1.5 GiB total
Rates:              incoming 0.00/s, deliver/get 0.00/s, redelivered 0.00/s, ack 0.00/s

MaxwellDPS commented 6 days ago

Not sure if it's related, but I'm seeing a "socket hang up" on RMQ:

{
    "category": "APP",
    "errors": [
        {
            "attributes": {
                "genre": "TECHNICAL",
                "http_status": 500
            },
            "message": "socket hang up",
            "name": "UNKNOWN_ERROR",
            "stack": "UNKNOWN_ERROR: socket hang up\n    at error (/opt/opencti/build/src/config/errors.js:8:10)\n    at UnknownError (/opt/opencti/build/src/config/errors.js:82:47)\n    at Object._logWithError (/opt/opencti/build/src/config/conf.js:235:17)\n    at Object.error (/opt/opencti/build/src/config/conf.js:244:48)\n    at Object.willSendResponse (/opt/opencti/build/src/graphql/loggerPlugin.js:153:20)\n    at processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Promise.all (index 1)\n    at b (/opt/opencti/build/node_modules/apollo-server-core/src/requestPipeline.ts:530:5)\n    at processHTTPRequest (/opt/opencti/build/node_modules/apollo-server-core/src/runHttpQuery.ts:437:24)"
        },
        {
            "message": "socket hang up",
            "name": "Error",
            "stack": "Error: socket hang up\n    at Function.Pce.from (/opt/opencti/build/node_modules/axios/lib/core/AxiosError.js:89:14)\n    at dx.handleRequestError (/opt/opencti/build/node_modules/axios/lib/adapters/http.js:610:25)\n    at dx.emit (node:events:519:28)\n    at ClientRequest.lyn.<computed> (/opt/opencti/build/node_modules/follow-redirects/index.js:38:24)\n    at ClientRequest.emit (node:events:519:28)\n    at Socket.socketOnEnd (node:_http_client:524:9)\n    at Socket.emit (node:events:531:35)\n    at endReadableNT (node:internal/streams/readable:1696:12)\n    at processTicksAndRejections (node:internal/process/task_queues:82:21)\n    at xyn.request (/opt/opencti/build/node_modules/axios/lib/core/Axios.js:45:41)\n    at processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at metricApi (/opt/opencti/build/src/database/rabbitmq.js:116:22)\n    at getMetrics (/opt/opencti/build/src/domain/rabbitmqMetrics.js:7:17)"
        }
    ],
    "inner_relation_creation": 0,
    "level": "error",
    "message": "socket hang up",
    "operation": "Unspecified",
    "query_attributes": [
        [
            {
                "arguments": [],
                "name": "rabbitMQMetrics"
            }
        ]
    ],
    "size": 2,
    "source": "backend",
    "time": 47,
    "timestamp": "2024-06-24T20:38:26.295Z",
    "type": "READ_ERROR",
    "user": {
        "group_ids": [
            UUID,
            UUID
        ],
        "ip": "192.168.6.164",
        "organization_ids": [
            "828b5f70-eda2-4eb6-9879-ed5d0b5afe42"
        ],
        "referer": "https://<REM>/dashboard/data/ingestion/connectors",
        "socket": "query",
        "user_id": UUID,
        "user_metadata": {}
    },
    "version": "6.1.6"
}
MaxwellDPS commented 6 days ago

So, testing: the CSV import is hosed; it will not do anything. All messages in the push_sync queue are stalled; none are moving, even after a full clear and restart.

MaxwellDPS commented 6 days ago

Is there a specific log you are looking for here, @richard-julien?

richard-julien commented 6 days ago

I'm looking for what the activity could be on this instance that should be idle (the kinds of queries, to find the service that generates the activity).

MaxwellDPS commented 5 days ago

Yeah, so the only thing running would be a query to a TAXII feed every 10 min. The other thing I'm noticing is that CSV import comes and goes; it doesn't seem to be running reliably.

Also, it looks like none of the consumers on push_sync are acking messages.