RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/

High CPU Node 100% RC Version 6.2.2 #29515

Open nileshA-addweb opened 1 year ago

nileshA-addweb commented 1 year ago

Description:

Rocket.Chat stops working on its own and the Node process reaches 100% CPU, after which MongoDB and Node stop processing further requests and Rocket.Chat goes down. The containers have to be restarted to bring Rocket.Chat back up.

Steps to reproduce OR Actual behavior:

When Rocket.Chat stops working and Node is at 100% CPU, we have to run docker-compose down followed by docker-compose up to return Rocket.Chat to a normal state.

Expected behavior:

Rocket.Chat should keep running without Node reaching 100% CPU.

Server Setup Information:

Version of Rocket.Chat Server: 6.2.2
Operating System: Ubuntu 20.04.6 LTS
Deployment Method: Docker
Number of Running Instances: 1
DB Replicaset Oplog: Yes
NodeJS Version: 14.21.2 - x64
MongoDB Version: 5.0.15
MongoDB Engine: wiredTiger
USE_NATIVE_OPLOG=true

Client Setup Information

Desktop App: App 3.8.13
Operating System: Ubuntu 22.04.2 LTS

Relevant logs:

{"t":{"$date":"2023-06-12T08:18:56.880+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn17","msg":"Slow query","attr":{"type":"command","ns":"rocketchat.rocketchat_message","command":{"aggregate":"rocketchat_message","pipeline":[{"$match":{"t":"omnichannel_placed_chat_on_hold"}},{"$group":{"_id":"$rid"}},{"$group":{"_id":null,"total":{"$sum":1}}}],"cursor":{},"lsid":{"id":{"$uuid":"e57cbd99-4ba4-4b93-8462-9dad3c7055fc"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1686557927,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"rocketchat","$readPreference":{"mode":"secondaryPreferred"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1499728,"cursorExhausted":true,"numYields":1599,"nreturned":0,"queryHash":"48B4E645","planCacheKey":"2C688F3D","reslen":243,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":1601}},"Global":{"acquireCount":{"r":1601}},"Mutex":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":117572249,"timeReadingMicros":562075}},"remote":"172.26.0.3:36464","protocol":"op_msg","durationMillis":6921}}

{"t":{"$date":"2023-06-12T08:18:56.693+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn23","msg":"Slow query","attr":{"type":"command","ns":"rocketchat.rocketchat_message","command":{"aggregate":"rocketchat_message","pipeline":[{"$match":{"t":"voip-call-on-hold"}},{"$group":{"_id":"$rid"}},{"$group":{"_id":null,"total":{"$sum":1}}}],"cursor":{},"lsid":{"id":{"$uuid":"eaee148b-16ad-405a-ab5e-b4d25750a679"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1686557927,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"rocketchat","$readPreference":{"mode":"secondaryPreferred"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1499728,"cursorExhausted":true,"numYields":1592,"nreturned":0,"queryHash":"48B4E645","planCacheKey":"2C688F3D","reslen":243,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":1594}},"Global":{"acquireCount":{"r":1594}},"Mutex":{"acquireCount":{"r":2}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":90880247,"timeReadingMicros":490903}},"remote":"172.26.0.3:56300","protocol":"op_msg","durationMillis":6728}}

{"t":{"$date":"2023-06-12T08:18:51.412+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn18","msg":"Slow query","attr":{"type":"command","ns":"rocketchat.rocketchat_uploads","command":{"aggregate":"rocketchat_uploads","pipeline":[{"$group":{"_id":"total","total":{"$sum":"$size"}}}],"cursor":{},"lsid":{"id":{"$uuid":"6869a1c6-8c34-4935-a74d-f7227462cc20"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1686557930,"i":5}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"rocketchat","$readPreference":{"mode":"secondaryPreferred"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":46476,"cursorExhausted":true,"numYields":59,"nreturned":1,"reslen":281,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"r":62}},"Global":{"acquireCount":{"r":62}},"Mutex":{"acquireCount":{"r":3}}},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":31654553,"timeReadingMicros":398635}},"remote":"172.26.0.3:36480","protocol":"op_msg","durationMillis":901}}

{"t":{"$date":"2023-06-12T08:18:50.984+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn28","msg":"Slow query","attr":{"type":"command","ns":"rocketchat.rocketchat_sessions","command":{"insert":"rocketchat_sessions","documents":[{"_id":"6486d4ea14052d142da7f2b3","type":"session","sessionId":"YseHbbzcRHGXG33zq","instanceId":"325c183f-ac7f-4373-8ee7-bf55544f177a","loginToken":"1RgtANdc+7ZFd6QV6VLaFl6KFBwq7HydQlhaXrJcC1M=","ip":"223.177.186.63","host":"chat.addwebsolution.in","device":{"type":"desktop-app","name":"Rocket.Chat","longVersion":"3.9.3","os":{"name":"Windows","version":"10"},"version":"3.9.3"},"userId":"AGuLQYrYHQKB7E52q","roles":["user","HR"],"mostImportantRole":"custom-role","loginAt":{"$date":"2023-06-12T08:18:50.779Z"},"day":12,"month":6,"year":2023,"searchTerm":"Rocket.Chatdesktop-appWindowsYseHbbzcRHGXG33zqAGuLQYrYHQKB7E52q","createdAt":{"$date":"2023-06-12T08:18:50.810Z"},"_updatedAt":{"$date":"2023-06-12T08:18:50.811Z"}}],"ordered":true,"lsid":{"id":{"$uuid":"834cfe99-4a48-4d8d-9cc0-c2dc07a29196"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1686557930,"i":6}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"rocketchat"},"ninserted":1,"keysInserted":15,"numYields":0,"reslen":230,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":1}},"FeatureCompatibilityVersion":{"acquireCount":{"r":1,"w":1}},"ReplicationStateTransition":{"acquireCount":{"w":2}},"Global":{"acquireCount":{"r":1,"w":1}},"Database":{"acquireCount":{"w":1}},"Collection":{"acquireCount":{"w":1}},"Mutex":{"acquireCount":{"r":1}}},"flowControl":{"acquireCount":1,"timeAcquiringMicros":2},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{"data":{"bytesRead":176417,"timeReadingMicros":593},"timeWaitingMicros":{"schemaLock":17497}},"remote":"172.26.0.3:56356","protocol":"op_msg","durationMillis":163}}
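Entries like the ones above can be triaged with a short script. The following is only a minimal Node.js sketch, assuming the log is in MongoDB's structured JSON format as shown, one entry per line. The function names `summarizeSlowQuery` and `findCollectionScans` are illustrative and are not part of Rocket.Chat or MongoDB.

```javascript
// Summarize one line of MongoDB's structured (JSON) log output.
// Returns null for lines that are not valid JSON or not "Slow query" entries.
function summarizeSlowQuery(line) {
  let entry;
  try {
    entry = JSON.parse(line);
  } catch (err) {
    return null;
  }
  if (!entry || entry.msg !== "Slow query" || !entry.attr) return null;
  const { ns, planSummary, docsExamined, nreturned, durationMillis } = entry.attr;
  return { ns, planSummary, docsExamined, nreturned, durationMillis };
}

// Keep only the slow queries that scanned the whole collection.
function findCollectionScans(lines) {
  return lines
    .map(summarizeSlowQuery)
    .filter((s) => s !== null && s.planSummary === "COLLSCAN");
}

// Abbreviated copy of the first log entry above (most fields omitted).
const sample =
  '{"msg":"Slow query","attr":{"ns":"rocketchat.rocketchat_message",' +
  '"planSummary":"COLLSCAN","docsExamined":1499728,"nreturned":0,"durationMillis":6921}}';

console.log(findCollectionScans([sample]));
```

For the two rocketchat_message aggregations logged above, this would surface the same pattern: a COLLSCAN over 1,499,728 documents that returns nothing.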

Gummikavalier commented 1 year ago

If you have OTR enabled and people use it, this bug could be the cause: https://github.com/RocketChat/Rocket.Chat/issues/28918

nileshA-addweb commented 1 year ago

We are still facing this issue. I have shared the MongoDB logs in case they point to a workable fix for this problem, which occurs daily.

nileshA-addweb commented 1 year ago

Can anyone help us overcome this issue, which occurs at random times?

nileshA-addweb commented 1 year ago

We are now running the RC version below from the Docker image and see constant high CPU usage by Node, which makes RC unavailable. There are no slow queries or other log entries in MongoDB that would explain it. Can anyone look into this issue and suggest a solution?

Rocket.Chat Version: 6.2.9
NodeJS Version: 14.21.3 - x64
MongoDB Version: 5.0.18
MongoDB Engine: wiredTiger
Platform: linux
Process Port: 3000
Site URL:
ReplicaSet OpLog: Enabled
Commit Hash: abf746733b
Commit Branch: HEAD

shiryov commented 1 year ago

The issue is still present in 6.2.10 and 6.3.0. We do not use Omnichannel; it is disabled in the settings (screenshot attached). Even so, this very long query runs every time an instance starts. Our database holds 50M messages, and the query takes about a minute. The query also runs during normal operation throughout the day and degrades server performance. Attachment: slow_otr_op.txt

shiryov commented 1 year ago

@nileshA-addweb this helps:

use <db_name>
db.rocketchat_message.createIndex({ t: 1 }, { sparse: true })
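One way to check whether the new index is actually used is to look at the plan stages reported by explain(). The following is only a sketch, assuming it is pasted into mongosh: `collectStages` is a hypothetical helper (not a MongoDB API), and the pipeline is copied from the slow-query entries in the issue description.

```javascript
// Collect every "stage" name found anywhere in an explain() result.
// The layout of explain output differs between server versions, so this
// walks the whole document instead of relying on a fixed path.
function collectStages(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectStages(child, found));
  } else if (node && typeof node === "object") {
    if (typeof node.stage === "string") found.push(node.stage);
    Object.values(node).forEach((child) => collectStages(child, found));
  }
  return found;
}

// Pipeline taken from the logged slow query on rocketchat_message.
const pipeline = [
  { $match: { t: "omnichannel_placed_chat_on_hold" } },
  { $group: { _id: "$rid" } },
  { $group: { _id: null, total: { $sum: 1 } } },
];

// In mongosh, after `use <db_name>`:
//   const plan = db.rocketchat_message.explain("executionStats").aggregate(pipeline);
//   collectStages(plan);
// An IXSCAN stage in place of COLLSCAN suggests the index on `t` is in use.
```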

nileshA-addweb commented 1 year ago

@shiryov Will this affect the notifications that appear when someone tags us in a particular channel in RC? If it has no impact on notifications or other functionality, we can try it.