RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/

Parsing of a special message leads to server hang #23905

Open ankar84 opened 2 years ago

ankar84 commented 2 years ago

Description:

I think I finally have an answer for the many mysterious hangs of our production Rocket.Chat instances. This issue is closely related to https://github.com/RocketChat/Rocket.Chat/issues/22886. That comment is related too, I guess. That comment explains the behavior a bit, and that comment nicely demonstrates the issue with some graphics. I believe there could be other comments somewhere on GitHub that demonstrate the issue described here.

First of all, this is not purely a Rocket.Chat server bug; it is mostly a configuration issue. But I'm opening this issue so that Rocket.Chat can learn to handle this situation on its side, and for customers who need this setting for some reason.

Well, let's get to the topic. Some time ago we increased Maximum Allowed Characters Per Message in Admin UI - Message from the default 5000 to 64000. [image] We did that mainly so devs and admins could easily share code and logs.

Yesterday we again experienced hangs on many of our instances. A hang, not a crash! We use the Docker deployment method, so with the docker stats command we can see the hung instances: [image] The hung (stuck) instances stayed at 101-105% CPU load the whole time and never came down. Here is what a server with all healthy instances looks like: [image]

Next I started restarting the hung instances with the docker restart command. Sometimes that helped, but mostly not. Then I checked the hung containers' logs with the docker logs command, but there was nothing at all! Clean!
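For anyone who wants to spot stuck instances without watching the screen, here is a minimal sketch. It assumes you capture the text of `docker stats --no-stream --format "{{.Name}}\t{{.CPUPerc}}"` yourself and pass it in; the function name, the 100% threshold and the sample container names are made up for illustration.

```javascript
// Hypothetical helper: parse the text produced by
//   docker stats --no-stream --format "{{.Name}}\t{{.CPUPerc}}"
// and return the containers whose CPU is at or above a threshold.
function findPeggedContainers(statsOutput, thresholdPercent = 100) {
  return statsOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [name, cpu] = line.split('\t');
      // parseFloat stops at the trailing "%", so "103.52%" becomes 103.52
      return { name, cpuPercent: parseFloat(cpu) };
    })
    .filter((c) => Number.isFinite(c.cpuPercent) && c.cpuPercent >= thresholdPercent);
}

// Made-up sample output; the container names are illustrative only.
const sampleStats = [
  'rocketchat_1\t103.52%',
  'rocketchat_2\t4.10%',
  'rocketchat_3\t101.07%',
].join('\n');

// Lists rocketchat_1 and rocketchat_3 for the sample above.
console.log(findPeggedContainers(sampleStats).map((c) => c.name));
```

A single reading at 100% can be a normal spike, so in practice you would probably want several consecutive readings before calling an instance hung.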

Then I turned on the debug log level in Admin UI - Logs, and after restarting the problematic (hung) instances I analyzed their logs. One thing was common to all the hung instances. Near the end of the log I saw this:

API ➔ info 87.XXX.XXX.66 - atJCb8tsrC5nY4CBD [2021-12-08T16:28:20.399Z] "POST /api/v1/method.call/loadMissedMessages" 200 - "https://rc.company.com/direct/HrHXbKJeyHQ7b5s7uatJCb8tsrC5nY4CBD?jump=hhyZ8EkYM0YU2JuBM" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Rocket.Chat/3.0.6 Chrome/85.0.4183.121 Electron/10.1.3 Safari/537.36" |

The same IP address on many different hung instances, and always that loadMissedMessages REST API endpoint, from the same user in the same DM chat, HrHXbKJeyHQ7b5s7uatJCb8tsrC5nY4CBD. Using the user IDs (from the DM) I found the two users and asked them about anything weird in their DM. One of them answered that for some reason the other user had sent this strange message. [image] I found that message in the DB and sent it on a test deployment running Rocket.Chat 4.2.0, and... it hung the instance! The message is about 63.3 KB. I tried 32 KB, which also hangs, and 16 KB too.
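The log comparison can be scripted roughly like this. It is only a sketch: the line pattern is inferred from the single log line quoted above, the function names are hypothetical, and the assumption that a DM room id is two user ids joined together comes from the ids in this report, not from any documented guarantee.

```javascript
// Assumed line shape, based only on the one API log line quoted in this issue.
const API_LINE = /^API ➔ info (\S+) - (\S+) \[([^\]]+)\] "([A-Z]+) (\S+)" (\d{3})/;

function parseApiLine(line) {
  const m = API_LINE.exec(line);
  if (!m) return null;
  const [, ip, userId, timestamp, method, path, status] = m;
  return { ip, userId, timestamp, method, path, status: Number(status) };
}

// Count how often each (ip, userId, path) combination shows up,
// most frequent first, to find what the hung instances have in common.
function countCallers(lines) {
  const counts = new Map();
  for (const line of lines) {
    const entry = parseApiLine(line);
    if (!entry) continue;
    const key = `${entry.ip} ${entry.userId} ${entry.path}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Assumption: the DM room id is the two participants' ids concatenated,
// so removing the known id leaves the other participant.
function otherDmParticipant(roomId, userId) {
  if (roomId.startsWith(userId)) return roomId.slice(userId.length);
  if (roomId.endsWith(userId)) return roomId.slice(0, -userId.length);
  return null;
}

// The quoted log line, cut off after the status code.
const quotedLine =
  'API ➔ info 87.XXX.XXX.66 - atJCb8tsrC5nY4CBD [2021-12-08T16:28:20.399Z] ' +
  '"POST /api/v1/method.call/loadMissedMessages" 200 -';

console.log(parseApiLine(quotedLine));
console.log(otherDmParticipant('HrHXbKJeyHQ7b5s7uatJCb8tsrC5nY4CBD', 'atJCb8tsrC5nY4CBD'));
```

Feeding the last few hundred lines from each hung container into `countCallers` should bring the repeated caller to the top of the list.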

So we set Maximum Allowed Characters Per Message in Admin UI - Message back to the default 5000 to get a stable and reliable Rocket.Chat deployment.

Steps to reproduce:

  1. Set Maximum Allowed Characters Per Message in Admin UI - Message to 64000
  2. Send the message from the bottom of this issue
  3. Get a hung Rocket.Chat server
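To repeat the steps at different sizes, test payloads can be generated along these lines. This is a sketch with hypothetical function names; the base64 text is built from random bytes, so it is a stand-in for the original message and may not trigger the same behavior.

```javascript
const crypto = require('crypto');

const LOREM = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. ';

// Plain prose, repeated and cut to exactly `size` characters.
function loremPayload(size) {
  return LOREM.repeat(Math.ceil(size / LOREM.length)).slice(0, size);
}

// Base64 text of exactly `size` characters, built from random bytes.
// It is NOT the message from this report, only text of the same general kind.
function base64Payload(size) {
  const bytes = crypto.randomBytes(Math.ceil((size * 3) / 4));
  return bytes.toString('base64').slice(0, size);
}

// Sizes in characters, to match the Maximum Allowed Characters Per Message setting.
for (const size of [16000, 32000, 64000]) {
  console.log(size, loremPayload(size).length, base64Payload(size).length);
}
```

Sending both kinds at the same size separates "the message is large" from "the message has a particular shape".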

Expected behavior:

Rocket.Chat should easily handle such large text messages. Look around, it's 2021! And one of the popular messaging platforms can be brought down by 64 KB of text? Really?

Actual behavior:

There are some screenshots in the description.

Server Setup Information:

Client Setup Information

Additional context

@sampaiodiego @ggazzo please check this issue and tell me what you think.

Here is a test message. In fact it's a picture in base64 encoding, but I believe any large message would do much the same.

Gummikavalier commented 2 years ago

We've had the limit at 50 000 characters for years now, and have not had this kind of issue lately. Or at least not that we've spotted.

But we had it with parsing URLs in XML content posted in a message a few years back (my comment in this issue): https://github.com/RocketChat/Rocket.Chat/issues/10637

So it could be that base64 encoded img data content or image size specifically that makes the parser go mad this time.

ankar84 commented 2 years ago

> So it could be that base64 encoded img data content or image size specifically that makes the parser go mad this time.

Very nice clarification, as usual, @Gummikavalier. I just tested and can say for sure that sending 64 KB of lorem ipsum text works perfectly and doesn't overload anything. So it is a parser issue indeed! Thank you!

> We traced it to one specific client computer in the network, which timed out and moved from a node to node, putting them all one by one with 100% CPU load. We prevented that client connecting to the server with iptables reject rule, and before getting better look at it, the problem suddenly vanished.

That's exactly what I did yesterday: hunting that problematic user, who was jumping from instance to instance and hanging them in seconds.

tassoevan commented 2 years ago

Welp... I bet this is the monster.

const urls = message.html.match(/([A-Za-z]{3,9}):\/\/([-;:&=\+\$,\w]+@{1})?([-A-Za-z0-9\.]+)+:?(\d+)?((\/[-\+=!:~%\/\.@\,\w]*)?\??([-\+=&!:;%@\/\.\,\w]+)?(?:#([^\s\)]+))?)?/g) || [];
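The part of that expression that looks riskiest is the host group `([-A-Za-z0-9\.]+)+`, a quantifier nested inside another quantifier. Here is a small sketch of why that shape can be expensive in a backtracking engine. The anchors and the trailing `!` are my additions to force a failed match; whether the full expression reaches this path on the reported message is not established in this thread.

```javascript
// The host group from the expression above, isolated and anchored so that a
// trailing "!" forces the overall match to fail. The anchors are added for the demo.
const nested = /^([-A-Za-z0-9.]+)+$/;
// Same character class without the outer "+": it accepts exactly the same strings.
const flat = /^[-A-Za-z0-9.]+$/;

function timeTest(re, input) {
  const start = process.hrtime.bigint();
  const matched = re.test(input);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { matched, ms };
}

// Sizes are kept small on purpose: with the nested form, each extra
// character roughly doubles the number of ways the engine has to try.
for (const n of [10, 15, 20]) {
  const input = 'a'.repeat(n) + '!';
  const slow = timeTest(nested, input);
  const fast = timeTest(flat, input);
  console.log(`n=${n} nested=${slow.ms.toFixed(2)}ms flat=${fast.ms.toFixed(2)}ms`);
}
```

In the full expression everything after the host group is optional, so a failed match is not forced the way it is here. Profiling with the real message would be needed to confirm which part actually spins.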

aswiniip commented 2 years ago

> Welp... I bet this is the monster.
>
> const urls = message.html.match(/([A-Za-z]{3,9}):\/\/([-;:&=\+\$,\w]+@{1})?([-A-Za-z0-9\.]+)+:?(\d+)?((\/[-\+=!:~%\/\.@\,\w]*)?\??([-\+=&!:;%@\/\.\,\w]+)?(?:#([^\s\)]+))?)?/g) || [];

I would like to take this issue. Could you please help me with what I have to do? :thinking: