RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/

Users often disconnect and need to login again #23947

Open sthaydn opened 2 years ago

sthaydn commented 2 years ago

Hi RocketChat,

thanks for this awesome software. I am behind a reverse proxy and users regularly get disconnected, need to log in again, and have to enter the E2E password every time. I guess it is because of the reverse proxy, as I haven't seen anything similar in the issues here.

What could be the cause of these disconnections? What settings should I look at? I am on the latest release (but have had this since the very beginning), and the reverse proxy is nginx.
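For what it's worth, these are the kinds of proxy directives I suspect might matter; the snippet below is purely illustrative on my side (placeholder upstream, guessed values), not my actual config:

```nginx
# Illustrative only: the directives most often implicated when WebSocket
# connections behind nginx drop early. The upstream address is a placeholder.
location / {
    proxy_pass         http://127.0.0.1:3000;
    proxy_http_version 1.1;

    # Needed for the WebSocket upgrade that Rocket.Chat's realtime traffic uses
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $http_host;

    # nginx defaults both of these to 60s; an idle realtime connection
    # can be cut after that unless the timeouts are raised
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;
}
```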

Regards,

Stefan

sgohl commented 2 years ago

I experience this rarely too, with Traefik, but mostly (almost only, I would say) when I update the Rocket.Chat server. It feels like it's rather related to the client app (desktop and web), where some kind of connect-timeout exception happens. I don't have any proof of that; it just feels like this.

If you'd post your relevant nginx config, chances are we could identify the reason, if any, or at least rule it out. Are you running multiple instances of the Rocket.Chat app? Are you running a MongoDB multi-node replica set?
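To be clear, by multiple instances I mean several Rocket.Chat server processes behind the one proxy; a rough, hypothetical nginx sketch of that (names and addresses made up) would be:

```nginx
# Hypothetical multi-instance setup (addresses made up): several
# Rocket.Chat server processes behind one upstream, with sticky
# sessions so a client's WebSocket keeps hitting the same process.
upstream rocketchat_backend {
    ip_hash;                 # stick clients to one backend by IP
    server 10.0.0.11:3000;
    server 10.0.0.12:3000;
}

server {
    # ... ssl / headers as before ...
    location / {
        proxy_pass http://rocketchat_backend;
    }
}
```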

sthaydn commented 2 years ago

Thanks for your reply.

This is the nginx conf:

```nginx
server {
    listen 80;
    server_name rocket.chat.com;

    root /var/www;

    location ^~ /.well-known/acme-challenge {
        default_type text/plain;
        root /var/www/letsencrypt;
    }

    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl;
    server_name rocket.chat.com;

    add_header Strict-Transport-Security "max-age=31536000; includeSubdomains";
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    add_header Permissions-Policy "interest-cohort=()";

    ssl_certificate     /etc/letsencrypt/rocket.chat.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/rocket.chat.com/key.pem;

    include /etc/nginx/snippets/ssl.conf;

    client_max_body_size 10M;

    access_log /var/log/nginx/hopfagartn.access.log matomo;

    location / {
        proxy_pass          http://in.ter.nal.ip:3000;
        proxy_http_version  1.1;
        proxy_set_header    Upgrade $http_upgrade;
        proxy_set_header    Connection "upgrade";
        proxy_set_header    Host $http_host;
        proxy_set_header    X-Real-IP $remote_addr;
        proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header    X-Forwarded-Proto https;
        proxy_set_header    X-Nginx-Proxy true;
        proxy_redirect      off;
    }
}
```

What do you mean by multiple instances of the app? On each device there is only one account in the app.

Mongo is the standard installation.

sthaydn commented 2 years ago

I see I could leave out the line `proxy_set_header X-Nginx-Proxy true;`, but that shouldn't be the culprit.

sthaydn commented 2 years ago

These are the SSL snippet settings. Maybe the last two could be the cause? Maybe the curves?

```nginx
ssl_dhparam /etc/nginx/dhparams/dhparams.pem;

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384';
ssl_ecdh_curve secp521r1:secp384r1:prime256v1;
ssl_prefer_server_ciphers off;

ssl_stapling on;
ssl_stapling_verify on;
resolver 46.xx.xx.xx;

ssl_session_timeout 24h;
ssl_session_tickets off;
```

sthaydn commented 2 years ago

I have tweaked some bits from above. Let's see if these settings are better.

sthaydn commented 2 years ago

OK, these settings weren't as good as I thought. I have no idea why users get disconnected that often. Is there a time limit, so that when the app has been closed for a day, for example, one gets logged out after a while?

andrewlorenz commented 1 year ago

We have similar, apparently random disconnects occurring, using a (customised) version of the RC server and a (customised) version of the RC native client. However, the issue is most definitely in the lower-level handshaking layers of Meteor rather than RC (and definitely nothing to do with our customisations, as we only make use of extra methods and REST endpoints). I appreciate that as soon as any RC peeps see the word "customised" they will run a mile! I will dig out some diagnostics later and post what I can here, in case it helps. But the main call-out is that I am pretty certain this is NOT an RC issue, but a Meteor-related issue.

andrewlorenz commented 1 year ago

OK, here goes: I've dug up my notes on this.

The login appears to fail because Meteor rejects the resume token, its own token, which it issues as part of its accounts package.

```
10-09 04:49:44: login: failed errorClass [Error]: You've been logged out by the server. Please log in again. [403]
10-09 04:49:44:     at MethodInvocation.defaultResumeLoginHandler (packages/accounts-base/accounts_server.js:1561:14)
10-09 04:49:44:     at MethodInvocation.<anonymous> (packages/accounts-base/accounts_server.js:1523:38)
10-09 04:49:44:     at packages/accounts-base/accounts_server.js:594:31
10-09 04:49:44:     at tryLoginMethod (packages/accounts-base/accounts_server.js:1509:14)
```

And going to the code that's identified in the error, you'll find:

```js
if (! user) {
  // If we didn't find the hashed login token, try also looking for
  // the old-style unhashed token.  But we need to look for either
  // the old-style token OR the new-style token, because another
  // client connection logging in simultaneously might have already
  // converted the token.
  user = accounts.users.findOne({
      $or: [
        {"services.resume.loginTokens.hashedToken": hashedToken},
        {"services.resume.loginTokens.token": options.resume}
      ]
    },
    // Note: Cannot use ...loginTokens.$ positional operator with $or query.
    {fields: {"services.resume.loginTokens": 1}});
}

if (! user)
  return {
    error: new Meteor.Error(403, "You've been logged out by the server. Please log in again.")
  };
```

So there's the 403 being issued. So what's the problem? Well, I can't fathom it out! At the time of the call above, my console debug confirmed:

```
10-09 04:49:44: login: verify user args: { resume: 'CR4YrIQ7feTJPhnV-kLqS78it72RhzLgdLHaXkAaSs7' }
```

and the user record was... well, there's no point showing it, as you can't just see the resume token in there because it's hashed! But the way the code errored, it evidently couldn't match the resume token with any of the tokens in the user record, whether "old-style" or "new-style" as the comments in the Meteor code indicate.
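If anyone wants to check this by hand: accounts-base stores only a hash of the resume token (SHA-256, base64-encoded), so a small Node sketch like the one below can turn the token from the client log into the value you would expect to find under `services.resume.loginTokens.hashedToken` in the users collection. This is my own sketch of what the package does, not code lifted from Meteor:

```js
// Hash a client-side resume token the way Meteor's accounts-base does
// (SHA-256, base64), so it can be compared against
// services.resume.loginTokens.hashedToken in the users collection.
const crypto = require('crypto');

function hashLoginToken(token) {
  return crypto.createHash('sha256').update(token).digest('base64');
}

// Token taken from the client debug log above
console.log(hashLoginToken('CR4YrIQ7feTJPhnV-kLqS78it72RhzLgdLHaXkAaSs7'));
```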

Of course, once the 403 is thrown, it's curtains for the client.

We are suffering from this issue multiple times a day, across users, every day of the week. It's entirely sporadic; it works most of the time, so it's the worst kind of issue. But it's also extremely annoying to the user community.

andrewlorenz commented 1 year ago

Another update: I had a dig through Meteor's issues and found there have historically been problems with the `loginExpirationInDays` setting, which controls when a token expires. Problems as in somewhat strange logic in how it's actually implemented, and (historically) some people have had weird behaviour with particular values. So we've just updated ours to 3650, i.e. 10 years, and will restart our RC server when we can next do so.
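For reference, in plain Meteor the setting we changed corresponds to something like the sketch below (server-side code; in a stock Rocket.Chat install you would change the equivalent admin setting rather than code, and 3650 is just the value we happened to pick):

```js
// Server-side Meteor accounts configuration (sketch).
// Resume tokens expire after 90 days by default; raising
// loginExpirationInDays keeps them valid much longer.
import { Accounts } from 'meteor/accounts-base';

Accounts.config({
  loginExpirationInDays: 3650, // roughly 10 years
});
```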

sthaydn commented 1 year ago

Thank you for digging. I appreciate that.