RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/

Federation: No messages delivered between servers 3.17.2 #23057

Closed cpergler closed 2 years ago

cpergler commented 3 years ago

Description:

Federation setup between 2 servers using the DNS method was successful. Users on one server can find users on the other server and vice versa. However, when a message is sent, it is not received by the recipient. I know this has been discussed before, but it still does not seem to work.

Steps to reproduce:

  1. Set up Rocketchat on 2 servers on different domains and make them reachable from the internet via https://subdomain.yourdomain.com, using your nginx reverse proxy with a Let's Encrypt cert. Then create at least 2 users on each server and test messaging between those internal users, using clients connecting from the local network as well as from the internet.
  2. Set up DNS as described in the federation setup instructions for both domains
  3. Activate federation and verify setup by clicking "Test setup" in Rocketchat admin console on both servers
  4. Find external users from other domain by searching for their e-mail address in Rocketchat
  5. Send message

Expected behavior:

Message gets delivered

Actual behavior:

Message is not delivered

Server Setup Information:

Client Setup Information

Relevant logs:

Log on sender side after writing a message:

I20210828-11:57:04.367(0) Push ➔ debug send message "c.pergler" to userId mmaHSfujcs95opK5h I20210828-11:57:04.369(0) Push ➔ debug Sent message "c.pergler" to 0 ios apps 0 android apps I20210828-11:57:04.371(0) Push ➔ debug GUIDE: The "appTokensCollection" is empty - No clients have registered on the server yet... I20210828-11:57:07.656(0) API ➔ debug POST: /api/v1/method.call/license%3AgetTags I20210828-11:57:07.659(0) API ➔ debug Success { statusCode: 200, body: { message: '{"msg":"result","id":"256","result":[]}', success: true } } I20210828-11:57:07.664(0) API ➔ debug GET: /api/v1/licenses.get I20210828-11:57:07.667(0) API ➔ debug POST: /api/v1/method.call/instances%3Aget I20210828-11:57:07.668(0) Meteor ➔ method instances/get -> userId: KMrDkaydB2cagQKbM, arguments: [] I20210828-11:57:07.669(0) API ➔ debug GET: /api/v1/statistics?refresh=false I20210828-11:57:07.673(0) API ➔ debug Success { statusCode: 200, body: { message: '{"msg":"result","id":"257","result":[]}', success: true } } I20210828-11:57:07.674(0) API ➔ debug Success { statusCode: 200, body: { licenses: [], success: true } } I20210828-11:57:07.677(0) API ➔ debug Success { statusCode: 200, body: { _id: '4nXip2mmWzgJjtYAK', wizard: { organizationType: 'enterprise', industry: 'technologyServices', size: '0', country: 'austria', language: 'de', serverType: 'privateTeam', registerServer: true }, uniqueId: 'gxLEgfza78BmcpEH4', installedAt: 2021-08-28T08:45:59.255Z, version: '3.17.2', totalUsers: 3, activeUsers: 2, activeGuests: 0, nonActiveUsers: 0, appUsers: 0, onlineUsers: 3, awayUsers: 0, busyUsers: 0, totalConnectedUsers: 3, offlineUsers: 0, userLanguages: { none: 3 }, totalRooms: 3, totalChannels: 1, totalPrivateGroups: 1, totalDirect: 1, totalLivechat: 0, totalDiscussions: 0, totalThreads: 0, teams: { totalTeams: 0, teamStats: [] }, totalLivechatVisitors: 0, totalLivechatAgents: 0, livechatEnabled: true, totalChannelMessages: 1, totalPrivateGroupMessages: 0, totalDirectMessages: 2, 
totalLivechatMessages: 0, totalMessages: 3, federatedServers: 2, federatedUsers: 1, lastLogin: 2021-08-28T09:17:43.245Z, lastMessageSentAt: 2021-08-28T11:08:52.990Z, lastSeenSubscription: 2021-08-28T11:08:53.034Z, os: { type: 'Linux', platform: 'linux', arch: 'x64', release: '5.10.0-8-amd64', uptime: 8119, loadavg: [Array], totalmem: 4084494336, freemem: 2580635648, cpus: [Array] }, process: { nodeVersion: 'v12.22.1', pid: 9, uptime: 8111.584058473 }, deploy: { method: 'docker', platform: 'selfinstall' }, enterpriseReady: true, uploadsTotal: 0, uploadsTotalSize: 0, migration: { _id: 'control', locked: false, version: 228, buildAt: '2021-08-26T16:38:18.840Z', lockedAt: 2021-08-28T08:57:07.153Z }, instanceCount: 1, oplogEnabled: true, mongoVersion: '4.0.26', mongoStorageEngine: 'mmapv1', uniqueUsersOfYesterday: { year: 2021, month: 8, day: 27, data: [] }, uniqueUsersOfLastWeek: { year: 2021, month: 8, day: 27, data: [] }, uniqueUsersOfLastMonth: { year: 2021, month: 8, day: 27, data: [] }, uniqueDevicesOfYesterday: { year: 2021, month: 8, day: 27, data: [] }, uniqueDevicesOfLastWeek: { year: 2021, month: 8, day: 27, data: [] }, uniqueDevicesOfLastMonth: { year: 2021, month: 8, day: 27, data: [] }, uniqueOSOfYesterday: { year: 2021, month: 8, day: 27, data: [] }, uniqueOSOfLastWeek: { year: 2021, month: 8, day: 27, data: [] }, uniqueOSOfLastMonth: { year: 2021, month: 8, day: 27, data: [] }, apps: { engineVersion: '1.27.1', enabled: true, totalInstalled: 0, totalActive: 0, totalFailed: 0 }, services: { ldap: [Object], saml: [Object], cas: [Object], oauth: [Object] }, integrations: { totalIntegrations: 0, totalIncoming: 0, totalIncomingActive: 0, totalOutgoing: 0, totalOutgoingActive: 0, totalWithScriptEnabled: 0 }, pushQueue: 0, enterprise: { modules: [], tags: [] }, createdAt: 2021-08-28T11:12:00.501Z, _updatedAt: 2021-08-28T11:12:00.501Z, success: true } } I20210828-11:57:07.741(0) API ➔ debug POST: /api/v1/method.call/license%3AgetTags I20210828-11:57:07.742(0) API 
➔ debug Success { statusCode: 200, body: { message: '{"msg":"result","id":"258","result":[]}', success: true } } I20210828-11:57:09.052(0) API ➔ debug GET: /api/v1/stdout.queue I20210828-11:57:09.057(0) API ➔ debug Success { statusCode: 200, body: { queue: [ [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], [Object], ... 900 more items ], success: true } }

Also getting this in the log every minute (similar on both servers):

I20210828-12:07:31.457(0) Meteor ➔ publish null -> userId: null, arguments: []
I20210828-12:08:00.030(0) SyncedCron ➔ info Starting "Federation".
I20210828-12:08:00.034(0) SyncedCron ➔ info Finished "Federation".
I20210828-12:08:00.041(0) Federation ➔ client.debug dispatchEvents => domains=pergler-it.com events={ "type": "ping" }
I20210828-12:08:00.043(0) Federation ➔ dns.debug search: peerDomain=pergler-it.com
I20210828-12:08:00.044(0) Federation ➔ dns.debug search: peerDomain=pergler-it.com srv=_rocketchat._https.pergler-it.com
I20210828-12:08:00.046(0) Federation ➔ dns.debug search: peerDomain=pergler-it.com srv=_rocketchat._http.pergler-it.com
I20210828-12:08:00.050(0) Federation ➔ dns.debug search: peerDomain=pergler-it.com srv=_rocketchat._tcp.pergler-it.com
I20210828-12:08:00.051(0) Federation ➔ dns.debug search: peerDomain=pergler-it.com txt=rocketchat-tcp-protocol.pergler-it.com
I20210828-12:08:00.053(0) Federation ➔ dns.debug search: peerDomain=pergler-it.com txt=rocketchat-public-key.pergler-it.com
I20210828-12:08:00.054(0) Federation ➔ dns.debug search: found peerDomain=pergler-it.com srvEntry=rocketchat.pergler-it.com:443 protocol=https
I20210828-12:08:00.055(0) Federation ➔ http.debug federationRequestToPeer => url=https://rocketchat.pergler-it.com:443/api/v1/federation.events.dispatch
I20210828-12:08:00.056(0) Federation ➔ http.debug [POST] https://rocketchat.pergler-it.com:443/api/v1/federation.events.dispatch
I20210828-12:08:00.063(0) API ➔ debug POST: /api/v1/federation.events.dispatch
I20210828-12:08:00.064(0) Federation ➔ server.debug federation.events.dispatch => events={ "type": "ping" }
I20210828-12:08:00.067(0) API ➔ debug Success { statusCode: 200, body: { success: true } }
I20210828-12:08:00.397(0) SyncedCron ➔ info Starting "Generate download files for user data".
I20210828-12:08:00.398(0) SyncedCron ➔ info Finished "Generate download files for user data".
I20210828-12:08:35.494(0) Meteor ➔ publish null -> userId: null, arguments: []
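
For reference, the `dns.debug` lines above show which DNS names a peer queries for a federated domain. The following is a minimal sketch that rebuilds that list for any domain, so the names can be compared against what is actually configured at the DNS provider. The function name `federation_lookup_names` is hypothetical, and the name patterns are taken only from this log (3.17.2); they are an assumption for any other version.

```python
def federation_lookup_names(domain: str) -> dict[str, list[str]]:
    """Return the DNS names queried for `domain`, as seen in the log above.

    The patterns are copied from the dns.debug lines of this issue;
    this is not Rocket.Chat's own lookup code.
    """
    # Normalise: drop surrounding spaces, a trailing root dot, and case.
    domain = domain.strip().rstrip(".").lower()
    return {
        # SRV records are tried for https, http and tcp, in that order.
        "SRV": [f"_rocketchat._{proto}.{domain}" for proto in ("https", "http", "tcp")],
        # TXT records: protocol hint and the public key.
        "TXT": [
            f"rocketchat-tcp-protocol.{domain}",
            f"rocketchat-public-key.{domain}",
        ],
    }


if __name__ == "__main__":
    for record_type, names in federation_lookup_names("pergler-it.com").items():
        for name in names:
            print(record_type, name)
```

Each printed name can then be looked up with any DNS tool and compared character by character with the configured entries.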

debdutdeb commented 3 years ago

Hi, thanks for reporting this.

I currently don't have any federated servers handy - so it'll take me some time to look over this issue. Once I do, I'll let you know. I've asked one of my peers in the meantime if he knows someone who has federated instances or not.

cpergler commented 3 years ago

OK, thank you. Meanwhile I tried a second installation variant, Univention Corporate Server (UCS) with Docker image 3.17.0. I thought this could rule out my separate NGINX reverse proxy as an error source, but here it is even worse: I don't even get to test the federation setup successfully. When I click the "Test setup" button I get this in the logs:

[image: screenshot of the log output, not included]

cpergler commented 3 years ago

!!! EDIT 2021-09-06: THIS POST NOW CONTAINS OUTDATED INFO - PLEASE ALSO READ THE FOLLOWING POST !!!

OK, got it working. I changed my setup though, which is now as follows: I am using pfsense with the HAProxy plugin as reverse proxy instead of NGINX. The server on one domain is now set up in a Proxmox LXC with the Ubuntu 20.04 template, using the Snap installation method. The other domain is using a Proxmox VM with Ubuntu Server 20.04 (which is the easiest installation method of all, as I want to point out, because all one needs to do is select the rocketchat package during the installation of Ubuntu Server, and it works right out of the box). I can post my exact configuration here if someone is interested (I think the pfsense config is perhaps a little challenging for someone doing it for the first time).

In general I think putting your rocketchat server behind pfsense with the HAProxy plugin is a good way to go if you A) need federation but B) don't have a free public IP available for exclusive use of port 443 with rocketchat. Also, the HAProxy plugin makes it very easy to get and maintain Let's Encrypt certs for your various web services. And for the time being I can NOT recommend using rocketchat on Univention Corporate Server when federation is needed.

cpergler commented 3 years ago

OK, even more discoveries: The actual cause of all my problems has been found! In the working setup which I mentioned in the previous post, I was using different domains. After switching back to the production domains used initially (and replicating the obviously working setup there), I was stunned to see that again no messages were delivered between the federated servers. I then discovered that the DNS TXT entry for the public key had a typo in the entry's name for one of the domains. So my initial setup would probably also have worked without that typo.

However I do not regret having been forced to try pfsense with HAProxy, because this is really some neat software.

Lessons that can be learned from that:

1) I don't know exactly which tests are executed when you hit the "Test setup" button while setting up federation in the admin panel. But obviously it doesn't even test the existence of the (correctly spelled!) "rocketchat-public-key..." TXT entry for the configured domain, not to mention the value it contains... That should be fixed ASAP, because that would really be an easy test for a possible misconfiguration.

2) If somebody reading this has made the same mistake, fixed it, and is wondering why it's still not working: Just wait for approx. 1 hour until the server of the other federated domain has made another attempt to find the public key for your domain.
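
The kind of check asked for in lesson 1 can be sketched roughly as follows. This is only an illustration and not what "Test setup" actually does. It assumes you have copied the TXT entry names from your DNS provider into a list, and it expects the two TXT names that appear in the federation log of this issue. The function name `find_missing_txt_records` and the 0.8 similarity cutoff are hypothetical choices.

```python
import difflib


def find_missing_txt_records(domain, configured_names):
    """Map each expected-but-missing TXT name to its closest configured name.

    The value is None when nothing similar is configured at all.
    Expected names follow the pattern seen in the federation log; this
    is an assumption, not Rocket.Chat's own validation logic.
    """
    expected = [
        f"rocketchat-public-key.{domain}",
        f"rocketchat-tcp-protocol.{domain}",
    ]
    # Normalise the configured names the same way DNS treats them.
    configured = [name.strip().rstrip(".").lower() for name in configured_names]

    problems = {}
    for name in expected:
        if name in configured:
            continue  # entry exists with the exact expected spelling
        # Look for a near miss, which usually indicates a typo.
        close = difflib.get_close_matches(name, configured, n=1, cutoff=0.8)
        problems[name] = close[0] if close else None
    return problems


if __name__ == "__main__":
    # Hypothetical example with a transposed "li" in "public".
    found = find_missing_txt_records(
        "example.com",
        ["rocketchat-pubilc-key.example.com", "rocketchat-tcp-protocol.example.com"],
    )
    for missing, lookalike in found.items():
        print(f"missing: {missing} (did you mean to rename {lookalike}?)")
```

An empty result means both names are spelled as expected; it says nothing about whether the record values are correct.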