colyseus / proxy

🔀⚔ Proxy and Service Discovery for Colyseus 0.10 ~ 0.14 (Not recommended on 0.15+)
https://docs.colyseus.io/scalability/
MIT License

Seat Reservation Expired after a few connections when using Colyseus Proxy + Redis #14

Closed bquangDinh closed 2 years ago

bquangDinh commented 2 years ago

I've been following the scalability guide on the Colyseus website. It works well while there are only a few connections to my game room. Then I saw Seat Reservation Expired in the pm2 logs, and every attempt to connect to a room after that error failed. I was fairly sure the problem came from Redis: everything had been working, and if something were wrong in my implementation it would have crashed on the first try. So I captured the Redis log during normal operation and during the failure. Both are below.

This is the normal behavior of Redis: [Screenshot from 2021-09-24 00-39-19]

This one was captured during a failed attempt to connect to a room. Redis unsubscribed from the room immediately after it was created, which led to the error: [Screenshot from 2021-09-24 00-34-57]

This is the Seat Reservation Expired error: [Screenshot from 2021-09-24 00-54-47]

This is my ecosystem.config.js for pm2. I run 2 instances of the Colyseus proxy when deployed. It fails the same way even if I use only one instance of the Colyseus proxy. [Screenshot from 2021-09-24 00-57-47]

This is my server setup:

import { MongooseDriver } from "@colyseus/mongoose-driver"
import { constructSkinsData, loadSkinBase64sbyId } from "./datas/skins";
import GameRoom from "./rooms/game-room";
import Routes from "./routes/routes";

const http = require('http');
const express = require('express');
const cors = require('cors');
const rateLimit = require('express-rate-limit');

const apiLimiter = rateLimit({
  windowMs: 5 * 60 * 1000, // 5 minutes
  max: 10, // 10 requests per 5-minute window
});

const app = express();

const PORT = Number(process.env.PORT) + Number(process.env.NODE_APP_INSTANCE);

/*Create an app*/
app.set('trust proxy', 1);

app.use( express.json() );       // to support JSON-encoded bodies

app.use( express.urlencoded({     // to support URL-encoded bodies
  extended: true
})); 

app.use(cors());

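const path = require('path');

// path.join inserts the separator that plain string concatenation leaves out
const clientDir = path.join(__dirname, '../../../dist/client');

app.use(express.static(clientDir));

app.get('/', function (req : any, res : any) {
  res.sendFile(path.join(clientDir, 'index.html'));
});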

app.use('/api', new Routes().router);

app.use('/matchmake/', apiLimiter);
/*-------------*/

const colyseus = require("colyseus");
const { WebSocketTransport } = require('@colyseus/ws-transport');

const server = http.createServer(app);

const gameServer = new colyseus.Server({
  transport: new WebSocketTransport({
    server
  }),
  presence: new colyseus.RedisPresence({
    url: `redis://127.0.0.1:6379/0`
  }),
  driver: new MongooseDriver(),
});

gameServer.define('game-room', GameRoom);

/*Initialize essential assets for this server*/
console.log('Initializing essential assets...');
constructSkinsData().then(() => {
  console.log('DONE LOADING ASSETS!');

  gameServer.listen(PORT);

  console.log('Starting listening on ', PORT);
});
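
The port arithmetic in the setup above depends on pm2 setting NODE_APP_INSTANCE for each instance; if either variable is missing, the sum becomes NaN. A minimal sketch of a guard follows. The helper name resolvePort and the fallback base port 2567 are assumptions for illustration, not part of the original setup.

```typescript
// Hypothetical helper: derive a per-instance port from pm2-style environment variables.
type Env = Record<string, string | undefined>;

function isWholeNumber(value: number): boolean {
  return isFinite(value) && Math.floor(value) === value;
}

function resolvePort(env: Env, fallbackBase: number = 2567): number {
  // Fall back to an assumed base port when PORT is not set.
  const base = env.PORT === undefined ? fallbackBase : Number(env.PORT);
  // pm2 sets NODE_APP_INSTANCE to 0, 1, 2, ...; treat a missing value as instance 0.
  const instance =
    env.NODE_APP_INSTANCE === undefined ? 0 : Number(env.NODE_APP_INSTANCE);

  if (!isWholeNumber(base) || !isWholeNumber(instance) || instance < 0) {
    throw new Error(
      "Invalid PORT/NODE_APP_INSTANCE: " + env.PORT + "/" + env.NODE_APP_INSTANCE
    );
  }
  return base + instance;
}
```

In the setup above this could replace the direct sum, for example `const PORT = resolvePort(process.env);`, so a misconfigured instance fails at startup rather than listening on an unexpected port.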

GbGr commented 2 years ago

Yes, I had this problem too. Most likely the cause is the same: the proxy fails with a socket hang up error, and after that requests go to the wrong server, which doesn't have the seat reservation. I never found a solution, so I changed my game server architecture to a metaserver-like design (the metaserver distributes clients to the right servers).

bquangDinh commented 2 years ago

Starting from the socket hang up error is a good lead. I hadn't considered that it could be the cause. Let me see what's wrong. Thank you.

About the metaserver: is it something like nginx?

bquangDinh commented 2 years ago

I don't know if anyone else is facing the same issue, but here is my quick solution.

I got this error because a user quit the browser (or refreshed) before the content from the server arrived. The Colyseus Proxy then produces a socket hang up error. After that, everyone who attempts to join a room gets the Seat Reservation Expired error because the proxy malfunctions.

So I took a look at the Colyseus Proxy source code at node_modules/@colyseus/proxy/proxy.js. Instead of simply ignoring the request from the user who aborted it, Colyseus Proxy unregisters the processId and then tries to register a new one, but that fails as well, so in the end no processId is available for incoming users.

The way I fixed it (maybe just a temporary fix) was to comment out these 3 lines in proxy.on('error'):

//unregister(node);
//discovery_1.cleanUpNode(node).then(function () { return console.log("cleaned up " + node.processId + " presence"); });
//reqHandler(req, res); // try again!

This prevents Colyseus Proxy from unregistering the node.

And add this one instead: res.end()

I don't yet know the consequences of doing it this way, but it worked for me!
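
The effect of this change can be illustrated with a small model. This is a sketch under stated assumptions, not the proxy's actual code: NodeRegistry, BackendNode, ResponseLike and both handler names are hypothetical stand-ins for the proxy's internal bookkeeping and for the response object passed to proxy.on('error').

```typescript
// Hypothetical model of the proxy's list of backend processes.
interface BackendNode { processId: string; address: string; }

// Minimal stand-in for the HTTP response handed to the error handler.
interface ResponseLike { ended: boolean; end(): void; }

class NodeRegistry {
  private nodes: BackendNode[] = [];

  register(node: BackendNode): void {
    this.unregister(node); // avoid duplicate entries for the same processId
    this.nodes.push(node);
  }

  unregister(node: BackendNode): void {
    this.nodes = this.nodes.filter(function (n) {
      return n.processId !== node.processId;
    });
  }

  size(): number { return this.nodes.length; }
}

function createResponse(): ResponseLike {
  const res: ResponseLike = { ended: false, end: function () { res.ended = true; } };
  return res;
}

// Behavior described in the thread: any proxy error removes the backend node,
// even when only the client went away.
function onProxyErrorOriginal(registry: NodeRegistry, node: BackendNode): void {
  registry.unregister(node);
}

// Patched behavior: close the aborted request and leave the registry untouched.
function onProxyErrorPatched(res: ResponseLike): void {
  res.end();
}

// Simulates one client abort against a registry holding a single backend node
// and returns how many nodes remain available afterwards.
function nodesLeftAfterClientAbort(patched: boolean): number {
  const registry = new NodeRegistry();
  const node: BackendNode = { processId: "p1", address: "127.0.0.1:2567" };
  registry.register(node);
  if (patched) {
    onProxyErrorPatched(createResponse());
  } else {
    onProxyErrorOriginal(registry, node);
  }
  return registry.size();
}
```

In this model the original handler leaves no node for later clients after a single aborted request, while the patched handler keeps the node registered. One trade-off to keep in mind: with the patched handler, a backend that has really gone down is no longer removed by this code path either.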

For more detail about the socket hang up error, see https://stackoverflow.com/a/27835115

Thanks @GbGr for the hint about the socket hang up error.

I'm closing the issue for now.

endel commented 2 years ago

Thanks for documenting this, @bquangDinh

endel commented 2 years ago

Re-opening, as this should be properly checked for in our error handler.

GbGr commented 2 years ago

Hey @bquangDinh! Is this fix still working? :) Or are there more hidden traps?

bquangDinh commented 2 years ago

It's still working fine for me. I hope @endel gets this fixed.

jldinh commented 2 years ago

I ran into the same issue, and used a similar fix: https://github.com/workandplay/proxy/commit/2aa48427184f0782f77cb28d594564143e2bb2d6#diff-84ebb7b78a17bdd955d464accb5232ffd9eadafd97d6ca56e90c5281b7201775

I also added logic in the colyseus server to re-register periodically, as long as the server is running, to make sure servers are not "orphaned".
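
A periodic re-registration loop of this kind could look roughly like the sketch below. It is not the code from the linked commit: keepRegistered, Scheduler and intervalScheduler are hypothetical names, and the sketch assumes that registering is idempotent (for example, adding a member to a set), so repeating it is harmless.

```typescript
// Hypothetical sketch of a re-registration loop; all names are assumptions.
type Cancel = () => void;
type Scheduler = (task: () => void, intervalMs: number) => Cancel;

// Default scheduler backed by setInterval; returns a function that stops it.
const intervalScheduler: Scheduler = function (task, intervalMs) {
  const handle = setInterval(task, intervalMs);
  return function () { clearInterval(handle); };
};

// Registers once immediately, then again on every tick until cancelled.
// The scheduler is injectable so the loop can be exercised without real timers.
function keepRegistered(
  register: () => void,
  intervalMs: number,
  schedule: Scheduler = intervalScheduler
): Cancel {
  register();
  return schedule(register, intervalMs);
}
```

A server would call something like `const stop = keepRegistered(registerThisNode, 5000);` at startup and `stop()` on shutdown, where registerThisNode and the 5-second interval are placeholders for whatever registration call and period the deployment uses.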

In my experience, the socket hang up error occurs when too many clients try to connect simultaneously (~50 clients started at the same time, especially when the state to sync is big, e.g., a few MBs). The colyseus server seems to start dropping connections then.

I had a look at how the colyseus server handles concurrency to find the root cause, and was a bit confused by the use of a timer to spread out connections: https://github.com/colyseus/colyseus/blob/85bacb8068adfc215eb53c12b1f0f681ae0bbd44/packages/core/src/MatchMaker.ts#L421-L449

I don't know if anyone can provide insight into why it is done that way instead of using a queue, but perhaps this should be discussed in a separate issue on the main repo?

endel commented 2 years ago

Interesting, thanks for sharing @jldinh. I think this is worth discussing in the main repo, if you don't mind creating an issue there.

@lpsandaruwan just fixed this on #15 🙌

jldinh commented 2 years ago

@endel Sure, here it is: https://github.com/colyseus/colyseus/issues/466

joshfeinsilber commented 2 years ago

For anyone also experiencing this issue, @bquangDinh's fix has also worked well for me.

This issue does seem to appear when a client disconnects while making the request to join the room, but I haven't tested enough to know for sure.

Regardless, commenting out those three lines and allowing the proxy to continue running despite those errors that come through seems to be doing the trick.

jldinh commented 2 years ago

@endel I believe there's a bug in the fix in #15, so I submitted another PR, #22. Would you mind having a look?