bkacjios / Lumble

A mostly pure Lua based mumble implementation for creating a bot
11 stars 2 forks source link

Bot unable to interact with server after reconnecting to a restarted server. #10

Closed CaptainZidgel closed 3 years ago

CaptainZidgel commented 3 years ago

I've noticed a long standing issue where my bot stops responding to things - the client still logs things like connections and disconnects, but doesn't respond to messages, at least. Here is a minimum reproducible example (Latest version of Lumble, but this issue existed before the recent change dump)

ping/pong code

local mumble = require("lumble")
local utf8 = require("lua-utf8")
local log = require("log")

local params = {
    mode = "client",
    protocol = "sslv23",
    key = "config/k.key",
    certificate = "config/k.pem"
}

hostname = --
port = 27345

local client = mumble.getClient(hostname, port, params)
if not client then return end
client:auth("ZIDGE")

client:hook("OnTextMessage", function (client, event)
    log.info("Received message %s", event.message)
    if event.message == "ping" then
        event.actor:getChannel():message("ping")
        log.info("Sending message back")
    end
end)

User sending message in Mumble server

[10:56:37 AM] Connecting to server --.
[10:56:37 AM] Connected.
[10:56:37 AM] Welcome message: 
[10:56:41 AM] To Root: ping
[10:56:41 AM] (Channel) ZIDGE: ping
[10:56:49 AM] Server connection failed: The remote host closed the connection.
[10:56:59 AM] Reconnecting.
[10:56:59 AM] Connected.
[10:56:59 AM] Welcome message: 
[10:57:02 AM] To Root: ping
//no response (bot is in the same channel at time of message)

Lumble logs

//I start the bot once on a server
[INFO  - 10:56:31] version: 1.3.4-1~ppa2~focal1
[INFO  - 10:56:31] system : Ubuntu 20.04 LTS
[INFO  - 10:56:31] CryptState: VALID
[WARN  - 10:56:31] Server maximum network bandwidth is only %d kbit/s.
[WARN  - 10:56:31] Audio quality auto-adjusted to 72 kbit/s (33 ms)
[INFO  - 10:56:31] message: 
[INFO  - 10:56:37] SuperUser2 connected to channel Root
[INFO  - 10:56:41] Received message ping //everything works
[INFO  - 10:56:41] Sending message back
//the server closes - in this case it is done manually but in practice it happens when external server providers restart for any reason
[ERROR - 10:56:49] TCP connection error "closed"
[DEBUG - 10:56:49] reconnecting.. (1st attempt)
[DEBUG - 10:56:50] reconnecting.. (2nd attempt)
//server restarted
[INFO  - 10:56:50] version: 1.3.4-1~ppa2~focal1
[INFO  - 10:56:50] system : Ubuntu 20.04 LTS
[INFO  - 10:56:50] CryptState: VALID
[WARN  - 10:56:50] Server maximum network bandwidth is only %d kbit/s.
[WARN  - 10:56:50] Audio quality auto-adjusted to 72 kbit/s (33 ms)
[INFO  - 10:56:50] message: 
[INFO  - 10:56:59] SuperUser2 connected to channel Root
//no acknowledgement of message

I wish I had more debug information but I don't have the idea to know what to search for unfortunately. Let me know if you need anything else from me.

bkacjios commented 3 years ago

Strange. I'm guessing the bot starts working again after manually restarting it then? Also, does the bot actually stay connected for more than 30 seconds? If it's staying connected, that means it's successfully sending a ping to the server to keep the connection alive, so it's definitely able to communicate with the server.

https://github.com/bkacjios/Lumble/blob/321808df107d0d356fe9a1e178e0badf1919497b/modules/lumble/client/init.lua#L356

If the server doesn't receive this message from the bot every 30 seconds, the server will eventually kick it for being "disconnected"

This sort of sounds like the rate limiting problem I was having at one point, where if the bot was connected for almost a month it would no longer receive certain events..

bkacjios commented 3 years ago

From my tests via kicking the bot, it seems to reconnect and receive messages just fine..

[INFO  - 14:31:13] Bkacjios moved to BOOM TETRIS
[INFO  - 14:31:13] LuaBot moved to BOOM TETRIS
[DEBUG - 14:31:17] Bkacjios played: #shine
[DEBUG - 14:31:20] Bkacjios played: #boom
[WARN  - 14:31:26] LuaBot kicked by Bkacjios (Reason "test")
[ERROR - 14:31:26] TCP connection error "closed"
[DEBUG - 14:31:27] reconnecting.. (1st attempt)
[INFO  - 14:31:27] version: 1.3.4-1~ppa2~bionic1
[INFO  - 14:31:27] system : Ubuntu 18.04.5 LTS
[INFO  - 14:31:27] CryptState: VALID
[WARN  - 14:31:27] Server maximum network bandwidth is only 96000 kbit/s.
[WARN  - 14:31:27] Audio quality auto-adjusted to 96 kbit/s (57 ms)
[INFO  - 14:31:27] message:
[INFO  - 14:31:27] LuaBot connected from ::1
[INFO  - 14:31:27] Bkacjios connected
[DEBUG - 14:31:38] Bkacjios played: #boom
[DEBUG - 14:31:40] Bkacjios played: #shine
bkacjios commented 3 years ago

Okay, yeah, even when I restarted the server, the bot took 2 attempts to reconnect and everything is still working as normal.

The only thing I can suggest you try is changing the messageburst limits in the mumble-server.ini

messageburst=100
messagelimit=50

I set these back when murmur had a weird bug where it would prevent people from sending messages if they spammed things too quickly, except it would happen if the client was connected for a long period of time. Maybe that's what's going on here for some reason?

CaptainZidgel commented 3 years ago

I'm guessing the bot starts working again after manually restarting it then? Also, does the bot actually stay connected for more than 30 seconds?

Yes and yes

I tried changing the messageburst/messagelimit variables to the values you provided (from not being set at all on my test server, not sure what values are being used by my non-technical friend's rented server) but it didn't cause a change. You are right that the problem is not really the client not communicating with the server. I was going to suggest it was perhaps hooks not being called, but log traces show client:hookCall is absolutely called. So I modified https://github.com/bkacjios/Lumble/blob/master/modules/lumble/client/init.lua#L286 to print if there was no hook to call and it isn't. Which is weird because why would my hooks be removed on a restarted server but not yours? if not self.hooks[name] then log.info("No hook for %s", name) return end Here's a simple test that has hooks for OnUserConnected and OnTextMessage

[TRACE - 12:35:52] Send TCP packet [0][Version] to server                         //first connect
[TRACE - 12:35:52] Send TCP packet [2][Authenticate] to server
[TRACE - 12:35:52] Received packet [0][Version]
[INFO  - 12:35:52] version: 1.3.4-1~ppa2~focal1
[INFO  - 12:35:52] system : Ubuntu 20.04 LTS
[TRACE - 12:35:53] Received packet [15][CryptSetup]
[INFO  - 12:35:53] CryptState: VALID
[TRACE - 12:35:53] Send encrypted UDP Buffer[3/64] to server
[TRACE - 12:35:53] Call hook OnCryptSetup()
[INFO  - 12:35:53] No hook for OnCryptSetup
[TRACE - 12:35:53] Received packet [21][CodecVersion]
[TRACE - 12:35:53] Call hook OnCodecVersion()
[INFO  - 12:35:53] No hook for OnCodecVersion
[TRACE - 12:35:53] Received packet [7][ChannelState]
[TRACE - 12:35:53] Call hook OnChannelState(table: 0x400a35e0)
[INFO  - 12:35:53] No hook for OnChannelState
[TRACE - 12:35:53] Received packet [7][ChannelState]
[TRACE - 12:35:53] Call hook OnChannelState(table: 0x409713d8)
[INFO  - 12:35:53] No hook for OnChannelState
[TRACE - 12:35:53] Received packet [7][ChannelState]
[TRACE - 12:35:53] Call hook OnChannelState(table: 0x409687a0)
[INFO  - 12:35:53] No hook for OnChannelState
[TRACE - 12:35:53] Received packet [20][PermissionQuery]
[TRACE - 12:35:53] Permissions for B updated: 134219775
[TRACE - 12:35:53] Call hook OnPermissionQuery(table: 0x40f03ee0)
[INFO  - 12:35:53] No hook for OnPermissionQuery
[TRACE - 12:35:53] Received packet [20][PermissionQuery]
[TRACE - 12:35:53] Permissions for Root updated: 135202815
[TRACE - 12:35:53] Call hook OnPermissionQuery(table: 0x40f03ee0)
[INFO  - 12:35:53] No hook for OnPermissionQuery
[TRACE - 12:35:53] Received packet [9][UserState]
[TRACE - 12:35:53] Send TCP packet [22][UserStats] to server
[TRACE - 12:35:53] Call hook OnUserState(table: 0x416cb628)
[INFO  - 12:35:53] No hook for OnUserState
[TRACE - 12:35:53] Received packet [9][UserState]
[TRACE - 12:35:53] Send TCP packet [22][UserStats] to server
[TRACE - 12:35:53] Call hook OnUserState(table: 0x408d5c30)
[INFO  - 12:35:53] No hook for OnUserState
[TRACE - 12:35:53] Received packet [5][ServerSync]
[WARN  - 12:35:53] Server maximum network bandwidth is only %d kbit/s.
[WARN  - 12:35:53] Audio quality auto-adjusted to 72 kbit/s (33 ms)
[INFO  - 12:35:53] message: 
[TRACE - 12:35:53] Send TCP packet [22][UserStats] to server
[TRACE - 12:35:53] Send TCP packet [22][UserStats] to server
[TRACE - 12:35:53] Call hook OnServerSync(ZIDGE)
[INFO  - 12:35:53] No hook for OnServerSync
[TRACE - 12:35:53] Received packet [24][ServerConfig]
[TRACE - 12:35:53] Call hook OnServerConfig(table: 0x40f03f08)
[INFO  - 12:35:53] No hook for OnServerConfig
[TRACE - 12:35:53] Ping UDP: 92.00 ms
[TRACE - 12:35:53] Received packet [22][UserStats]
[INFO  - 12:35:53] ZIDGE connected from 47.152.131.178
[TRACE - 12:35:53] Call hook OnUserStats(table: 0x41150590)
[INFO  - 12:35:53] No hook for OnUserStats
[TRACE - 12:35:53] Received packet [22][UserStats]
[INFO  - 12:35:53] testuser connected from 47.152.131.178
[TRACE - 12:35:53] Call hook OnUserStats(table: 0x408de2d0)
[INFO  - 12:35:53] No hook for OnUserStats
[TRACE - 12:35:53] Received packet [22][UserStats]
[TRACE - 12:35:53] Call hook OnUserStats(table: 0x411489f8)
[INFO  - 12:35:53] No hook for OnUserStats
[TRACE - 12:35:53] Received packet [22][UserStats]
[TRACE - 12:35:53] Call hook OnUserStats(table: 0x4113e770)
[INFO  - 12:35:53] No hook for OnUserStats
[TRACE - 12:35:59] Received packet [8][UserRemove]
[INFO  - 12:35:59] testuser disconnected                                   //disconnecting test user because I was already connected
[TRACE - 12:35:59] Call hook OnUserRemove(table: 0x40efa610)
[INFO  - 12:35:59] No hook for OnUserRemove
[TRACE - 12:35:59] Received packet [9][UserState]
[TRACE - 12:35:59] Send TCP packet [22][UserStats] to server
[INFO  - 12:35:59] testuser connected to channel B                                 //reconnect
[TRACE - 12:35:59] Call hook OnUserConnected(table: 0x40090db0)      //call hook
User connected                                                                                      //hook prints message
[TRACE - 12:35:59] Call hook OnUserState(table: 0x40090db0)
[INFO  - 12:35:59] No hook for OnUserState
[TRACE - 12:35:59] Received packet [9][UserState]
[TRACE - 12:35:59] Call hook OnUserState(table: 0x416e14e0)
[INFO  - 12:35:59] No hook for OnUserState
[TRACE - 12:35:59] Received packet [22][UserStats]
[INFO  - 12:35:59] testuser connected from 47.152.131.178
[TRACE - 12:35:59] Call hook OnUserStats(table: 0x416c8dd8)
[INFO  - 12:35:59] No hook for OnUserStats
[ERROR - 12:36:07] TCP connection error "closed"                                    //restart server
[TRACE - 12:36:07] Call hook OnDisconnect()
[DEBUG - 12:36:07] reconnecting.. (1st attempt)
[TRACE - 12:36:08] Send TCP packet [0][Version] to server
[TRACE - 12:36:08] Send TCP packet [2][Authenticate] to server
[TRACE - 12:36:08] Received packet [0][Version]
[INFO  - 12:36:08] version: 1.3.4-1~ppa2~focal1
[INFO  - 12:36:08] system : Ubuntu 20.04 LTS
[TRACE - 12:36:08] Received packet [15][CryptSetup]
[INFO  - 12:36:08] CryptState: VALID
[TRACE - 12:36:08] Send encrypted UDP Buffer[3/64] to server
[TRACE - 12:36:08] Call hook OnCryptSetup()
[INFO  - 12:36:08] No hook for OnCryptSetup
[TRACE - 12:36:08] Received packet [21][CodecVersion]
[TRACE - 12:36:08] Call hook OnCodecVersion()
[INFO  - 12:36:08] No hook for OnCodecVersion
[TRACE - 12:36:08] Received packet [7][ChannelState]
[TRACE - 12:36:08] Call hook OnChannelState(table: 0x416d56e8)
[INFO  - 12:36:08] No hook for OnChannelState
[TRACE - 12:36:08] Received packet [7][ChannelState]
[TRACE - 12:36:08] Call hook OnChannelState(table: 0x40fee220)
[INFO  - 12:36:08] No hook for OnChannelState
[TRACE - 12:36:08] Received packet [7][ChannelState]
[TRACE - 12:36:08] Call hook OnChannelState(table: 0x40efaac0)
[INFO  - 12:36:08] No hook for OnChannelState
[TRACE - 12:36:08] Received packet [20][PermissionQuery]
[TRACE - 12:36:08] Permissions for B updated: 134219775
[TRACE - 12:36:08] Call hook OnPermissionQuery(table: 0x41001bf8)
[INFO  - 12:36:08] No hook for OnPermissionQuery
[TRACE - 12:36:08] Received packet [20][PermissionQuery]
[TRACE - 12:36:08] Permissions for Root updated: 135202815
[TRACE - 12:36:08] Call hook OnPermissionQuery(table: 0x41001bf8)
[INFO  - 12:36:08] No hook for OnPermissionQuery
[TRACE - 12:36:08] Received packet [9][UserState]
[TRACE - 12:36:08] Send TCP packet [22][UserStats] to server
[TRACE - 12:36:08] Call hook OnUserState(table: 0x40ef28e0)
[INFO  - 12:36:08] No hook for OnUserState
[TRACE - 12:36:08] Received packet [5][ServerSync]
[WARN  - 12:36:08] Server maximum network bandwidth is only %d kbit/s.
[WARN  - 12:36:08] Audio quality auto-adjusted to 72 kbit/s (33 ms)
[INFO  - 12:36:08] message: 
[TRACE - 12:36:08] Send TCP packet [22][UserStats] to server
[TRACE - 12:36:08] Call hook OnServerSync(ZIDGE)
[INFO  - 12:36:08] No hook for OnServerSync
[TRACE - 12:36:08] Received packet [24][ServerConfig]
[TRACE - 12:36:08] Call hook OnServerConfig(table: 0x41001c20)
[INFO  - 12:36:08] No hook for OnServerConfig
[TRACE - 12:36:08] Ping UDP: 69.00 ms
[TRACE - 12:36:08] Received packet [22][UserStats]
[INFO  - 12:36:08] ZIDGE connected from 47.152.131.178
[TRACE - 12:36:08] Call hook OnUserStats(table: 0x411436f0)
[INFO  - 12:36:08] No hook for OnUserStats
[TRACE - 12:36:08] Received packet [22][UserStats]
[TRACE - 12:36:08] Call hook OnUserStats(table: 0x40eec6c8)
[INFO  - 12:36:08] No hook for OnUserStats
[TRACE - 12:36:14] Received packet [9][UserState]
[TRACE - 12:36:14] Send TCP packet [22][UserStats] to server
[INFO  - 12:36:14] testuser connected to channel B                                     //test user connects
[TRACE - 12:36:14] Call hook OnUserConnected(table: 0x408d81e0)          //hook called
[INFO  - 12:36:14] No hook for OnUserConnected                                     //no hook!
[TRACE - 12:36:14] Call hook OnUserState(table: 0x408d81e0)
[INFO  - 12:36:14] No hook for OnUserState
[TRACE - 12:36:14] Received packet [9][UserState]
[TRACE - 12:36:14] Call hook OnUserState(table: 0x40efd590)
[INFO  - 12:36:14] No hook for OnUserState
[TRACE - 12:36:14] Received packet [22][UserStats]
[INFO  - 12:36:14] testuser connected from 47.152.131.178
[TRACE - 12:36:14] Call hook OnUserStats(table: 0x40ff4648)
[INFO  - 12:36:14] No hook for OnUserStats
[TRACE - 12:36:37] Send TCP packet [3][Ping] to server
[TRACE - 12:36:37] Send encrypted UDP Buffer[4/64] to server
[TRACE - 12:36:37] Received packet [3][Ping]
[TRACE - 12:36:37] Ping TCP: 70.00 ms
[TRACE - 12:36:37] Call hook OnPing(70)
[INFO  - 12:36:37] No hook for OnPing
[TRACE - 12:36:37] Ping UDP: 73.00 ms
[TRACE - 12:37:07] Send TCP packet [3][Ping] to server
[TRACE - 12:37:07] Send encrypted UDP Buffer[4/64] to server
[TRACE - 12:37:07] Ping UDP: 69.00 ms
[TRACE - 12:37:07] Received packet [3][Ping]
[TRACE - 12:37:07] Ping TCP: 70.00 ms
[TRACE - 12:37:07] Call hook OnPing(70)
[INFO  - 12:37:07] No hook for OnPing
[TRACE - 12:37:37] Send TCP packet [3][Ping] to server
[TRACE - 12:37:37] Send encrypted UDP Buffer[4/64] to server
[TRACE - 12:37:37] Ping UDP: 67.00 ms
[TRACE - 12:37:37] Received packet [3][Ping]
[TRACE - 12:37:37] Ping TCP: 78.00 ms
[TRACE - 12:37:37] Call hook OnPing(78)
[INFO  - 12:37:37] No hook for OnPing
[TRACE - 12:37:50] Received packet [11][TextMessage]                               //receive message
[TRACE - 12:37:50] Call hook OnTextMessage(table: 0x401497d8)             //call hook
[INFO  - 12:37:50] No hook for OnTextMessage                                        //no hook!
[TRACE - 12:38:07] Send TCP packet [3][Ping] to server
[TRACE - 12:38:07] Send encrypted UDP Buffer[4/64] to server
[TRACE - 12:38:07] Received packet [3][Ping]
[TRACE - 12:38:07] Ping TCP: 69.00 ms
[TRACE - 12:38:07] Call hook OnPing(69)
[INFO  - 12:38:07] No hook for OnPing
[TRACE - 12:38:07] Ping UDP: 70.00 ms
bkacjios commented 3 years ago

Aha, so yeah, that's your problem.

In my autoreconnect code, it basically creates a brand new client object, so all previous hooks will be lost.

https://github.com/bkacjios/Lumble/blob/09e0b315d35246108f6b88ae06a9784564a303e4/modules/lumble/init.lua#L28

I reload my scripts by running this bit of code, which basically loads ./scripts/init.lua and recreates all my commands and hooks.

https://github.com/bkacjios/Lumble/blob/09e0b315d35246108f6b88ae06a9784564a303e4/modules/lumble/init.lua#L40

It would probably be a smarter idea to save the hooks and just reapply them to the new client instead.

Here's a little proof of concept of some possible changes to modules/lumble/init.lua that you could try out.

local client = require("lumble.client")
local server = require("lumble.server")
local reload = require("autoreload.reload")
--local copas = require("copas")
local log = require("log")

local mumble = {
    clients = {},
    servers = {},
    reconnect = {},
}

function mumble.connect(host, port, params, noretry)
    local client, err = client.new(host, port, params)

    if not client then
        if not noretry then
            table.insert(mumble.reconnect, {host = host, port = port, params = params, try = 1})
        end
        return false, err
    end

    mumble.clients[host] = mumble.clients[host] or {}
    mumble.clients[host][port] = client

    -- If we disconnect, try to reconnect
    client:hook("OnDisconnect", "Reconnect on Disconnect", function(client)
        mumble.clients[client.host][client.port] = nil
        table.insert(mumble.reconnect, {
            host = client.host,
            port = client.port,
            params = client.params,
            username = client.username,
            password = client.password,
            tokens = client.tokens,
            hooks = client.hooks, -- Carry over all hooks..
            commands = client.commands, -- Carry over all commands..
            try = 1,
        })
    end)

    reload.reload("scripts")

    return client
end

function mumble.getClients()
    return mumble.clients
end

function mumble.getClient(host, port, params)
    if mumble.clients[host] and mumble.clients[host][port] then
        return mumble.clients[host][port]
    end

    return mumble.connect(host, port, params)
end

function mumble.host(host, port)
    local params = {
        mode = "server",
        protocol = "any",
        key = "config/serverAkey.pem",
        certificate = "config/serverA.pem",
        cafile = "config/rootA.pem",
        verify = {"none"},
        options = "all",
    }
    return server.new(host, port, params)
end

function mumble.update()
    for i, info in pairs(mumble.reconnect) do
        log.debug("reconnecting.. (%s attempt)", math.stndrd(info.try))
        local client, err = mumble.connect(info.host, info.port, info.params, true)
        if client then
            mumble.reconnect[i] = nil
            client.hooks = info.hooks -- Reapply our hooks
            client.commands = info.commands -- Reapply our commands
            client:auth(info.username, info.password, info.tokens)
        else
            info.try = info.try + 1
        end
    end
end

--[[function mumble.setup()
    copas.addthread(mumble.update)
end]]

return mumble

Basically, I saved the disconnected clients hooks/commands table in the OnDisconnect hook, then reapplied them just before re-authing in mumble.update()

CaptainZidgel commented 3 years ago

This works, thank you.