cloudwu / sproto

Yet another protocol library like google protocol buffers , but simple and fast.
MIT License

After upgrading from 1.2.0 to 1.7.0, nodes no longer receive heartbeats from other nodes and report an error after a while #115

Closed def-saizi-baka closed 8 months ago

def-saizi-baka commented 8 months ago

Hello cloudwu, our project recently ran into a problem while upgrading skynet (from v1.2.0 to v1.7.0). Could you help take a look?

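Our project is a multi-node server with several game logic nodes and one router node. Every game server node sends a heartbeat to the router at a fixed interval (3s); the heartbeat carries the node's own IP address information, and the router caches it on receipt.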
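To make the setup above concrete, here is a minimal sketch of a router-side heartbeat cache in plain Lua. It is not the project's code: `HeartbeatCache`, the 9-second timeout, the second node `s3_game`, and its address are all assumptions made for illustration.

```lua
-- Hypothetical router-side cache; every name here is illustrative only.
local HeartbeatCache = {}
HeartbeatCache.__index = HeartbeatCache

function HeartbeatCache.new(timeout_seconds)
    return setmetatable({ timeout = timeout_seconds, nodes = {} }, HeartbeatCache)
end

-- Record a heartbeat: remember the node's address and when it was last seen.
function HeartbeatCache:on_heartbeat(node_name, addr, now)
    self.nodes[node_name] = { addr = addr, last_seen = now }
end

-- Return the sorted names of nodes whose last heartbeat is older than the timeout.
function HeartbeatCache:expired(now)
    local down = {}
    for name, info in pairs(self.nodes) do
        if now - info.last_seen > self.timeout then
            down[#down + 1] = name
        end
    end
    table.sort(down)
    return down
end

-- Assumption: a node counts as down after three missed 3 s heartbeats.
local cache = HeartbeatCache.new(9)
cache:on_heartbeat("s2_game", "106.75.67.xx:10216", 0)
cache:on_heartbeat("s3_game", "127.0.0.1:10217", 6) -- hypothetical second node
```

Passing `now` in explicitly keeps the sketch independent of any clock API, so the expiry rule can be checked without running skynet.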

Sending heartbeats to the router

Heartbeat-sending logic on the game node

function M.send_heartbreak()
    skynet.timeout(100, function() M.send_heartbreak() end)

    local ok, err = pcall(function()
        print("cluster_router_name: " .. cluster_router_name);
        print("self_node_name: " .. self_node_name);
        print("self_node_addr: " .. self_node_addr);
        lua_handler.ls_send(cluster_router_name, ".cluster_router", 
            skynet.pack("ls_node_heartbreak", self_node_name, self_node_addr, refresh_all_count > 0))
    end)
    if not ok then
        g_log:error("cluster_router may crashed!: " .. err);
    else
        if refresh_all_count > 0 then
            refresh_all_count = refresh_all_count - 1
        end
    end
end

-- Send across nodes
function lua_handler.ls_send(node_name, address, buffer, sz)
    local args = skynet.tostring(buffer, sz)
    skynet.trash(buffer, sz)
    local node = get_node(node_name)
    if not node then
        g_log:error("ls_send unknown node", node_name, address, args)
        error("unknown node:" .. node_name)
    end
    -- The send here also completes normally, with no error
    node:send(address, args)
end

function NodeCls:send(addr, args)
    self:_send("send", nil, addr, args)
end

function NodeCls:_send(cmd, session_id, addr, args)
    SEND_COUNT = SEND_COUNT + 1
    if not self.sock_id then
        self:__connect()
    end

    local data = skynet.packstring(cmd, session_id, addr, args)
    if string.len(data) < MAX_PACK_SIZE then
        gate_utils.send_sock_data(self.sock_id, string.pack(">s3", data))
        return
    end
    -- Split into multiple packets
    local index=1
    local padding_size = MAX_PACK_SIZE - 1024
    local p_data = nil
    for i=1, string.len(data), padding_size do
        p_data = skynet.packstring("padding", session_id, index, string.sub(data, i, i+padding_size-1))
        gate_utils.send_sock_data(self.sock_id, string.pack(">s3", p_data))
        index = index + 1
    end
    p_data = skynet.packstring("padding", session_id, index, "")
    gate_utils.send_sock_data(self.sock_id, string.pack(">s3", p_data))
end
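The sender above frames each message with `string.pack(">s3", ...)`, that is, a 3-byte big-endian length followed by the payload. The receiving side is not shown in the post, so the following is only a sketch of how such a stream could be split back into frames, assuming Lua 5.3 or later. `unpack_frames` is a hypothetical helper, not part of skynet or of the project.

```lua
-- Hypothetical helper: pull every complete ">s3" frame out of a receive buffer.
-- Returns the list of payloads and the unconsumed tail of the buffer.
local function unpack_frames(buffer)
    local frames = {}
    local pos = 1
    while #buffer - pos + 1 >= 3 do
        -- ">I3" reads the 3-byte big-endian length that ">s3" wrote.
        local len = string.unpack(">I3", buffer, pos)
        if #buffer - pos + 1 < 3 + len then
            break -- frame is incomplete; wait for more bytes
        end
        frames[#frames + 1] = string.sub(buffer, pos + 3, pos + 2 + len)
        pos = pos + 3 + len
    end
    return frames, string.sub(buffer, pos)
end

-- Two frames arrive, but the first read ends in the middle of the second header.
local stream = string.pack(">s3", "hello") .. string.pack(">s3", "world")
local frames, rest = unpack_frames(string.sub(stream, 1, 10))
-- Feed the leftover bytes back in together with the remainder of the stream.
local more, tail = unpack_frames(rest .. string.sub(stream, 11))
```

A receiver that handed partial frames, or several frames at once, to the deserializer would see malformed input. Whether that has anything to do with the error reported in this issue is not established here.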

Heartbeat-receiving logic on the router

function lua_handles.ls_node_heartbreak(node_name, addr, refresh_all)
    print("ls_node_heartbreak"); -- Normally this should be received and printed
    require("addr_mgr").on_node_heartbreak(node_name, addr, refresh_all)
end

function addr_mgr.on_node_heartbreak(node_name, addr, refresh_all)
    print("node_name: " .. node_name);
    addr_mgr._down_node_dict[node_name] = nil
    ....
end
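The handler above is presumably selected by the command name that the sender packs as the first value (`"ls_node_heartbreak"`). The actual dispatch code is not included in the post, so this is only a sketch of name-based dispatch under that assumption. `handlers`, `seen`, and `dispatch` are hypothetical names.

```lua
-- Hypothetical dispatcher: the first value of a message selects the handler.
local handlers = {}
local seen = {} -- records what the heartbeat handler received

function handlers.ls_node_heartbreak(node_name, addr, refresh_all)
    seen[node_name] = { addr = addr, refresh_all = refresh_all }
end

local function dispatch(cmd, ...)
    local f = handlers[cmd]
    if not f then
        return false, "unknown command: " .. tostring(cmd)
    end
    -- pcall keeps one failing handler from taking down the dispatcher.
    return pcall(f, ...)
end

local ok = dispatch("ls_node_heartbreak", "s2_game", "106.75.67.xx:10216", false)
local bad, err = dispatch("no_such_command")
```

Logging at the top of `dispatch`, before the handler lookup, would show whether a message reached the router at all or was lost before deserialization.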

The router's log also shows the socket connections being opened

[:0000000e] socket open: 2
[:0000000e] socket open: 3
[:0000000e] socket open: 4
[:0000000e] socket open: 5
[:0000000e] socket open: 6

The game logic server reports no errors while sending, either

Game logic node output (106.75.67.xx is this server's IP)

[2024-02-19 17:04:25][clusterd-c]   cluster_router_name: s2800_cluster_router
[2024-02-19 17:04:25][clusterd-c]   self_node_name: s2_game
[2024-02-19 17:04:25][clusterd-c]   self_node_addr: 106.75.67.xx:10216

However, the router never receives the heartbeat requests, and after sitting idle for a while it reports an "Invalid serialize" error

[:0000000c] lua call [e to :c : 2 msgsz = 15906] error : ./skynet/lualib/skynet.lua:968: ./skynet/lualib/skynet.lua:931: Invalid serialize stream 9 (line:437)
stack traceback:
    [C]: in function 'assert'
    ./skynet/lualib/skynet.lua:968: in function 'skynet.dispatch_message'
[:0000000c] lua call [e to :c : 3 msgsz = 15906] error : ./skynet/lualib/skynet.lua:968: ./skynet/lualib/skynet.lua:931: Invalid serialize stream 9 (line:437)
stack traceback:
    [C]: in function 'assert'
    ./skynet/lualib/skynet.lua:968: in function 'skynet.dispatch_message'
[:0000000c] lua call [e to :c : 6 msgsz = 16418] error : ./skynet/lualib/skynet.lua:968: ./skynet/lualib/skynet.lua:931: Invalid serialize stream 16351 (line:407)
stack traceback:
    [C]: in function 'assert'
    ./skynet/lualib/skynet.lua:968: in function 'skynet.dispatch_message'
[:0000000c] lua call [e to :c : 4 msgsz = 16930] error : ./skynet/lualib/skynet.lua:968: ./skynet/lualib/skynet.lua:931: Invalid serialize stream 16828 (line:510)
stack traceback:
    [C]: in function 'assert'
    ./skynet/lualib/skynet.lua:968: in function 'skynet.dispatch_message'
[:0000000c] lua call [e to :c : 5 msgsz = 17954] error : ./skynet/lualib/skynet.lua:968: ./skynet/lualib/skynet.lua:931: Invalid serialize stream 11 (line:437)
stack traceback:
    [C]: in function 'assert'
    ./skynet/lualib/skynet.lua:968: in function 'skynet.dispatch_message'

What could be causing this?

def-saizi-baka commented 8 months ago

Sorry, I filed this in the wrong repository. I will resubmit it in the right one.