ZLMediaKit / ZLToolKit

A lightweight network framework based on C++11; its thread-pool design supports high-concurrency network IO.
MIT License

Technical Consultation: modifyEvent prompts Invalid argument #226

Closed HalcyonHuang closed 2 months ago

HalcyonHuang commented 5 months ago

Hello, I am using the ZLToolKit/tests/test_tcpClient.cpp demo. When the client connects to a TCP server, the modifyEvent function fails with "Invalid argument". Is this expected?

git commit: 04d1c47d2568f5ce1ff84260cefaf2754e514a5e

2024-04-18 10:52:38.787 D [test_tcpClient] [17486-stamp thread] util.cpp:366 operator() | Stamp thread started
2024-04-18 10:52:38.789 I [test_tcpClient] [17486-test_tcpClient] EventPoller.cpp:503 EventPollerPool | EventPoller created size: 2
2024-04-18 10:52:38.789 D [test_tcpClient] [17486-test_tcpClient] test_tcpClient.cpp:22 TestClient | 
2024-04-18 10:52:38.790 T [test_tcpClient] [17486-test_tcpClient] TcpClient.cpp:79 startConnect | TestClient-1 start connect 192.168.0.120:8082
2024-04-18 10:52:38.791 I [test_tcpClient] [17486-event poller 1] Socket.cpp:238 attachEvent | attachEvent rawFd 9 type0
2024-04-18 10:52:38.792 T [test_tcpClient] [17486-event poller 1] TcpClient.cpp:89 onSockConnect | TestClient-1 connect result: 0(success)
2024-04-18 10:52:38.792 I [test_tcpClient] [17486-event poller 1] test_tcpClient.cpp:30 onConnect | success
2024-04-18 10:52:38.792 E [test_tcpClient] [17486-event poller 1] Socket.cpp:262 operator() |  Event_Write rawFd 9 events 2
2024-04-18 10:52:38.792 E [test_tcpClient] [17486-event poller 1] EventPoller.cpp:184 modifyEvent |  stop write event error _epoll_fd 8 fd 9 events 5 ret -1 Invalid argument // the error in question

bool Socket::attachEvent(const SockNum::Ptr &sock) {
    weak_ptr<Socket> weak_self = shared_from_this();
    InfoL << "attachEvent rawFd " << sock->rawFd() << " type" << sock->type();
    if (sock->type() == SockNum::Sock_TCP_Server) {
        // TCP server
        auto result = _poller->addEvent(sock->rawFd(), EventPoller::Event_Read | EventPoller::Event_Error, [weak_self, sock](int event) {
            if (auto strong_self = weak_self.lock()) {
                strong_self->onAccept(sock, event);
            }
        });
        return -1 != result;
    }

    // TCP client or UDP
    auto read_buffer = _poller->getSharedBuffer();
    auto result = _poller->addEvent(sock->rawFd(), EventPoller::Event_Read | EventPoller::Event_Error | EventPoller::Event_Write, [weak_self, sock, read_buffer](int event) {
        auto strong_self = weak_self.lock();
        if (!strong_self) {
            return;
        }

        if (event & EventPoller::Event_Read) {
            ErrorL << " Event_Read rawFd " << sock->rawFd() << " events " << event; // added log
            strong_self->onRead(sock, read_buffer);
        }
        if (event & EventPoller::Event_Write) {
            ErrorL << " Event_Write rawFd " << sock->rawFd() << " events " << event; // added log
            strong_self->onWriteAble(sock);
        }
        if (event & EventPoller::Event_Error) {
            ErrorL << " Event_Error rawFd " << sock->rawFd() << " events " << event; // added log
            strong_self->emitErr(getSockErr(sock->rawFd()));
        }
    });

    return -1 != result;
}

int EventPoller::modifyEvent(int fd, int event, PollCompleteCB cb) {
    TimeTicker();
    if (!cb) {
        cb = [](bool success) {};
    }
    if (isCurrentThread()) {
#if defined(HAS_EPOLL)
        struct epoll_event ev = { 0 };
        ev.events = toEpoll(event);
        ev.data.fd = fd;
        auto ret = epoll_ctl(_epoll_fd, EPOLL_CTL_MOD, fd, &ev);
        if (ret != 0) { // added log
            ErrorL << " stop write event error _epoll_fd " << _epoll_fd << " fd " << fd << " events " << event << " ret " << ret << " " << strerror(errno);
        }
        cb(ret == 0);
        return ret;
#else

#endif // HAS_EPOLL
    }

    return 0;
}

xia-chu commented 5 months ago

There shouldn't be any problem. This fd may have already been closed. Don't worry about it.

HalcyonHuang commented 5 months ago

I find it strange too. Maybe I did something wrong. @xia-chu Is fd referring to the TCP client's socket? The TCP client's socket is not closed. When it receives data from the TCP server, it will trigger both read and write events. The log is as follows:

2024-04-19 18:27:28.602 D [test_tcpClient] [89747-stamp thread] util.cpp:366 operator() | Stamp thread started
2024-04-19 18:27:28.604 I [test_tcpClient] [89747-test_tcpClient] EventPoller.cpp:507 EventPollerPool | EventPoller created size: 2
2024-04-19 18:27:28.604 D [test_tcpClient] [89747-test_tcpClient] test_tcpClient.cpp:22 TestClient | 
2024-04-19 18:27:28.604 T [test_tcpClient] [89747-test_tcpClient] TcpClient.cpp:79 startConnect | TestClient-1 start connect 192.168.0.120:8082
2024-04-19 18:27:28.605 I [test_tcpClient] [89747-event poller 1] Socket.cpp:238 attachEvent | attachEvent rawFd 9 type0
2024-04-19 18:27:28.605 T [test_tcpClient] [89747-event poller 1] TcpClient.cpp:89 onSockConnect | TestClient-1 connect result: 0(success)
2024-04-19 18:27:28.605 I [test_tcpClient] [89747-event poller 1] test_tcpClient.cpp:30 onConnect | success
2024-04-19 18:27:28.605 E [test_tcpClient] [89747-event poller 1] Socket.cpp:262 operator() |  Event_Write rawFd 9 events 2
2024-04-19 18:27:28.605 E [test_tcpClient] [89747-event poller 1] EventPoller.cpp:188 modifyEvent |  stop write event error _epoll_fd 8 fd 9 events 5 ret -1 Invalid argument

Data received from the server:
2024-04-19 18:27:43.562 E [test_tcpClient] [89747-event poller 1] Socket.cpp:258 operator() |  Event_Read rawFd 9 events 3
2024-04-19 18:27:43.563 D [test_tcpClient] [89747-event poller 1] test_tcpClient.cpp:34 onRecv | 123 from port:8082
2024-04-19 18:27:43.563 E [test_tcpClient] [89747-event poller 1] Socket.cpp:262 operator() |  Event_Write rawFd 9 events 3
2024-04-19 18:27:43.563 E [test_tcpClient] [89747-event poller 1] EventPoller.cpp:188 modifyEvent |  stop write event error _epoll_fd 8 fd 9 events 5 ret -1 Invalid argument
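
For readers decoding the `events` numbers in these logs: the values are consistent with a bitmask in which Event_Read = 1, Event_Write = 2 and Event_Error = 4. That layout is inferred from the log lines themselves (a write-only callback logs 2, a read-plus-write callback logs 3, and the Read | Error mask passed to modifyEvent logs 5); it is an assumption here, not a copy of the ZLToolKit header. `describeEvents` is a hypothetical helper name used only for illustration.

```cpp
#include <string>

// Bit values inferred from the logs above (an assumption, not copied
// from EventPoller.h).
enum PollEvent {
    Event_Read = 1 << 0,   // 1
    Event_Write = 1 << 1,  // 2
    Event_Error = 1 << 2,  // 4
};

// Turn a numeric mask such as 5 into a readable string such as "Read|Error".
std::string describeEvents(int mask) {
    std::string out;
    auto append = [&out](const char *name) {
        if (!out.empty()) {
            out += '|';
        }
        out += name;
    };
    if (mask & Event_Read) append("Read");
    if (mask & Event_Write) append("Write");
    if (mask & Event_Error) append("Error");
    return out.empty() ? std::string("None") : out;
}
```

Under that assumption, `describeEvents(5)` yields "Read|Error", which is the mask modifyEvent was asked to apply when it tried to stop write notifications.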

PioLing commented 4 months ago

Could this be related to #230? Worth a look.

ss002012 commented 2 months ago

I ran into this issue while tracking down other problems. My preliminary conclusion is that the call returns "Invalid argument" but still appears to take effect.

Below is a portion of code excerpted from ZLToolKit that stops listening for write events once the socket becomes writable. I found that write notifications did stop, yet this call still reported "Invalid argument".

#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <netinet/in.h>
#include <iostream>
#include <string.h>
#include <strings.h>
#include <errno.h>
#include <arpa/inet.h>

using namespace std;
#define EPOLL_SIZE 1024
static int bind_sock4(int fd, const char *ifr_ip, uint16_t port) {
    struct sockaddr_in addr;
    bzero(&addr, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (1 != inet_pton(AF_INET, ifr_ip, &(addr.sin_addr))) {
        if (strcmp(ifr_ip, "::")) {
            cout << "inet_pton to ipv4 address failed: " << ifr_ip;
        }
        addr.sin_addr.s_addr = INADDR_ANY;
    }
    if (::bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
        cout << "Bind socket failed: " << strerror(errno);
        return -1;
    }
    return 0;
}

int main() {
    int _event_fd = epoll_create(EPOLL_SIZE);
    int fd = (int)socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    int ul = 1;
    ioctl(fd, FIONBIO, &ul); 
    bind_sock4(fd, "0.0.0.0", 11000);

    struct epoll_event ev = {0};
    ev.events =  EPOLLIN | EPOLLOUT | EPOLLHUP | EPOLLERR | EPOLLET | EPOLLEXCLUSIVE ;
    ev.data.fd = fd;
    int ret = epoll_ctl(_event_fd, EPOLL_CTL_ADD, fd, &ev);

    struct epoll_event events[EPOLL_SIZE];
    while (true) {
        int ret = epoll_wait(_event_fd, events, EPOLL_SIZE, -1);
        if (ret <= 0) {
            // timed out or interrupted
            continue;
        }

        for (int i = 0; i < ret; ++i) {
            struct epoll_event &ev = events[i];
            int fd = ev.data.fd;

            cout << ev.events;
            if (ev.events & (EPOLLIN | EPOLLRDNORM | EPOLLHUP)) {
                cout << "readable";
            }

            if (ev.events & (EPOLLOUT | EPOLLWRNORM)) {
                cout << "writable";

                // stop listening for write events
                struct epoll_event ev = { 0 };
                ev.events = EPOLLIN | EPOLLHUP | EPOLLERR | EPOLLET;
                ev.data.fd = fd;
                int ret = epoll_ctl(_event_fd, EPOLL_CTL_MOD, fd, &ev);
                if (ret != 0) {
                    cout << strerror(errno) << endl;
                }
            }

            if (ev.events & (EPOLLHUP | EPOLLERR)) {
                cout << "error";
            }
        }
    }
}

The running result is as follows: image
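
A possible reason the write notifications appear to stop even though the call fails is the EPOLLET flag in the registration above. In edge-triggered mode, epoll reports readiness only when the state changes, so an always-writable socket reports EPOLLOUT once and then stays quiet whether or not a later EPOLL_CTL_MOD succeeds. The sketch below is not ZLToolKit code: it uses a Unix datagram socketpair instead of the UDP socket above, never calls EPOLL_CTL_MOD at all, and `runEdgeDemo` and `EdgeDemoResult` are illustrative names.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

struct EdgeDemoResult {
    bool first_has_out;     // EPOLLOUT reported by the first epoll_wait
    int second_count;       // number of events from an immediate second wait
    unsigned third_events;  // mask reported after the peer sends a datagram
};

EdgeDemoResult runEdgeDemo() {
    EdgeDemoResult r{false, -1, 0};
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0) {
        return r;
    }
    int ep = epoll_create1(0);

    // Edge-triggered registration for both read and write, no MOD afterwards.
    struct epoll_event ev = {};
    ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
    ev.data.fd = sv[0];
    epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev);

    struct epoll_event got = {};
    // 1) The socket is writable right away, so EPOLLOUT is reported once.
    if (epoll_wait(ep, &got, 1, 100) == 1) {
        r.first_has_out = (got.events & EPOLLOUT) != 0;
    }
    // 2) Nothing changed since then, so edge-triggered mode stays silent.
    r.second_count = epoll_wait(ep, &got, 1, 100);
    // 3) Incoming data is a new edge; the reported mask is the current state.
    if (write(sv[1], "x", 1) == 1 && epoll_wait(ep, &got, 1, 1000) == 1) {
        r.third_events = got.events;
    }

    close(ep);
    close(sv[0]);
    close(sv[1]);
    return r;
}
```

If the mask in step 3 contains both EPOLLIN and EPOLLOUT, that mirrors the earlier TCP client log, where incoming data is delivered together with a write event (`events 3`) after modifyEvent had already failed. That would suggest the failed call did not actually remove the write interest.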

xia-chu commented 2 months ago

This issue was found to be caused by the EPOLLEXCLUSIVE flag.

xia-chu commented 2 months ago

Everything is normal after removing it.
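
This matches the Linux epoll_ctl(2) manual page, which lists EINVAL for EPOLL_CTL_MOD when EPOLLEXCLUSIVE was previously applied to the same epoll fd and target fd pair. A minimal sketch that isolates the behaviour, using an eventfd instead of a socket; `modAfterAdd` is a hypothetical helper, not part of ZLToolKit:

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <cerrno>

// Register an eventfd with EPOLLIN | EPOLLOUT plus `add_flags`, then try to
// drop EPOLLOUT with EPOLL_CTL_MOD.
// Returns 0 if the MOD succeeds, the errno value if it fails, and -1 if the
// setup itself (creation or EPOLL_CTL_ADD) fails.
int modAfterAdd(unsigned add_flags) {
    int ep = epoll_create1(0);
    int fd = eventfd(0, EFD_NONBLOCK);
    int result = -1;
    if (ep >= 0 && fd >= 0) {
        struct epoll_event ev = {};
        ev.events = EPOLLIN | EPOLLOUT | add_flags;
        ev.data.fd = fd;
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) == 0) {
            ev.events = EPOLLIN;  // same idea as "stop write event"
            result = (epoll_ctl(ep, EPOLL_CTL_MOD, fd, &ev) == 0) ? 0 : errno;
        }
    }
    if (fd >= 0) close(fd);
    if (ep >= 0) close(ep);
    return result;
}
```

On a kernel and libc that support the flag, `modAfterAdd(0)` is expected to return 0 while `modAfterAdd(EPOLLEXCLUSIVE)` is expected to return EINVAL, whose strerror text is "Invalid argument".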

xia-chu commented 2 months ago

image

ss002012 commented 2 months ago

But isn't EPOLLEXCLUSIVE supposed to reduce the "thundering herd" problem and improve performance? Shouldn't our tcpserver/udpserver both be using it?

xia-chu commented 2 months ago

EPOLLEXCLUSIVE seems to be effective for multiple fds listening to the same port or multiple epolls listening to the same fd. I haven't tested it yet.

xia-chu commented 2 months ago

Actual testing shows that EPOLLEXCLUSIVE is effective in preventing the "thundering herd" problem when multiple epoll threads listen to a single fd. After enabling it, multiple epoll threads can indeed avoid triggering the accept event simultaneously.

However, EPOLLEXCLUSIVE is ineffective for multiple fds on a single port. Actual testing shows that when multiple fds listen to the same UDP port, there is no "thundering herd" phenomenon.
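
The first observation can be reproduced in miniature. The sketch below is only a stand-in for the actual test, which the thread does not show: it uses an eventfd instead of a listening socket and forked processes instead of poller threads, and `countWoken` is a made-up name. Each child waits on its own epoll instance for the same descriptor, the parent signals the descriptor once, and the function counts how many children were woken.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

// Returns the number of children whose epoll_wait reported an event,
// or -1 if the eventfd could not be created or signalled.
int countWoken(int n, unsigned extra_flags) {
    int efd = eventfd(0, EFD_NONBLOCK);
    if (efd < 0) {
        return -1;
    }
    std::vector<pid_t> children;
    for (int i = 0; i < n; ++i) {
        pid_t pid = fork();
        if (pid == 0) {
            // Child: private epoll instance watching the shared eventfd.
            int ep = epoll_create1(0);
            struct epoll_event ev = {};
            ev.events = EPOLLIN | extra_flags;
            ev.data.fd = efd;
            if (epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev) != 0) {
                _exit(2);
            }
            struct epoll_event got = {};
            int k = epoll_wait(ep, &got, 1, 500);  // exit code 1 means "woken"
            _exit(k == 1 ? 1 : 0);
        }
        if (pid > 0) {
            children.push_back(pid);
        }
    }

    usleep(100 * 1000);  // give the children time to block in epoll_wait
    uint64_t one = 1;
    bool signalled = write(efd, &one, sizeof(one)) == static_cast<ssize_t>(sizeof(one));

    int woken = 0;
    for (pid_t pid : children) {
        int status = 0;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) && WEXITSTATUS(status) == 1) {
            ++woken;
        }
    }
    close(efd);
    return signalled ? woken : -1;
}
```

Calling `countWoken(4, 0)` should report all four waiters. With `countWoken(4, EPOLLEXCLUSIVE)` the manual page only promises that one or more waiters are woken, so expect a smaller number, commonly one, but not a guaranteed value.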

xia-chu commented 2 months ago

To summarize: removing EPOLLEXCLUSIVE only slightly affects the performance of zlm's accept, and the impact is very small. For the sake of program correctness, removing it is still necessary.

ss002012 commented 2 months ago

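> Actual testing shows that EPOLLEXCLUSIVE is effective in preventing the "thundering herd" problem when multiple epoll threads listen to a single fd. After enabling it, multiple epoll threads can indeed avoid triggering the accept event simultaneously. However, EPOLLEXCLUSIVE is ineffective for multiple fds on a single port. Actual testing shows that when multiple fds listen to the same UDP port, there is no "thundering herd" phenomenon.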

Great!
