Closed. lvxiao1 closed this issue 2 years ago.
This happens when resources are insufficient.
What is your ulimit?
root@e125c400e345:/data/log# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 39834
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
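@heyanlong
While debugging I found that sky_request_flush message_queue exNo such file or directory is only reported after the boost::interprocess::message_queue has been deleted, yet sky_module_cleanup was never called. Under what circumstances would this queue be deleted? Also, after reloading fpm I ran the load test at several times the original pressure and the problem did not occur.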
Normally the queue would not be deleted for no reason.
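I added logging around creating, deleting, and opening the message_queue. After two successful reports the error appears, and checking /dev/shm shows that skywalking_queue_12 has indeed been deleted.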
It looks like we need to add a re-creation mechanism. Would you like to submit a PR?
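@heyanlong Sure, but the queue is being deleted for no apparent reason. If we add a re-creation mechanism, could the queue end up being deleted and recreated over and over, with the repeated allocation and release of shared memory hurting performance?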
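The problem has now been tracked down. nginx was configured with fastcgi_cache_path /dev/shm, so when nginx reclaimed its cache it cleared all of the shared memory under /dev/shm. Changing the configuration to fastcgi_cache_path /dev/shm/cache solved the problem.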
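The symptom described above (a process that already has the queue open notices nothing, while opening it by name fails with "No such file or directory") can be reproduced with an ordinary file. This is only a minimal sketch, assuming the queue is backed by a regular file under /dev/shm as observed in this thread; the function name, the struct, and any path passed in are hypothetical, and the unlink call merely stands in for nginx clearing its cache directory.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Outcome of unlinking a file that a process still holds open.
struct UnlinkOutcome {
    bool old_handle_still_writable = false;  // existing descriptor keeps working
    int reopen_errno = 0;                    // errno from opening by name again
};

// Creates `path`, unlinks it while the descriptor is still open (standing in
// for nginx clearing /dev/shm), then tries to open it by name again.
UnlinkOutcome unlink_then_reopen(const std::string& path) {
    assert(!path.empty());
    UnlinkOutcome outcome;

    int fd = ::open(path.c_str(), O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0) {
        return outcome;  // could not create the file; nothing to demonstrate
    }

    ::unlink(path.c_str());  // the name disappears, the open descriptor does not

    // The process that already had the file open can keep using it.
    outcome.old_handle_still_writable = (::write(fd, "x", 1) == 1);

    // A process that opens by name now fails, as the SDK log shows.
    int again = ::open(path.c_str(), O_RDWR);
    if (again < 0) {
        outcome.reopen_errno = errno;
    } else {
        ::close(again);
    }

    ::close(fd);
    return outcome;
}
```

Calling unlink_then_reopen on a scratch path is expected to report a still-writable old handle and ENOENT for the second open, which is consistent with a blocked consumer staying silent while sky_request_flush logs the error.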
@heyanlong It looks like we need to add a re-creation mechanism. Would you like to submit a PR?
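The consumer's receive is blocking, but message_queue does not watch for the shm being reclaimed, and message_queue.remove does not notify waiters either, so the consumer ends up waiting indefinitely.
Could we switch to timed_receive with a one-second timeout? If it returns true, keep reading; if it returns false, reopen the message_queue, and if a not_such_file_or_directory exception is caught, recreate the queue.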
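The proposal above could be sketched roughly as follows. This is not the SDK's code: Queue, QueueMissing, Opener, and receive_or_recover are hypothetical stand-ins chosen so the example compiles without Boost, and they only mirror the shape of the calls mentioned in the thread (a timed receive that returns false on timeout, and an open that throws when the queue is missing).

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the queue; with Boost this role would be played
// by boost::interprocess::message_queue.
struct Queue {
    virtual ~Queue() = default;
    // Returns true if a message arrived before the timeout, false otherwise.
    virtual bool timed_receive(std::string& out, int timeout_ms) = 0;
};

// Hypothetical stand-in for the exception raised when the queue's backing
// file no longer exists.
struct QueueMissing : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Callback that opens an existing queue or creates a new one.
using Opener = std::function<std::unique_ptr<Queue>()>;

// One iteration of the consumer loop: wait up to timeout_ms for a message.
// On timeout, reopen the queue by name; if it is gone, recreate it.
// Returns true only when `out` holds a freshly received message.
bool receive_or_recover(std::unique_ptr<Queue>& queue,
                        const Opener& open_existing,
                        const Opener& create,
                        std::string& out,
                        int timeout_ms = 1000) {
    assert(queue && timeout_ms > 0);

    if (queue->timed_receive(out, timeout_ms)) {
        return true;  // normal path: keep reading
    }

    // Timed out: the queue may simply be idle, or it may have been removed.
    try {
        queue = open_existing();
    } catch (const QueueMissing&) {
        queue = create();  // backing file is gone, so build a new queue
    }
    return false;
}
```

With Boost, timed_receive takes a buffer, its size, output parameters for the received size and priority, and an absolute deadline, and the two callbacks would wrap constructing a message_queue in open-only and create modes. Because recovery is attempted only after a timeout, the queue is recreated at most once per timeout interval, which bounds the delete-and-recreate churn raised earlier in the thread.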
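Please submit a PR.
System information:
When load testing with ab, data is reported normally at first, but after a while SkyWalking stops receiving data. The SDK log shows the error sky_request_flush message_queue exNo such file or directory. After restarting fpm the problem no longer reproduces. The restart command is:
ps -ef|grep php-fpm|grep master|grep -v grep|awk '{print $2}'|xargs kill -USR2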
ab -n 10000 -c 50 http://127.0.0.1:18880/redis.php
/usr/local/sbin/php-fpm -R --nodaemonize
redis.php file
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib:/usr/local/lib64
ENV LD_RUN_PATH=$LD_RUN_PATH:/usr/local/lib:/usr/local/lib64
ENV WORKDIR /webser/www
ENV GROUP_NAME test
ENV SERVICE_NAME skywalking
WORKDIR /webser/www
RUN sed -i "s/deb.debian.org/mirrors.163.com/g" /etc/apt/sources.list \
    && sed -i "s/security.debian.org/mirrors.163.com/g" /etc/apt/sources.list \
    && apt-get clean \
    && apt-get update --fix-missing \
    && apt-get install -y build-essential autoconf automake libtool curl make g++ unzip pkg-config cmake libboost-all-dev libcurl4-openssl-dev zlib1g-dev nginx git \
    && docker-php-ext-install zip \
    && pecl channel-update pecl.php.net \
    && pecl install redis \
    && docker-php-ext-enable redis
RUN git clone --depth 1 -b v1.34.x https://github.com/grpc/grpc.git /var/local/git/grpc \
    && cd /var/local/git/grpc \
    && git submodule update --init --recursive \
    && mkdir -p cmake/build \
    && cd cmake/build \
    && cmake ../.. \
    && make -j$(nproc) \
    && echo "--- INSTALL skywalking php ---" \
    && cd /var/local/git \
    && curl -Lo v4.2.0.tar.gz https://github.com/SkyAPM/SkyAPM-php-sdk/archive/v4.2.0.tar.gz \
    && tar zxvf v4.2.0.tar.gz \
    && cd SkyAPM-php-sdk-4.2.0 \
    && phpize && ./configure --with-grpc=/var/local/git/grpc && make && make install \
    && rm -fr /var/local/git \
    && ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
    && mkdir -pv /data/log /var/log/php /webser/www /var/tmp/nginx
COPY php.ini /usr/local/etc/php/conf.d/
COPY www.conf /usr/local/etc/php-fpm.d/
COPY nginx.conf /etc/nginx/
COPY app.conf /etc/nginx/conf.d/app.conf
COPY run.sh /tmp/
COPY reload-php-ini.sh /tmp/
RUN chmod +x /tmp/run.sh \
    && chmod +x /tmp/reload-php-ini.sh
EXPOSE 18880
ENTRYPOINT ["/tmp/run.sh"]
pm = static
pm.max_children = 5
slowlog = /data/php-slow.log
request_slowlog_timeout = 5s
request_terminate_timeout = 20s
clear_env = no
catch_workers_output = yes
php_admin_flag[expose_php] = off
[global]
error_log = /data/php-error.log