Open jokemanfire opened 5 months ago
I found two more problems when the FIFO is used directly.
Here is a way to reproduce the error. 1. Build an image from a Dockerfile like this:
FROM busybox:latest
COPY test.sh /
ENTRYPOINT ["sh","/test.sh"]
test.sh looks like this:
while true; do
sleep 3
echo "hello"
result=$?
if [ $result -ne 0 ]; then
date >> log.txt
echo "echo faile . Result : $result" >> /log.txt
fi
done
Build the image with docker build, then import it with ctr.
2. Run a container from this image using the Rust shim (rshim).
3. To trigger the error, stop the containerd service. The error message then shows up inside the container, while a container running under the Go shim is not affected.
So I think a pipe in the shim is really needed. I tested the PR #278 and it resolves this problem.
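The failure in step 3 can be illustrated outside containerd. The following is only a minimal sketch, assuming the error comes from the container writing to a FIFO whose last reader (containerd) has gone away; the function name and temp path are hypothetical, and it relies on the `mkfifo` command and on Linux allowing a FIFO to be opened read-write without blocking.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::process::Command;

/// Writes to a FIFO once while a reader exists and once after the reader
/// is closed. Returns the error kind of the second write, if it failed.
fn write_after_reader_closed() -> io::Result<Option<io::ErrorKind>> {
    // Hypothetical scratch path for the demo FIFO.
    let path = std::env::temp_dir().join(format!("fifo-demo-{}", std::process::id()));
    let _ = fs::remove_file(&path);
    let status = Command::new("mkfifo").arg(&path).status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "mkfifo failed"));
    }

    // Stand-in for containerd's read side. It is opened read-write only so
    // that open() does not block waiting for a peer (Linux behavior).
    let reader = OpenOptions::new().read(true).write(true).open(&path)?;
    // Stand-in for the container's stdout, attached to the FIFO directly.
    let mut writer = OpenOptions::new().write(true).open(&path)?;

    // A reader exists, so this write succeeds.
    writer.write_all(b"hello\n")?;

    // "Stop containerd": the last read end of the FIFO goes away.
    drop(reader);

    // No reader is left, so the kernel reports EPIPE. Rust programs ignore
    // SIGPIPE by default, so this surfaces as an error instead of a signal.
    let second = writer.write_all(b"hello\n");

    fs::remove_file(&path)?;
    Ok(second.err().map(|e| e.kind()))
}
```

Calling `write_after_reader_closed()` from a `main` function and printing the result shows the second write failing, which matches the failed `echo` that test.sh records. If the shim kept its own end of the stream open and relayed the data, the container would not see this error directly.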
Friendly ping, @fuweid @mxpv @Burning1020. Looking forward to your reply.
The tokio 1.40 pipe resolves the pipe problem completely. Friendly ping, @fuweid @mxpv @Burning1020.
Hi @jokemanfire, would you please file a pull request to fix this? Thanks.
@fuweid Please take a look at #278.
Hi @jokemanfire, can you give more detail about why "tokio 1.40 pipe can resolve pipe problem perfect"?
I have run into a problem similar to the one you found: when the containerd service is stopped, all rshim IO breaks, but Go shim IO does not. I also found another problem: the stdout stream of a container process that goes through rust-shim is not flushed in real time. It is flushed about one page at a time, followed by a long delay, rather than line by line. I don't know whether this is related to using the FIFO directly as the process stdout.
I am following up on this issue. Please share any updates. Thanks!
Regarding "the stdout stream of container process which comes from rust-shim is not flush at real time": I have not run into this problem. Is there a way to reproduce it?
Using the FIFO directly does cause some problems, which are explained in https://fuweid.com/post/2022-embedshim-kernel-is-my-sidecar/ . Thanks @fuweid. It contains a description like this:
"embedshim also uses a relay to handle standard input, but it hands a named pipe opened in read-write mode directly to the container as its standard output, which reduces copying of standard output. The embedshim plugin is part of the containerd process, so once containerd restarts, the input side of the container process receives a SIGPIPE error. Personally, I think this is acceptable. In interactive mode, the user will notice that the container engine is down. Most production scenarios run in headless, non-interactive mode, where the input of the container process is /dev/null and the state of standard output is persisted by the named pipe, so the output side of the container does not get a SIGPIPE error when containerd is stopped."
I want to change the FIFO to a pipe, because I think some of these problems are unacceptable in the Rust shim. I also want to change 'pipe_os' to 'tokio_pipe', because under high-concurrency IO the current async trait implementation leaves spawned tokio_copy tasks behind, so the Rust shim cannot be deleted successfully. (I think this is caused by the raw_fd, and there is a problem with how the async trait is implemented.) If you have a way to reproduce your problem, I would be happy to check whether it is caused by FIFO IO.
I didn't do anything special before I encountered this problem. I have a program that logs at high frequency, and when I follow its logs via crictl logs -f xxx, I see very long delays between intermittent bursts of output. After some investigation I found that the log file produced by containerd-cri is also written intermittently. I guess something abnormal comes from the new way of using the FIFO or from the Rust tokio runtime.
Simple diagram:
Go shim:   |fifo reader| <-- fifo --> |io copier| <-- pipe --> |container process|
Rust shim: |fifo reader| <-- fifo --> |container process|
The FIFO and the FIFO reader come from containerd-cri and are the same in both cases, so I guess the problem comes from the second half.
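To make the "io copier" box in the Go shim row concrete, here is a rough sketch of a relay that reads a child's stdout from an anonymous pipe and copies it into some sink. It is not the Go shim's or the Rust shim's actual code; `relay_child_stdout` is a hypothetical helper, and any `Write` sink stands in for the FIFO that containerd reads from.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Spawns `cmd` with its stdout attached to an anonymous pipe and copies
/// everything the child writes into `sink`. Returns the bytes copied.
fn relay_child_stdout<W: Write>(cmd: &mut Command, sink: &mut W) -> io::Result<u64> {
    // The child only ever sees the pipe, never the sink itself.
    let mut child = cmd.stdout(Stdio::piped()).spawn()?;
    let mut stdout = child
        .stdout
        .take()
        .expect("stdout was configured as piped");

    // The copier sits between the pipe and the sink. If the sink fails,
    // the error is returned here instead of reaching the child process.
    let copied = io::copy(&mut stdout, sink)?;
    child.wait()?;
    Ok(copied)
}
```

In this layout, a failure on the sink side is handled by the copier, which is one possible reason the two rows of the diagram behave differently when the reader disappears.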
Regarding the stdout stream from rust-shim not being flushed in real time: I think I may have found the cause. I'll try to file a PR for it later.
I am seeing level=error msg="copy io failed Input/output error (os error 5)" when running this. Could this be related?
Regarding "copy io failed Input/output": did you apply the patch from #278? If yes, could you provide a more detailed description or some logs, so I can check whether my patch is the problem? PS: binary IO is not implemented yet, so nerdctl -t -d will fail.
Not patched. I will patch and try again.
Patched, same error, so not fixed with #278
Could you provide the debug log? It may be caused by copy_console (tty), but without more information it cannot be determined.
This is what I could see already, any idea? I'll look at it more closely on Monday
I think this may be printed by spawn_copy when the read or write side is closed suddenly. You can check it; it should occur in tokio_copy.
Related: I have raised this question with containerd, but it looks like containerd will not change. So I will open a PR to change the FIFO to a pipe. I have completed the code, and I will submit the PR after some CI testing.