Open kuailelijuan opened 2 years ago
That is an interesting situation, indeed. I am not sure, however, that CRIU can do anything about. This could be seen as a Java feature or a bug. Java assumes if two hours passed since the last check it has to run all missing runs. It probably should not.
Working on CRIU I am biased but this sounds like a thing Java needs to solve itself. You would probably have the same problem on a virtual machine that is paused for two hours and on unpause the clock jumps to the correct time. Java should not expect that the clock cannot jump.
@adrianreber It is an honor to receive your answer. I recently encountered these problems when experimenting with criu. Hope you can give me some guidance and advice. 1. During the process of live migration, it is likely to encounter the problems of the above java time schedule tasks. In addition, some changes in the physical state of the target process ip, as well as the cached data of the application process, sessionID, etc., related to the process logic may not work properly. In order to use the criu , additional development is required. Usually, these status points are difficult to be completely sorted out, and the workload of transformation is huge. How can the application system enjoy the convenience of criu at a lower cost. 2. What will be the future development trend and planning of criu. What areas will his application scenarios be in?
I don't think this is the right place to discuss such broad open questions. If you have a specific problem we are happy to help if possible but your question seems to be about almost everything related to CRIU.
@kuailelijuan what kernel do you use? Does it support time namespaces? What clocks is used to implement newSingleThreadScheduledExecutor? You can find this if you run your app under "strace -fo strace.log -s 1024".
I don't think this is the right place to discuss such broad open questions. If you have a specific problem we are happy to help if possible but your question seems to be about almost everything related to CRIU.
Thanks a lot for your answer. There is a problem I don't understand, why the java schedule timed task will not take effect during the process of hot migration. This is an accidental phenomenon, and not every node will appear. I am experimenting in some docker container, docker version: 17.03.2 The same node, if his schedule does not take effect, after I kill it and try again many times, it will not take effect also. On the contrary, if one node, the schedule can be executed normally, and it is normal to re-operate many times.
We are no experts how Java implements the scheduled task. It seems to look at the date and time and if you migrate to a system which has a clock in the past then it might skip a few scheduled tasks. You need to ensure that clocks are properly synced if your process depends on the wallclock. Again, this is a topic you need to talk to Java experts about it. We can only speculate.
For time sensitive migrations time namespaces can be a solution, but so far no container runtime is really using them. CRIU has the necessary support.
@kuailelijuan what kernel do you use? Does it support time namespaces? What clocks is used to implement newSingleThreadScheduledExecutor? You can find this if you run your app under "strace -fo strace.log -s 1024".
kernel : Linux version 4.18.0 i execute "strace -fo strace.log -s 1024 " get the following log :
10354 16:41:43 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
10354 16:42:04 <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
10354 16:42:04 futex(0x7f0e6c002eb8, FUTEX_WAKE_PRIVATE, 1) = 0
10354 16:42:04 futex(0x7f0e6c002c54, FUTEX_WAIT_BITSET_PRIVATE, 17241, {9351494, 549672793}, ffffffff <unfinished ...>
10354 16:42:04 <... futex resumed> ) = 0
10354 16:42:04 futex(0x7f0e6c002c28, FUTEX_WAKE_PRIVATE, 1) = 0
10354 16:42:04 futex(0x7f0e6c002c54, FUTEX_WAIT_PRIVATE, 17243, NULL <unfinished ...>
10354 16:42:04 <... futex resumed> ) = 0
10354 16:42:04 futex(0x7f0e6c002c28, FUTEX_WAKE_PRIVATE, 1) = 0
10354 16:42:04 futex(0x7f0e6c002c54, FUTEX_WAIT_PRIVATE, 17245, NULL <unfinished ...>
10354 16:42:04 <... futex resumed> ) = 0
10354 16:42:04 futex(0x7f0e6c002c28, FUTEX_WAKE_PRIVATE, 1) = 0
10354 16:42:04 write(1, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:42:04,166 com.my.timer.Controller$2.run(TimerController.java:479) [185339417] INFO - ch.web.action.TimerController - YPF WithFixedDelay scheduled service! \n", 218) = 218
10354 16:42:04 write(62, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:42:04,166 com.my.timer.Controller$2.run(TimerController.java:479) [185339417] INFO - ch.web.action.TimerController - YPF WithFixedDelay scheduled service! \n", 218) = 218
10354 16:42:04 write(1, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:42:04,167 com.my.timer.Controller$2.run(TimerController.java:480) [185339418] INFO - ch.web.action.TimerController - Thread[YPF-WithFixedDelay-15-T-1,5,main] \n", 221) = 221
10354 16:42:04 write(62, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:42:04,167 com.my.timer.Controller$2.run(TimerController.java:480) [185339418] INFO - ch.web.action.TimerController - Thread[YPF-WithFixedDelay-15-T-1,5,main] \n", 221) = 221
10354 16:42:04 write(1, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:42:04,167 com.my.timer.Controller$2.run(TimerController.java:481) [185339418] INFO - ch.web.action.TimerController - 87 is run. \n", 191) = 191
10354 16:42:04 write(62, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:42:04,167 com.my.timer.Controller$2.run(TimerController.java:481) [185339418] INFO - ch.web.action.TimerController - 87 is run. \n", 191) = 191
10354 16:42:04 futex(0x7f0e6c002ee4, FUTEX_WAIT_BITSET_PRIVATE, 1, {9351524, 550045461}, ffffffff <unfinished ...>
10354 16:42:34 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
10354 16:42:34 futex(0x7f0e6c002eb8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
10354 16:42:34 <... futex resumed> ) = 0
10354 16:42:34 futex(0x7f0e6c002c54, FUTEX_WAIT_BITSET_PRIVATE, 17247, {9351524, 551365667}, ffffffff <unfinished ...>
10354 16:42:34 <... futex resumed> ) = 0
10354 16:42:34 futex(0x7f0e6c002c28, FUTEX_WAKE_PRIVATE, 1) = 0
10354 16:42:34 futex(0x7f0e6c002c54, FUTEX_WAIT_PRIVATE, 17249, NULL <unfinished ...>
10354 16:42:34 <... futex resumed> ) = 0
10354 16:42:34 futex(0x7f0e6c002c28, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
10354 16:42:34 <... futex resumed> ) = 0
10354 16:42:34 futex(0x7f0e6c002c54, FUTEX_WAIT_PRIVATE, 17251, NULL <unfinished ...>
10354 16:42:34 <... futex resumed> ) = 0
10354 16:42:34 futex(0x7f0e6c002c28, FUTEX_WAKE_PRIVATE, 1) = 0
10354 16:42:34 write(1, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:42:34,168 com.my.timer.Controller$2.run(TimerController.java:479) [185369419] INFO - ch.web.action.TimerController - YPF WithFixedDelay scheduled service! \n", 218) = 218
10354 16:42:34 write(62, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:42:34,168 com.my.timer.Controller$2.run(TimerController.java:479) [185369419] INFO - ch.web.action.TimerController - YPF WithFixedDelay scheduled service! \n", 218) = 218
10354 16:42:34 write(1, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:42:34,170 com.my.timer.Controller$2.run(TimerController.java:480) [185369421] INFO - ch.web.action.TimerController - Thread[YPF-WithFixedDelay-15-T-1,5,main] \n", 221) = 221
10354 16:42:34 write(62, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:42:34,170 com.my.timer.Controller$2.run(TimerController.java:480) [185369421] INFO - ch.web.action.TimerController - Thread[YPF-WithFixedDelay-15-T-1,5,main] \n", 221) = 221
10354 16:42:34 write(1, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:42:34,170 com.my.timer.Controller$2.run(TimerController.java:481) [185369421] INFO - ch.web.action.TimerController - 87 is run. \n", 191) = 191
10354 16:42:34 write(62, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:42:34,170 com.my.timer.Controller$2.run(TimerController.java:481) [185369421] INFO - ch.web.action.TimerController - 87 is run. \n", 191) = 191
10354 16:42:34 futex(0x7f0e6c002ee4, FUTEX_WAIT_BITSET_PRIVATE, 1, {9351554, 552746621}, ffffffff <unfinished ...>
10354 16:43:04 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
10354 16:43:04 futex(0x7f0e6c002eb8, FUTEX_WAKE_PRIVATE, 1) = 0
10354 16:43:04 write(1, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:43:04,170 com.my.timer.Controller$2.run(TimerController.java:479) [185399421] INFO - ch.web.action.TimerController - YPF WithFixedDelay scheduled service! \n", 218) = 218
10354 16:43:04 write(62, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:43:04,170 com.my.timer.Controller$2.run(TimerController.java:479) [185399421] INFO - ch.web.action.TimerController - YPF WithFixedDelay scheduled service! \n", 218) = 218
10354 16:43:04 write(1, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:43:04,171 com.my.timer.Controller$2.run(TimerController.java:480) [185399422] INFO - ch.web.action.TimerController - Thread[YPF-WithFixedDelay-15-T-1,5,main] \n", 221) = 221
10354 16:43:04 write(62, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:43:04,171 com.my.timer.Controller$2.run(TimerController.java:480) [185399422] INFO - ch.web.action.TimerController - Thread[YPF-WithFixedDelay-15-T-1,5,main] \n", 221) = 221
10354 16:43:04 write(1, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:43:04,171 com.my.timer.Controller$2.run(TimerController.java:481) [185399422] INFO - ch.web.action.TimerController - 87 is run. \n", 191) = 191
10354 16:43:04 write(62, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:43:04,171 com.my.timer.Controller$2.run(TimerController.java:481) [185399422] INFO - ch.web.action.TimerController - 87 is run. \n", 191) = 191
10354 16:43:04 futex(0x7f0e6c002ee4, FUTEX_WAIT_BITSET_PRIVATE, 1, {9351584, 553452477}, ffffffff <unfinished ...>
10354 16:43:34 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
10354 16:43:34 futex(0x7f0e6c002eb8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
10354 16:43:34 <... futex resumed> ) = 0
10354 16:43:34 write(1, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:43:34,171 com.my.timer.Controller$2.run(TimerController.java:479) [185429422] INFO - ch.web.action.TimerController - YPF WithFixedDelay scheduled service! \n", 218) = 218
10354 16:43:34 write(62, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:43:34,171 com.my.timer.Controller$2.run(TimerController.java:479) [185429422] INFO - ch.web.action.TimerController - YPF WithFixedDelay scheduled service! \n", 218) = 218
10354 16:43:34 write(1, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:43:34,171 com.my.timer.Controller$2.run(TimerController.java:480) [185429422] INFO - ch.web.action.TimerController - Thread[YPF-WithFixedDelay-15-T-1,5,main] \n", 221) = 221
10354 16:43:34 write(62, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:43:34,171 com.my.timer.Controller$2.run(TimerController.java:480) [185429422] INFO - ch.web.action.TimerController - Thread[YPF-WithFixedDelay-15-T-1,5,main] \n", 221) = 221
10354 16:43:34 write(1, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:43:34,172 com.my.timer.Controller$2.run(TimerController.java:481) [185429423] INFO - ch.web.action.TimerController - 87 is run. \n", 191) = 191
10354 16:43:34 write(62, "YPF-WithFixedDelay-15-T-1 2022-01-14 16:43:34,172 com.my.timer.Controller$2.run(TimerController.java:481) [185429423] INFO - ch.web.action.TimerController - 87 is run. \n", 191) = 191
10354 16:43:34 futex(0x7f0e6c002ee4, FUTEX_WAIT_BITSET_PRIVATE, 1, {9351614, 554179247}, ffffffff <unfinished ...>
304 16:43:36 futex(0x7f0f80114154, FUTEX_WAIT_BITSET_PRIVATE, 1, {9351587, 201035457}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
We are no experts how Java implements the scheduled task. It seems to look at the date and time and if you migrate to a system which has a clock in the past then it might skip a few scheduled tasks. You need to ensure that clocks are properly synced if your process depends on the wallclock. Again, this is a topic you need to talk to Java experts about it. We can only speculate.
For time sensitive migrations time namespaces can be a solution, but so far no container runtime is really using them. CRIU has the necessary support.
thanks.
so far no container runtime is really using them.
crun does support time namespace (https://github.com/containers/crun/pull/257) and container checkpoint/restore, but Adrian is right. So far we have not seen many discussions about it.
A friendly reminder that this issue had no activity for 30 days.
For time sensitive migrations time namespaces can be a solution, but so far no container runtime is really using them.
It isn't required. CRIU can restore processes in a new time namespace by default if the kernel allows doing that.
A friendly reminder that this issue had no activity for 30 days.
Program description: jdk version:jdk1.8.0_60 key code:
problem: In this java process, two time tasks are started, using two methods: scheduleAtFixedRate and scheduleWithFixedDelay. I found two problems.