Closed: Peter-J-Jansen closed this issue 6 years ago.
Trying to help research this mysterious wait, I've been experimenting with the Ubuntu "top" command.
While the wait is going on, top shows the following, with the WCHAN field added:
top - 15:39:21 up 23:24, 1 user, load average: 0.19, 0.51, 0.42
Threads: 87 total, 0 running, 87 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 0.1 sy, 0.1 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16106100 total, 12838180 free, 1103896 used, 2164024 buff/cache
KiB Swap: 8189948 total, 8189948 free, 0 used. 14597816 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND WCHAN
19544 root -9 0 12.776g 21416 5780 S 4.7 0.1 0:05.13 timer_thread hrtimer_nanosleep
19540 root -7 0 12.776g 21416 5780 S 0.0 0.1 0:00.62 impl_thread random_read.part.34
19541 root -6 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 logger_thread pipe_wait
19542 root -3 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 watchdog_thread hrtimer_nanosleep
19543 root -4 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 Processor CP00 futex_wait_queue_me
19545 root -4 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 Processor CP01 futex_wait_queue_me
19546 root -4 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 Processor CP02 futex_wait_queue_me
19547 root -4 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 Processor CP03 futex_wait_queue_me
19548 root -4 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 Processor CP04 futex_wait_queue_me
19549 root -4 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 Processor CP05 futex_wait_queue_me
19550 root -7 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 shared_server poll_schedule_timeout
19551 root -6 0 12.776g 21416 5780 S 0.0 0.1 0:00.02 console_connect poll_schedule_timeout
19554 root -7 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 tun_do_read.part.50
19591 root -7 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 CTCE 0E40 Liste inet_csk_accept
19592 root -7 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 CTCE 0E41 Liste inet_csk_accept
19593 root -7 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 CTCE 0E42 Liste inet_csk_accept
19594 root -7 0 12.776g 21416 5780 S 0.0 0.1 0:00.00 CTCE 0E43 Liste inet_csk_accept
After the wait has mysteriously ended (note that panel_display has replaced impl_thread):
top - 15:44:52 up 23:30, 1 user, load average: 0.43, 0.37, 0.37
Threads: 87 total, 0 running, 87 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 0.3 sy, 0.1 ni, 99.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16106100 total, 12793104 free, 1149004 used, 2163992 buff/cache
KiB Swap: 8189948 total, 8189948 free, 0 used. 14552748 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND WCHAN
19544 root -9 0 12.839g 21472 5780 S 5.3 0.1 0:20.34 timer_thread hrtimer_nanosleep
19540 root -7 0 12.839g 21472 5780 S 0.0 0.1 0:00.62 panel_display poll_schedule_timeout
19541 root -6 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 logger_thread pipe_wait
19542 root -3 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 watchdog_thread hrtimer_nanosleep
19543 root -4 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 Processor CP00 futex_wait_queue_me
19545 root -4 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 Processor CP01 futex_wait_queue_me
19546 root -4 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 Processor CP02 futex_wait_queue_me
19547 root -4 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 Processor CP03 futex_wait_queue_me
19548 root -4 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 Processor CP04 futex_wait_queue_me
19549 root -4 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 Processor CP05 futex_wait_queue_me
19550 root -7 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 shared_server poll_schedule_timeout
19551 root -6 0 12.839g 21472 5780 S 0.0 0.1 0:00.10 console_connect poll_schedule_timeout
19554 root -7 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 tun_do_read.part.50
19591 root -7 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 CTCE 0E40 Liste inet_csk_accept
19592 root -7 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 CTCE 0E41 Liste inet_csk_accept
19593 root -7 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 CTCE 0E42 Liste inet_csk_accept
19594 root -7 0 12.839g 21472 5780 S 0.0 0.1 0:00.00 CTCE 0E43 Liste inet_csk_accept
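For what it's worth, the WCHAN column in the listings above is simply what the kernel exposes in /proc/&lt;tid&gt;/wchan: the name of the kernel function the thread is sleeping in, or 0 if it is not blocked. A minimal C sketch that reads it for a given thread ID (illustrative only, assuming procfs is mounted):

#include <stdio.h>

/* Print the kernel wait channel (WCHAN) for a given thread ID, read
 * from /proc/<tid>/wchan -- the same source top uses for its WCHAN
 * column. The file holds a single symbol name such as
 * "hrtimer_nanosleep", or "0" if the thread is not blocked. */
int main(int argc, char *argv[])
{
    char   path[64], wchan[128];
    FILE  *fp;
    size_t n;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <tid>\n", argv[0]);
        return 1;
    }

    snprintf(path, sizeof(path), "/proc/%s/wchan", argv[1]);

    if ((fp = fopen(path, "r")) == NULL) {
        perror(path);
        return 1;
    }

    n = fread(wchan, 1, sizeof(wchan) - 1, fp);
    wchan[n] = '\0';
    fclose(fp);

    printf("%s\n", wchan);
    return 0;
}

Values such as futex_wait_queue_me or hrtimer_nanosleep therefore name exactly where in the kernel each thread is parked, which is what makes the column useful for diagnosing a stall like this one.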
I hope this gives some clue as to what the problem could be.
Cheers,
Peter
Thank you for your preliminary analysis, Peter! Adding the WCHAN field to your top display was a pure stroke of genius! (in my opinion)
After a quick Google search on "random_read.part" (the impl_thread WCHAN), and seeing the words "entropy", "urandom" and "secret" appear in the only two search results that Google displayed, I am now 99.99% sure I know what is causing the delay! :)
Let me do a quick test to confirm my suspicion and I'll get back to you with my results!
BINGO! :)))
The fix for this issue will be committed shortly!
Closed by commit e406feb212265230730a7c3079f29935507acf4a!
If you review the patch (the commit) it should hopefully be apparent what was causing the delay. ;-)
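In a nutshell: the impl_thread was blocked inside the kernel's random_read path (the "random_read.part.34" WCHAN in Peter's first listing), i.e. in a read from /dev/random, which sleeps until the kernel entropy pool is replenished; a read from /dev/urandom never blocks. A minimal sketch of the difference (illustrative only, not the actual Hercules code or the committed change):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Read len random bytes from dev. On /dev/random this read may
 * sleep in the kernel (WCHAN "random_read") until enough entropy
 * has been gathered; on /dev/urandom it returns immediately. */
static int read_random(const char *dev, unsigned char *buf, size_t len)
{
    int     fd;
    ssize_t n;

    if ((fd = open(dev, O_RDONLY)) < 0) {
        perror(dev);
        return -1;
    }

    n = read(fd, buf, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

int main(void)
{
    unsigned char seed[64];

    /* A read this large from /dev/random can hang for minutes on an
     * idle, headless machine; changing the device name to
     * "/dev/urandom" makes it return at once. */
    if (read_random("/dev/random", seed, sizeof(seed)) == 0)
        printf("got %zu random bytes\n", sizeof(seed));

    return 0;
}

This also explains the mouse-click workaround noted in the original report: mouse and keyboard events feed the kernel entropy pool, so a few clicks give the blocked read the entropy it is waiting for.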
I should note too that this bug was unfortunately introduced due to my own inexperience with Linux. I apologize to the community for that. :(
I research things as best I can and then implement what I believe to be the correct code based on that research, but I sometimes get it wrong. This probably won't be the last time something like this happens, either. All I can say is that I hope you all will forgive me when it does, and help me try to resolve whatever the issue may be. Thanks for your understanding.
Absolutely super, Fish! I also did a Google search, but to me the results meant nothing.
I have already sent an e-mail to the person who originally experienced the wait, asking him to re-build, test, and let us know. (He thought it was caused by the CTCE device, as that was the last thing he saw in the Hercules log/console.)
Cheers,
Peter
It has been observed that sometimes (i.e. not always), during SDL-Hyperion startup on Linux, towards the end of configuration file processing but before the panel displays, a wait can occur lasting a few minutes (sometimes longer, sometimes shorter). The wait may resolve itself. The current workaround is to try waiting it out, or to quit and start again.(*)
(*) Or... clicking the mouse outside the terminal window where Hercules is running, then clicking inside the window, then outside, then inside, etc. After a few mouse clicks, the problem seems to resolve itself within a few seconds. This is on my CentOS 6.10 VMware virtual machine. -- Fish.