Hyperfoil / qDup

Lab automation and queuing scripting
Apache License 2.0

Running qDup hangs #178

Open galderz opened 1 year ago

galderz commented 1 year ago

Other times, running the helloworld example just hangs, e.g.:

"Thread-2" #19 daemon prio=5 os_prio=31 cpu=1.57ms elapsed=40.46s tid=0x000000012b84ee00 nid=0xa603 runnable  [0x000000017156e000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(java.base@17.0.6-internal/Native Method)
    at java.io.FileInputStream.read(java.base@17.0.6-internal/FileInputStream.java:276)
    at java.io.BufferedInputStream.read1(java.base@17.0.6-internal/BufferedInputStream.java:282)
    at java.io.BufferedInputStream.read(java.base@17.0.6-internal/BufferedInputStream.java:343)
    - locked <0x000000061cc17288> (a java.io.BufferedInputStream)
    at sun.nio.cs.StreamDecoder.readBytes(java.base@17.0.6-internal/StreamDecoder.java:270)
    at sun.nio.cs.StreamDecoder.implRead(java.base@17.0.6-internal/StreamDecoder.java:313)
    at sun.nio.cs.StreamDecoder.read(java.base@17.0.6-internal/StreamDecoder.java:188)
    - locked <0x000000061cc17bc0> (a java.io.InputStreamReader)
    at java.io.Reader.read(java.base@17.0.6-internal/Reader.java:197)
    at java.io.InputStreamReader.read(java.base@17.0.6-internal/InputStreamReader.java:157)
    at java.util.Scanner.readInput(java.base@17.0.6-internal/Scanner.java:882)
    at java.util.Scanner.findWithinHorizon(java.base@17.0.6-internal/Scanner.java:1796)
    at java.util.Scanner.hasNextLine(java.base@17.0.6-internal/Scanner.java:1610)
    at io.hyperfoil.tools.qdup.JsonServer.lambda$start$1(JsonServer.java:199)
    at io.hyperfoil.tools.qdup.JsonServer$$Lambda$302/0x0000000800ec5eb8.run(Unknown Source)
    at java.lang.Thread.run(java.base@17.0.6-internal/Thread.java:833)
...
"qdup-command-0" #22 prio=5 os_prio=31 cpu=3.44ms elapsed=40.40s tid=0x000000013cc25400 nid=0x8703 waiting on condition  [0x0000000171d9e000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@17.0.6-internal/Native Method)
    - parking to wait for  <0x0000000603469808> (a java.util.concurrent.FutureTask)
    at java.util.concurrent.locks.LockSupport.park(java.base@17.0.6-internal/LockSupport.java:211)
    at java.util.concurrent.FutureTask.awaitDone(java.base@17.0.6-internal/FutureTask.java:447)
    at java.util.concurrent.FutureTask.get(java.base@17.0.6-internal/FutureTask.java:190)
    at java.util.concurrent.AbstractExecutorService.invokeAll(java.base@17.0.6-internal/AbstractExecutorService.java:252)
    at io.hyperfoil.tools.qdup.cmd.Dispatcher.invokeAll(Dispatcher.java:317)
    at io.hyperfoil.tools.qdup.Run.queueRunScripts(Run.java:852)
    at io.hyperfoil.tools.qdup.Run.nextStage(Run.java:300)
    at io.hyperfoil.tools.qdup.Run.postStop(Run.java:258)
    at io.hyperfoil.tools.qdup.cmd.Dispatcher.lambda$checkActiveCount$14(Dispatcher.java:462)
    at io.hyperfoil.tools.qdup.cmd.Dispatcher$$Lambda$375/0x0000000800f1d6f0.accept(Unknown Source)
    at java.lang.Iterable.forEach(java.base@17.0.6-internal/Iterable.java:75)
    at io.hyperfoil.tools.qdup.cmd.Dispatcher.lambda$checkActiveCount$15(Dispatcher.java:462)
    at io.hyperfoil.tools.qdup.cmd.Dispatcher$$Lambda$374/0x0000000800f1d4c8.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.6-internal/ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.6-internal/ThreadPoolExecutor.java:635)
    at java.lang.Thread.run(java.base@17.0.6-internal/Thread.java:833)

"qdup-command-1" #23 prio=5 os_prio=31 cpu=409.77ms elapsed=40.39s tid=0x000000013c995400 nid=0xa103 waiting on condition  [0x0000000171faa000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@17.0.6-internal/Native Method)
    - parking to wait for  <0x00000006029a3f98> (a java.util.concurrent.Semaphore$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(java.base@17.0.6-internal/LockSupport.java:211)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.6-internal/AbstractQueuedSynchronizer.java:715)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@17.0.6-internal/AbstractQueuedSynchronizer.java:1047)
    at java.util.concurrent.Semaphore.acquire(java.base@17.0.6-internal/Semaphore.java:318)
    at io.hyperfoil.tools.qdup.SshSession.shConnecting(SshSession.java:824)
    at io.hyperfoil.tools.qdup.SshSession.connect(SshSession.java:618)
    at io.hyperfoil.tools.qdup.SshSession.<init>(SshSession.java:285)
    at io.hyperfoil.tools.qdup.Run.lambda$queueRunScripts$18(Run.java:813)
    at io.hyperfoil.tools.qdup.Run$$Lambda$382/0x0000000800f1f098.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(java.base@17.0.6-internal/FutureTask.java:264)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.6-internal/ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.6-internal/ThreadPoolExecutor.java:635)
    at java.lang.Thread.run(java.base@17.0.6-internal/Thread.java:833)

"sshd-SshClient[7ffc78f9]-timer-thread-1" #24 daemon prio=5 os_prio=31 cpu=18.92ms elapsed=40.15s tid=0x000000013c995a00 nid=0x8903 waiting on condition  [0x00000001721b6000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@17.0.6-internal/Native Method)
    - parking to wait for  <0x0000000602f35200> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.6-internal/LockSupport.java:252)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@17.0.6-internal/AbstractQueuedSynchronizer.java:1672)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@17.0.6-internal/ScheduledThreadPoolExecutor.java:1182)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@17.0.6-internal/ScheduledThreadPoolExecutor.java:899)
    at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@17.0.6-internal/ThreadPoolExecutor.java:1062)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.6-internal/ThreadPoolExecutor.java:1122)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.6-internal/ThreadPoolExecutor.java:635)
    at java.lang.Thread.run(java.base@17.0.6-internal/Thread.java:833)

Full thread dump here.

This has been observed with both macOS and Fedora clients against a Fedora server, running Java 17.

johnaohara commented 1 year ago

After enabling trace logging, I am seeing that executing this command hangs qDup on machines that are not using bash/sh as the shell: https://github.com/Hyperfoil/qDup/blob/master/src/main/java/io/hyperfoil/tools/qdup/SshSession.java#L618

On a target machine using bash, the response buffer content returned from the remote machine is something like ^.........[?2004h[user@localhost..]$., whereas for non-bash shells the response buffer contains ^.........[0m.[49m.[39m.[27m.[24m.[1m.[7m%.[27m.[1m.[0m.........

qDup cannot correctly decode the response from the remote machine.

The remote machine that caused the hang was running https://github.com/romkatv/powerlevel10k on top of zsh; using plain zsh also causes qDup to hang.
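
For anyone who wants to inspect what qDup receives, here is a quick sketch (assuming the target shell is zsh) to dump the expanded prompt with its control characters made visible:

# zsh: expand the current prompt (-P) and render control characters
# visibly; a themed prompt emits many ESC[...m colour/reset sequences,
# shown by cat -v as ^[[0m^[[49m... like the non-bash buffer above
print -P -- "$PS1" | cat -v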

zakkak commented 9 months ago

I have had a similar issue in the past with Emacs in TRAMP mode; the workaround there is to set a different prompt depending on the value of the TERM environment variable.

For instance I have the following in my .zshrc

case "$TERM" in
    xterm*|rxvt*|eterm*|screen*|st-256color*)
        # PS1="my fancy multi-line prompt > "
        ;;
    *)
        PS1="> "
        ;;
esac

This sets the prompt to > when TERM doesn't match any of xterm*|rxvt*|eterm*|screen*|st-256color*.

Is there a way in qDup to set an environment variable for the ssh session? It could help as a workaround until this gets fixed.

zakkak commented 9 months ago

After having a look at the code, I now understand that qDup sets the prompt itself, so the above comment is irrelevant, but I think I figured out the actual issue.

The prompt used by qDup, <_#%@_qdup_@%#_>, is subject to prompt expansion by default in zsh, resulting in a prompt like <_#10:02_qdup_@%_>, as %@ is expanded to the current time of day [1]. This results in qDup failing to find the suffix in the SuffixStream, and thus waiting forever.

Disabling the expansion when zsh is detected should allow qDup to set the prompt as expected. To do so, one needs to run setopt NO_PROMPT_SUBST; setopt NO_PROMPT_PERCENT; setopt NO_PROMPT_BANG;
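
For illustration, a minimal sketch of the effect of those options on the current qDup prompt in an interactive zsh (the rendered values assume a non-root user):

# default zsh: percent expansion is on, so %@ becomes the time of day
# and %# becomes %, rendering e.g. <_#10:02_qdup_@%_>
PS1='<_#%@_qdup_@%#_> '

# disable all three forms of prompt expansion so the string is literal
setopt NO_PROMPT_SUBST NO_PROMPT_PERCENT NO_PROMPT_BANG
PS1='<_#%@_qdup_@%#_> '   # now renders exactly as <_#%@_qdup_@%#_>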

[1] https://zsh.sourceforge.io/Doc/Release/Prompt-Expansion.html#Date-and-time

willr3 commented 9 months ago

I think we should change the prompt to something that does not expand by default. We picked <_#%@_qdup_@%#_> because it stood out when scanning the run.log and because it was unlikely to occur as output from other shell interactions. I have to admit I don't use zsh and we have not run tests against zsh in the past, but I will see if I can recreate this in a container.

willr3 commented 9 months ago

@zakkak Would changing the prompt to <_#_qdup_#_> trigger any default completion in zsh? We still need to address the additional setopt commands but I would like to create a release that has a functional prompt while we explore better ways to support zsh.

zakkak commented 9 months ago

> @zakkak Would changing the prompt to <_#_qdup_#_> trigger any default completion in zsh?

Hi @willr3, it does not seem to trigger any default completions/expansions (AFAIK expansions are prefixed with %).

❯ export PS1='<_#_qdup_#_> '; set +o history; export HISTCONTROL="ignoreboth" 
set: no such option: history
<_#_qdup_#_> 

Note the "set: no such option: history" warning, though.

willr3 commented 9 months ago

qDup was developed and tested against bash, so there are probably several places that need to be revisited when connecting to other shells. Is there a zsh equivalent to disable history for the session? We disable history to avoid polluting the user's history with all the commands that run as part of the qDup execution.

zakkak commented 9 months ago

> Is there a zsh equivalent to disable history for the session? We disable history to avoid polluting the user's history with all the commands that run as part of the qDup execution.

According to man zshbuiltins it looks like this can be achieved with fc -p:

fc -p pushes the current history list onto a stack and switches to a new history list. If the -a option is also specified, this history list will be automatically popped when the current function scope is exited, which is a much better solution than creating a trap function to call `fc -P' manually. If no arguments are specified, the history list is left empty, $HISTFILE is unset, and $HISTSIZE & $SAVEHIST are set to their default values. If one argument is given, $HISTFILE is set to that file‐name, $HISTSIZE & $SAVEHIST are left unchanged, and the history file is read in (if it exists) to initialize the new list. If a second argument is specified, $HISTSIZE & $SAVEHIST are instead set to the single specified numeric value. Finally, if a third argument is specified, $SAVEHIST is set to a separate value from $HISTSIZE. You are free to change these environment values for the new history list however you desire in order to manipulate the new history list.
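
Based on that description, a sketch of what the session setup could look like (untested; fc -P is the matching pop mentioned in the excerpt above):

# push the user's history and switch to a new, empty list; with no
# arguments $HISTFILE is unset, so nothing qDup runs is written to disk
fc -p

# ... qDup executes its commands against the throwaway history list ...

# pop back to the user's original history list when the session ends
fc -P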