Hyperfoil / qDup

Lab automation and queuing scripting
Apache License 2.0
12 stars 12 forks source link

qDup abort command hangs. #197

Closed whitingjr closed 1 year ago

whitingjr commented 1 year ago

I have a qDup process that has hung.

Nanny idle reports this:


$ curl http://localhost:31337/active
[{
  "uid": 3661,
  "lastUpdate": 1682497907234,
  "name": "abort: Resource diff failed, see above for details",
  "host": "someuser@someserversomewhere:22",
  "contextId": "qdup-cleanup:3598@someuser@someserversomewhere:22",
  "startTime": 1682497907234,
  "runTime": 860286,
  "idleTime": 860286,
  "script": "3598:qdup-cleanup"
}]

the jvm stack dump reveals this

  1.92%    [1/52]V JNI global refs: 28, weak refs: 0
  1.92%    [1/52]\ at java.lang.ref.Finalizer$FinalizerThread.run(java.base@11.0.18/Finalizer.java:170)
  1.92%    [1/52] \ at java.lang.ref.ReferenceQueue.remove(java.base@11.0.18/ReferenceQueue.java:176)
  1.92%    [1/52]  \ - waiting to re-lock in wait() <0x0000000418001478> (a java.lang.ref.ReferenceQueue$Lock)
  1.92%    [1/52]   \ at java.lang.ref.ReferenceQueue.remove(java.base@11.0.18/ReferenceQueue.java:155)
  1.92%    [1/52]    \ - waiting on <0x0000000418001478> (a java.lang.ref.ReferenceQueue$Lock)
  1.92%    [1/52]     V at java.lang.Object.wait(java.base@11.0.18/Native Method)
  1.92%    [1/52]\ at jdk.internal.misc.InnocuousThread.run(java.base@11.0.18/InnocuousThread.java:161)
  1.92%    [1/52] \ at java.lang.Thread.run(java.base@11.0.18/Thread.java:829)
  1.92%    [1/52]  \ at jdk.internal.ref.CleanerImpl.run(java.base@11.0.18/CleanerImpl.java:148)
  1.92%    [1/52]   \ - waiting to re-lock in wait() <0x0000000418002250> (a java.lang.ref.ReferenceQueue$Lock)
  1.92%    [1/52]    \ at java.lang.ref.ReferenceQueue.remove(java.base@11.0.18/ReferenceQueue.java:155)
  1.92%    [1/52]     \ - waiting on <no object reference available>
  1.92%    [1/52]      V at java.lang.Object.wait(java.base@11.0.18/Native Method)
  1.92%    [1/52]\ at java.util.concurrent.ForkJoinWorkerThread.run(java.base@11.0.18/ForkJoinWorkerThread.java:183)
  1.92%    [1/52] \ at java.util.concurrent.ForkJoinPool.runWorker(java.base@11.0.18/ForkJoinPool.java:1628)
  1.92%    [1/52]  \ at java.util.concurrent.locks.LockSupport.park(java.base@11.0.18/LockSupport.java:194)
  1.92%    [1/52]   \ - parking to wait for  <0x0000000418960348> (a java.util.concurrent.ForkJoinPool)
  1.92%    [1/52]    V at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
  1.92%    [1/52]\ at io.hyperfoil.tools.qdup.QDup.main(QDup.java:444)
  1.92%    [1/52] \ at io.hyperfoil.tools.qdup.QDup.run(QDup.java:683)
  1.92%    [1/52]  \ at io.hyperfoil.tools.qdup.Run.run(Run.java:439)
  1.92%    [1/52]   \ at java.util.concurrent.CountDownLatch.await(java.base@11.0.18/CountDownLatch.java:232)
  1.92%    [1/52]    \ at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.18/AbstractQueuedSynchronizer.java:1345)
  1.92%    [1/52]     \ at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.18/AbstractQueuedSynchronizer.java:1039)
  1.92%    [1/52]      \ at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.18/AbstractQueuedSynchronizer.java:885)
  1.92%    [1/52]       \ at java.util.concurrent.locks.LockSupport.park(java.base@11.0.18/LockSupport.java:194)
  1.92%    [1/52]        \ - parking to wait for  <0x000000041895e810> (a java.util.concurrent.CountDownLatch$Sync)
  1.92%    [1/52]         V at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
  3.85%    [2/52]V No compile task
  1.92%    [1/52]\ at java.lang.ref.Reference$ReferenceHandler.run(java.base@11.0.18/Reference.java:213)
  1.92%    [1/52] \ at java.lang.ref.Reference.processPendingReferences(java.base@11.0.18/Reference.java:241)
  1.92%    [1/52]  V at java.lang.ref.Reference.waitForReferencePendingList(java.base@11.0.18/Native Method)
 82.69%   [43/52]X at java.lang.Thread.run(java.base@11.0.18/Thread.java:829)
  1.92%    [1/52]|\ at sun.nio.ch.EPollPort$EventHandlerTask.run(java.base@11.0.18/EPollPort.java:281)
  1.92%    [1/52]| \ at sun.nio.ch.EPollPort$EventHandlerTask.poll(java.base@11.0.18/EPollPort.java:200)
  1.92%    [1/52]|  V at sun.nio.ch.EPoll.wait(java.base@11.0.18/Native Method)
  5.77%    [3/52]|\ at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
  5.77%    [3/52]| \ at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
  5.77%    [3/52]|  \ at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
  5.77%    [3/52]|   \ at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
  5.77%    [3/52]|    \ at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:824)
  5.77%    [3/52]|     \ at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
  5.77%    [3/52]|      X at sun.nio.ch.SelectorImpl.select(java.base@11.0.18/SelectorImpl.java:136)
  1.92%    [1/52]|      |\ - locked <0x0000000418961a70> (a sun.nio.ch.EPollSelectorImpl)
  1.92%    [1/52]|      | \ - locked <0x0000000418961ac8> (a io.netty.channel.nio.SelectedSelectionKeySet)
  1.92%    [1/52]|      |  \ at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.18/SelectorImpl.java:124)
  1.92%    [1/52]|      |   \ at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.18/EPollSelectorImpl.java:120)
  1.92%    [1/52]|      |    V at sun.nio.ch.EPoll.wait(java.base@11.0.18/Native Method)
  1.92%    [1/52]|      |\ - locked <0x000000041a77acf0> (a sun.nio.ch.EPollSelectorImpl)
  1.92%    [1/52]|      | \ - locked <0x000000041a77acd8> (a io.netty.channel.nio.SelectedSelectionKeySet)
  1.92%    [1/52]|      |  \ at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.18/SelectorImpl.java:124)
  1.92%    [1/52]|      |   \ at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.18/EPollSelectorImpl.java:120)
  1.92%    [1/52]|      |    V at sun.nio.ch.EPoll.wait(java.base@11.0.18/Native Method)
  1.92%    [1/52]|       \ - locked <0x000000041895fbd0> (a sun.nio.ch.EPollSelectorImpl)
  1.92%    [1/52]|        \ - locked <0x000000041895fc28> (a io.netty.channel.nio.SelectedSelectionKeySet)
  1.92%    [1/52]|         \ at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.18/SelectorImpl.java:124)
  1.92%    [1/52]|          \ at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.18/EPollSelectorImpl.java:120)
  1.92%    [1/52]|           V at sun.nio.ch.EPoll.wait(java.base@11.0.18/Native Method)
 75.00%   [39/52] \ at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.18/ThreadPoolExecutor.java:628)
 75.00%   [39/52]  \ at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.18/ThreadPoolExecutor.java:1114)
 75.00%   [39/52]   X at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.18/ThreadPoolExecutor.java:1054)
 50.00%   [26/52]   |X at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.18/ScheduledThreadPoolExecutor.java:899)
  5.77%    [3/52]   ||\ at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.18/ScheduledThreadPoolExecutor.java:1182)
  5.77%    [3/52]   || \ at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.18/AbstractQueuedSynchronizer.java:2123)
  5.77%    [3/52]   ||  X at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.18/LockSupport.java:234)
  1.92%    [1/52]   ||  |\ - parking to wait for  <0x0000000418961450> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  1.92%    [1/52]   ||  | V at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
  1.92%    [1/52]   ||  |\ - parking to wait for  <0x00000004603bd258> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  1.92%    [1/52]   ||  | V at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
  1.92%    [1/52]   ||   \ - parking to wait for  <0x00000004603bb540> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  1.92%    [1/52]   ||    V at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
 44.23%   [23/52]   | \ at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.18/ScheduledThreadPoolExecutor.java:1177)
 44.23%   [23/52]   |  \ at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.18/AbstractQueuedSynchronizer.java:2081)
 44.23%   [23/52]   |   \ at java.util.concurrent.locks.LockSupport.park(java.base@11.0.18/LockSupport.java:194)
 44.23%   [23/52]   |    \ - parking to wait for  <0x0000000418961450> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 44.23%   [23/52]   |     V at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
 25.00%   [13/52]    \ at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.18/LinkedBlockingQueue.java:433)
 25.00%   [13/52]     \ at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.18/AbstractQueuedSynchronizer.java:2081)
 25.00%   [13/52]      X at java.util.concurrent.locks.LockSupport.park(java.base@11.0.18/LockSupport.java:194)
  1.92%    [1/52]      |\ - parking to wait for  <0x00000004603bcd08> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  1.92%    [1/52]      | V at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
 23.08%   [12/52]       \ - parking to wait for  <0x00000004189610d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 23.08%   [12/52]        V at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
  1.92%    [1/52]\ at java.util.TimerThread.run(java.base@11.0.18/Timer.java:506)
  1.92%    [1/52] \ - waiting to re-lock in wait() <0x000000041895f1b8> (a java.util.TaskQueue)
  1.92%    [1/52]  \ at java.util.TimerThread.mainLoop(java.base@11.0.18/Timer.java:553)
  1.92%    [1/52]   \ - waiting on <no object reference available>
  1.92%    [1/52]    V at java.lang.Object.wait(java.base@11.0.18/Native Method)

qdup version: 0.6.16 jvm running an uber jar

whitingjr commented 1 year ago

My scenario seems to be the abort command is called during run and then also cleanup script.

whitingjr commented 1 year ago

the actual line number is 534 that is waiting on the latch

willr3 commented 1 year ago

Yep. apparently an abort after an abort has already caused the run to end does not properly pass execution to the next command and so the run hangs. I'll start working on a solution

FYI: That latch prevents the main thread from ending before the qDup run finishes. It prevents the java -jar .. from ending while qDup is still running

willr3 commented 1 year ago

fixed in 9570c9f0b2e7fcc2d49b033113a36ff11c7b8fd7