eiz / SynchronousAudioRouter

Low latency application audio routing for Windows
http://sar.audio/
GNU General Public License v3.0
1.04k stars 138 forks source link

Process using SAR get hung if killed using task manager #59

Closed amurzeau closed 5 years ago

amurzeau commented 5 years ago

Hi,

I've found that killing a process that is running SAR cause a process hang. The process cannot be killed (even as administrator mode). This is the case when jackd use SAR via ASIO and is stopped with qjackctl (which probably kill jackd). Same when killing VBCABLE_AsioBridge.exe while using SAR and running (the hang doesn't appear if SAR is not running).

Using livekd, I've found the stacktrace to be:

nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!KeDelayExecutionThread+0xe14
nt! ?? ::FNODOBFM::`string'+0x33718
nt!IopCleanupProcessResources+0x25
nt!IopCloseFile+0x272
nt!ObpDecrementHandleCount+0x1b6
nt!ObCloseHandleTableEntry+0x313
nt!ExSweepHandleTable+0xba
nt!ObKillProcess+0x31
nt!PspRundownSingleProcess+0xa4
nt!PspExitThread+0x573
nt!KiSchedulerApcTerminate+0x18
nt!KiDeliverApc+0x2fa
nt!KiInitiateUserApc+0x70
nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffd001`85f8db00)

Which means that, while being killed, the process get all its handles closed by Windows. But one of these handle is hung forever. That handle is the file one used to control the SAR driver ("{0eb287d4-6c04-4926-ae19-3c066a4c3f3a}") which has a pending IRP that never completed. That's a IRP_MJ_DEVICE_CONTROL / SAR_WAIT_HANDLE_QUEUE.

0: kd> !irp 0xffffe000`28108d30 1
Irp is active with 7 stacks 6 is current (= 0xffffe00028108f68)
 No Mdl: System buffer=ffffe000270f3040: Thread ffffe0002b976480:  Irp stack trace.  
Flags = 00062070
ThreadListEntry.Flink = ffffe0002b0b8d70
ThreadListEntry.Blink = ffffe0002b0b8d70
IoStatus.Status = 00000000
IoStatus.Information = 00000000
RequestorMode = 00000001
Cancel = 01
CancelIrql = 0
ApcEnvironment = 00
UserIosb = 00ae6150
UserEvent = 00000000
Overlay.AsynchronousParameters.UserApcRoutine = ffffe0002b050682
Overlay.AsynchronousParameters.UserApcContext = 00ae6150
Overlay.AllocationSize = 00000000 - 00000000
CancelRoutine = 00000000   
UserBuffer = 00ae6170
&Tail.Overlay.DeviceQueueEntry = ffffe00028108da8
Tail.Overlay.Thread = ffffe0002b976480
Tail.Overlay.AuxiliaryBuffer = 00000000
Tail.Overlay.ListEntry.Flink = 00000000
Tail.Overlay.ListEntry.Blink = 00000000
Tail.Overlay.CurrentStackLocation = ffffe00028108f68
Tail.Overlay.OriginalFileObject = ffffe0002b0b8cb0
Tail.Apc = 00000000
Tail.CompletionKey = 00000000
     cmd  flg cl Device   File     Completion-Context
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

            Args: 00000000 00000000 00000000 00000000
>[IRP_MJ_DEVICE_CONTROL(e), N/A(0)]
            5  1 ffffe0002a778040 ffffe0002b0b8cb0 00000000-00000000    pending
           \Driver\SynchronousAudioRouter
            Args: 00000200 00000000 0x22c00c 00000000
 [IRP_MJ_DEVICE_CONTROL(e), N/A(0)]
            5  0 ffffe0002a7783a0 ffffe0002b0b8cb0 00000000-00000000    
           \Driver\ksthunk

(the IRP raw code is 0x22c00c)

By reading the code, I think this is happening:


TLDR: there is a pending SAR_WAIT_HANDLE_QUEUE IRP never completed nor canceled when killing a process that use SAR. That cause a process hang and SAR cannot be used anymore if using ASIO4ALL (the underlying physical device is shown as being locked).

So, I've not tried anything to come up with a patch (trying to understand what was wrong was not easy :) ), but I think that the cleanup IRP should cancel all pending IRP (maybe in the Orphan function), according to https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/irp-mj-cleanup:

If the driver's device objects were set up as exclusive, so that only a single thread can use the device at a time, the driver must complete every IRP that is currently queued to the target device object and set STATUS_CANCELLED in each IRP's I/O status block.

Otherwise, the driver must cancel and complete only the currently queued IRPs that are associated with the file object handle that is being released.

Here is a gist of windbg command console with stuff I digged to find that pending IRP: https://gist.github.com/amurzeau/381ad8362b9aeda4436169c364759767

eiz commented 5 years ago

Yeah, this is probably the largest bug in SAR that I'm aware of and is the cause of several other reported issues. I think your analysis is correct. The most likely solution is that SarOrphanControlContext should call a new utility function that iterates the queue's pendingIrps list and cancels each of them. Basically would be the same as SarCancelHandleQueueIrp except without the check at https://github.com/eiz/SynchronousAudioRouter/blob/d4d5423c27c169fc5c69267f42acc28d032abcfb/SynchronousAudioRouter/utility.cpp#L286

amurzeau commented 5 years ago

Yes, I'm trying stuff. It seems the hang does not always occurs, multiprocessor system might be needed.

This is the code I'm trying, to be called when orphaning the context:


void SarCancelAllHandleQueueIrps(SarHandleQueue *handleQueue)
{
    KIRQL irql;
    LIST_ENTRY pendingIrqsToCancel;

    InitializeListHead(&pendingIrqsToCancel);

    KeAcquireSpinLock(&handleQueue->lock, &irql);

    if (!IsListEmpty(&handleQueue->pendingIrps)) {
        PLIST_ENTRY entry = handleQueue->pendingIrps.Flink;

        RemoveEntryList(&handleQueue->pendingIrps);
        InitializeListHead(&handleQueue->pendingIrps);
        AppendTailList(&pendingIrqsToCancel, entry);
    }

    KeReleaseSpinLock(&handleQueue->lock, irql);

    while (!IsListEmpty(&pendingIrqsToCancel)) {
        SarHandleQueueIrp *pendingIrp =
            CONTAINING_RECORD(pendingIrqsToCancel.Flink, SarHandleQueueIrp, listEntry);
        PIRP irp = pendingIrp->irp;

        RemoveEntryList(&pendingIrp->listEntry);

        ZwClose(pendingIrp->kernelProcessHandle);
        ExFreePoolWithTag(pendingIrp, SAR_TAG);

        irp->IoStatus.Information = 0;
        irp->IoStatus.Status = STATUS_CANCELLED;
        IoCompleteRequest(irp, IO_NO_INCREMENT);
    }
}
amurzeau commented 5 years ago

Note: the hang condition is instead to have at least something playing or recording on one of the endpoint, so SarASIO does SAR_WAIT_HANDLE_QUEUE IOCTL. If that IOCTL is not made, there is no pending IRP and thus no hang in case of a process crash or killed.

amurzeau commented 5 years ago

@eiz in cases you make a release, can you store the PDB files somewhere along with the release ?

So it's possible to have a meaningful call trace in case of a BSOD with SAR using the PDB, windbg and the generated minidump.

amurzeau commented 5 years ago

Thanks for merging the PR :-)

I've run the VS2017 code analyzer and it found some issues about functions called that require PASSIVE IRQ level while having a mutex lock which had increased the IRQ level. I will try to fix this and make another PR for this.

I didn't had any issues even when having audio running though SAR for a rather long time (> 10h), but I guess it depends on a lot of thing and might cause BSOD or hangs in particular cases, maybe more likely when adding / removing endpoints a lot.

amurzeau commented 5 years ago

@eiz I, do you plan to release a new version with this fix ? I've other fixes to share but these might cause bugs ... I've only tested my fixes with windows 10 by spamming endpoint creation / removal + using the driver verifier.

So I would rather prefer ones to have a fallback working version just in case other windows' behavior are not the same.

pannal commented 5 years ago

@eiz @amurzeau could one of you create a signed build with those changes? Thanks!