Closed by amurzeau 5 years ago.
Yeah, this is probably the largest bug in SAR that I'm aware of, and it is the cause of several other reported issues. I think your analysis is correct. The most likely solution is for SarOrphanControlContext to call a new utility function that iterates over the queue's pendingIrps list and cancels each IRP. It would be basically the same as SarCancelHandleQueueIrp, except without the check at https://github.com/eiz/SynchronousAudioRouter/blob/d4d5423c27c169fc5c69267f42acc28d032abcfb/SynchronousAudioRouter/utility.cpp#L286
Yes, I'm experimenting with that. The hang does not seem to occur every time; a multiprocessor system might be needed to trigger it.
This is the code I'm testing, to be called when the context is orphaned:
void SarCancelAllHandleQueueIrps(SarHandleQueue *handleQueue)
{
    KIRQL irql;
    LIST_ENTRY pendingIrpsToCancel;

    InitializeListHead(&pendingIrpsToCancel);

    // Detach every pending IRP from the queue while holding the lock.
    KeAcquireSpinLock(&handleQueue->lock, &irql);

    if (!IsListEmpty(&handleQueue->pendingIrps)) {
        PLIST_ENTRY entry = handleQueue->pendingIrps.Flink;

        // Unlink the list head so the entries form a headless ring,
        // reset the queue to empty, then splice the ring onto the local list.
        RemoveEntryList(&handleQueue->pendingIrps);
        InitializeListHead(&handleQueue->pendingIrps);
        AppendTailList(&pendingIrpsToCancel, entry);
    }

    KeReleaseSpinLock(&handleQueue->lock, irql);

    // ZwClose must be called at PASSIVE_LEVEL, so the detached IRPs are
    // released and completed only after the spin lock has been dropped.
    while (!IsListEmpty(&pendingIrpsToCancel)) {
        SarHandleQueueIrp *pendingIrp =
            CONTAINING_RECORD(pendingIrpsToCancel.Flink, SarHandleQueueIrp, listEntry);
        PIRP irp = pendingIrp->irp;

        RemoveEntryList(&pendingIrp->listEntry);
        ZwClose(pendingIrp->kernelProcessHandle);
        ExFreePoolWithTag(pendingIrp, SAR_TAG);

        irp->IoStatus.Information = 0;
        irp->IoStatus.Status = STATUS_CANCELLED;
        IoCompleteRequest(irp, IO_NO_INCREMENT);
    }
}
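To check the list-splicing idiom used above outside the kernel, here is a user-mode sketch in C. Everything in it is an assumption for illustration: the list helpers are hand-written imitations of the WDM LIST_ENTRY routines (not the kernel's own code), PendingRequest and HandleQueue are hypothetical stand-ins for SarHandleQueueIrp and SarHandleQueue, and a pthread mutex replaces the spin lock. It follows the same pattern: detach the whole pending list while the lock is held, then do the per-request cleanup after releasing it.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-mode imitations of the WDM LIST_ENTRY helpers
   (InitializeListHead, IsListEmpty, RemoveEntryList, InsertTailList,
   AppendTailList). These are NOT the kernel implementations. */
typedef struct ListEntry {
    struct ListEntry *flink;
    struct ListEntry *blink;
} ListEntry;

static void initialize_list_head(ListEntry *head) { head->flink = head->blink = head; }
static int is_list_empty(const ListEntry *head) { return head->flink == head; }

static void remove_entry_list(ListEntry *entry) {
    entry->blink->flink = entry->flink;
    entry->flink->blink = entry->blink;
}

static void insert_tail_list(ListEntry *head, ListEntry *entry) {
    entry->flink = head;
    entry->blink = head->blink;
    head->blink->flink = entry;
    head->blink = entry;
}

/* Like AppendTailList: 'first' is the first entry of a ring with no head. */
static void append_tail_list(ListEntry *head, ListEntry *first) {
    ListEntry *old_tail = head->blink;
    head->blink->flink = first;
    head->blink = first->blink;
    first->blink->flink = head;
    first->blink = old_tail;
}

/* Hypothetical stand-ins for SarHandleQueueIrp and SarHandleQueue;
   the pthread mutex plays the role of the kernel spin lock. */
typedef struct PendingRequest {
    ListEntry list_entry;
    int id;
} PendingRequest;

typedef struct HandleQueue {
    pthread_mutex_t lock;
    ListEntry pending;
} HandleQueue;

static int enqueue_request(HandleQueue *queue, int id) {
    PendingRequest *req = malloc(sizeof *req);
    if (req == NULL) return -1;
    req->id = id;
    pthread_mutex_lock(&queue->lock);
    insert_tail_list(&queue->pending, &req->list_entry);
    pthread_mutex_unlock(&queue->lock);
    return 0;
}

/* Detach every pending request while holding the lock, then "cancel" them
   after releasing it. Writes up to max_ids cancelled ids; returns the count. */
static int cancel_all(HandleQueue *queue, int *cancelled_ids, int max_ids) {
    ListEntry to_cancel;
    int count = 0;

    initialize_list_head(&to_cancel);

    pthread_mutex_lock(&queue->lock);
    if (!is_list_empty(&queue->pending)) {
        ListEntry *first = queue->pending.flink;
        remove_entry_list(&queue->pending);    /* entries now form a headless ring */
        initialize_list_head(&queue->pending); /* queue is empty again */
        append_tail_list(&to_cancel, first);   /* splice the ring onto the local head */
    }
    pthread_mutex_unlock(&queue->lock);

    /* Lock released: this is where the driver would call ZwClose and
       IoCompleteRequest for each detached IRP. */
    while (!is_list_empty(&to_cancel)) {
        PendingRequest *req = (PendingRequest *)((char *)to_cancel.flink -
                                                 offsetof(PendingRequest, list_entry));
        remove_entry_list(&req->list_entry);
        if (count < max_ids) cancelled_ids[count] = req->id;
        count++;
        free(req);
    }
    return count;
}
```

Enqueuing ids 1, 2 and 3 and calling cancel_all returns 3, reports the ids in queue order, and leaves the queue empty; a second call returns 0. Deferring the cleanup until after the lock is released matters in the driver because holding a spin lock raises the IRQL to DISPATCH_LEVEL, while ZwClose must be called at PASSIVE_LEVEL.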
Note: the actual hang condition is that something is playing or recording on at least one of the endpoints, so that SarASIO issues the SAR_WAIT_HANDLE_QUEUE IOCTL. If that IOCTL is never issued, there is no pending IRP and thus no hang when the process crashes or is killed.
@eiz in case you make a release, could you store the PDB files somewhere alongside it?
That way it would be possible to get a meaningful call stack from a SAR-related BSOD, using the PDB files, WinDbg and the generated minidump.
Thanks for merging the PR :-)
I've run the VS2017 code analyzer, and it reported some calls to functions that require PASSIVE_LEVEL IRQL while a mutex that raises the IRQL is held. I will try to fix these and open another PR.
I didn't have any issues even with audio running through SAR for a long time (> 10 h), but I guess it depends on many factors and might cause BSODs or hangs in particular cases, probably more often when endpoints are added and removed frequently.
@eiz Hi, do you plan to release a new version with this fix? I have other fixes to share, but they might introduce bugs: I've only tested them on Windows 10, by repeatedly creating and removing endpoints with Driver Verifier enabled.
So I would prefer to have a known-working fallback release first, in case other Windows versions behave differently.
@eiz @amurzeau could one of you create a signed build with those changes? Thanks!
Hi,
I've found that killing a process that is using SAR causes that process to hang. The process then cannot be killed, even as administrator. This happens when jackd uses SAR via ASIO and is stopped with qjackctl (which probably kills jackd). The same happens when killing VBCABLE_AsioBridge.exe while it is running and using SAR (the hang doesn't occur if SAR is not in use).
Using livekd, I found the following stack trace:
This means that, while the process is being killed, Windows closes all of its handles, but closing one of them hangs forever. That handle is the file handle used to control the SAR driver ("{0eb287d4-6c04-4926-ae19-3c066a4c3f3a}"), which has a pending IRP that never completes: an IRP_MJ_DEVICE_CONTROL request with the SAR_WAIT_HANDLE_QUEUE IOCTL.
(the raw IOCTL code is 0x22c00c)
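The raw code can be split into its fields using the bit layout of the Windows CTL_CODE macro, (DeviceType << 16) | (Access << 14) | (Function << 2) | Method. The sketch below does that in plain C; decode_ioctl and ctl_code are hypothetical helper names, and whether function number 3 matches SAR's own definition of SAR_WAIT_HANDLE_QUEUE is an assumption that has not been checked against SAR's headers.

```c
#include <stdint.h>

/* Fields of an IOCTL code, following the CTL_CODE bit layout. */
typedef struct IoctlFields {
    uint32_t device_type; /* bits 16-31 */
    uint32_t access;      /* bits 14-15 */
    uint32_t function;    /* bits 2-13 */
    uint32_t method;      /* bits 0-1 */
} IoctlFields;

/* Hypothetical helper: split a raw IOCTL code into its fields. */
static IoctlFields decode_ioctl(uint32_t code) {
    IoctlFields f;
    f.device_type = code >> 16;
    f.access = (code >> 14) & 0x3;
    f.function = (code >> 2) & 0xFFF;
    f.method = code & 0x3;
    return f;
}

/* Hypothetical helper: same arithmetic as the CTL_CODE macro. */
static uint32_t ctl_code(uint32_t device_type, uint32_t function,
                         uint32_t method, uint32_t access) {
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}
```

decode_ioctl(0x22c00c) gives device type 0x22 (FILE_DEVICE_UNKNOWN), function 3, method 0 (METHOD_BUFFERED) and access 3 (read and write), and ctl_code(0x22, 3, 0, 3) rebuilds the same value.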
From reading the code, I think this is what happens:
TL;DR: when a process using SAR is killed, a pending SAR_WAIT_HANDLE_QUEUE IRP is neither completed nor canceled. This makes the process hang, and SAR can no longer be used with ASIO4ALL (the underlying physical device is shown as locked).
I haven't tried to write a patch yet (understanding what was wrong was not easy :) ), but I think the IRP_MJ_CLEANUP handler should cancel all pending IRPs (perhaps in the orphan function), according to https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/irp-mj-cleanup:
Here is a gist of the WinDbg console session I used to dig up that pending IRP: https://gist.github.com/amurzeau/381ad8362b9aeda4436169c364759767
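As we read the IRP_MJ_CLEANUP documentation linked above, a driver's cleanup path should cancel the IRPs still queued for the file object whose last handle is being closed, while leaving IRPs from other handles queued. Below is a single-threaded user-mode sketch of that filtering step. Request, push_request and cleanup_cancel_for_file are hypothetical names, the file_object pointer merely stands in for the FileObject in the IRP's stack location, and locking is deliberately left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: each pending request remembers which file object
   (open handle) it was issued on. */
typedef struct Request {
    struct Request *next;
    const void *file_object; /* stand-in for the stack location's FileObject */
    int id;
} Request;

/* Append a request at the tail so the queue keeps arrival order. */
static int push_request(Request **queue, const void *file_object, int id) {
    Request *req = malloc(sizeof *req);
    if (req == NULL) return -1;
    req->next = NULL;
    req->file_object = file_object;
    req->id = id;
    while (*queue != NULL) queue = &(*queue)->next;
    *queue = req;
    return 0;
}

/* Remove and "cancel" every request issued on file_object; requests from
   other handles stay queued. Returns how many were cancelled. */
static int cleanup_cancel_for_file(Request **queue, const void *file_object,
                                   int *cancelled_ids, int max_ids) {
    int count = 0;
    Request **link = queue;
    while (*link != NULL) {
        Request *req = *link;
        if (req->file_object == file_object) {
            *link = req->next; /* unlink from the queue */
            if (count < max_ids) cancelled_ids[count] = req->id;
            count++;
            free(req); /* a driver would complete the IRP with STATUS_CANCELLED */
        } else {
            link = &req->next;
        }
    }
    return count;
}
```

With requests 1 and 3 issued on one handle and request 2 on another, cleaning up the first handle cancels 1 and 3 and leaves 2 queued.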