dokan-dev / dokany

User mode file system library for windows with FUSE Wrapper
http://dokan-dev.github.io
5.15k stars 657 forks source link

BSOD when handling FSCTL_EVENT_PROCESS_N_PULL #1093

Open LTRData opened 2 years ago

LTRData commented 2 years ago

Feature request can skip this form. Bug report must complete it. Check List must be 100% match or it will be automatically closed without further discussion. Please remove this line.

Environment

Description

Got a dump file from a BSOD from a machine from a person who currently tests an application that I am working on that uses Dokany. She has not been able to replicate the same crash but since the crash dump seems interesting in that I think it might show a possible logic flaw of some kind, I share it here anyway.

The crash happened in DokanProcessAndPullEvents where it tries to deference a NULL RequestContext->Vcb. I see there is code to workaround similar cases in DokanDiskUserFsRequest with the line requestContext.Vcb = requestContext.Dcb->Vcb;, but the problem in this case is that requestContext.Dcb->Vcb is also NULL. The FSCTL code is FSCTL_EVENT_PROCESS_N_PULL and IrpSp->Parameters.FileSystem has OutputBufferLength = 0x20000 and InputBufferLength = 0.

# Logs
2: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: 00000000c0000005, Exception code that caused the bugcheck
Arg2: fffff8031fcef021, Address of the instruction which caused the bugcheck
Arg3: ffffa18bc866e9f0, Address of the context record for the exception that caused the bugcheck
Arg4: 0000000000000000, zero.

Debugging Details:
------------------

KEY_VALUES_STRING: 1

PROCESSES_ANALYSIS: 1

SERVICE_ANALYSIS: 1

STACKHASH_ANALYSIS: 1

TIMELINE_ANALYSIS: 1

DUMP_CLASS: 1

DUMP_QUALIFIER: 401

BUILD_VERSION_STRING:  18362.1.amd64fre.19h1_release.190318-1202

SYSTEM_MANUFACTURER:  ASUSTeK COMPUTER INC.

SYSTEM_PRODUCT_NAME:  G75VW

SYSTEM_SKU:  ASUS-NotebookSKU

SYSTEM_VERSION:  1.0       

BIOS_VENDOR:  American Megatrends Inc.

BIOS_VERSION:  G75VW.218

BIOS_DATE:  08/21/2012

BASEBOARD_MANUFACTURER:  ASUSTeK COMPUTER INC.

BASEBOARD_PRODUCT:  G75VW

BASEBOARD_VERSION:  1.0       

DUMP_TYPE:  1

BUGCHECK_P1: c0000005

BUGCHECK_P2: fffff8031fcef021

BUGCHECK_P3: ffffa18bc866e9f0

BUGCHECK_P4: 0

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - Instruktionen p  0x%p refererade till minnet p  0x%p. Minnet kunde inte vara %s.

FAULTING_IP: 
dokan2!DokanProcessAndPullEvents+159 [C:\Project\dokany\sys\fscontrol.c @ 532]
fffff803`1fcef021 c6802402000001  mov     byte ptr [rax+224h],1

CONTEXT:  ffffa18bc866e9f0 -- (.cxr 0xffffa18bc866e9f0)
rax=0000000000000000 rbx=ffffa18bc866f4c0 rcx=00000000c0000023
rdx=0000000000000004 rsi=00000000c0000023 rdi=0000000000000000
rip=fffff8031fcef021 rsp=ffffa18bc866f3e0 rbp=ffffa18bc866f460
 r8=0000000000000000  r9=0000000000000000 r10=0000000000000000
r11=ffffa18bc866f3b0 r12=0000000000000001 r13=0000000000000000
r14=ffffd30193c7db10 r15=ffffd30193c7db10
iopl=0         nv up ei pl nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
dokan2!DokanProcessAndPullEvents+0x159:
fffff803`1fcef021 c6802402000001  mov     byte ptr [rax+224h],1 ds:002b:00000000`00000224=??
Resetting default scope

BUGCHECK_STR:  0x3B_c0000005

CPU_COUNT: 8

CPU_MHZ: 95b

CPU_VENDOR:  GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 3a

CPU_STEPPING: 9

CPU_MICROCODE: 6,3a,9,0 (F,M,S,R)  SIG: 21'00000000 (cache) 21'00000000 (init)

BLACKBOXBSD: 1 (!blackboxbsd)

BLACKBOXNTFS: 1 (!blackboxntfs)

BLACKBOXPNP: 1 (!blackboxpnp)

BLACKBOXWINLOGON: 1

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

PROCESS_NAME:  ArsenalImageMounter.exe

CURRENT_IRQL:  0

ANALYSIS_SESSION_HOST:  VPCWIN8ENT

ANALYSIS_SESSION_TIME:  05-19-2022 14:57:24.0311

ANALYSIS_VERSION: 10.0.18362.1 amd64fre

LAST_CONTROL_TRANSFER:  from fffff8031fcee1ad to fffff8031fcef021

STACK_TEXT:  
ffffa18b`c866f3e0 fffff803`1fcee1ad : 0000007f`fffffff8 ffffa18b`c866f6b0 fffff780`00000014 ffffa18b`c866f6b0 : dokan2!DokanProcessAndPullEvents+0x159 [C:\Project\dokany\sys\fscontrol.c @ 532] 
ffffa18b`c866f470 fffff803`1fcef1e1 : 00000000`00000000 ffffa18b`c866f6b0 ffffd301`95075000 ffffd301`25832a08 : dokan2!DokanDiskUserFsRequest+0x131 [C:\Project\dokany\sys\fscontrol.c @ 572] 
ffffa18b`c866f530 fffff803`1fcee2ec : 00000000`02000103 ffffa18b`c866f6b0 00000000`00000001 00000000`00000000 : dokan2!DokanUserFsRequest+0xf9 [C:\Project\dokany\sys\fscontrol.c @ 601] 
ffffa18b`c866f5c0 fffff803`1fce65a8 : ffffd301`953e9e01 00000000`00000001 00000000`00000001 fffff803`2125ff8b : dokan2!DokanDispatchFileSystemControl+0xe4 [C:\Project\dokany\sys\fscontrol.c @ 981] 
ffffa18b`c866f640 fffff803`1fce5c31 : 00000000`00000000 00000000`00000000 ffffd301`953e9e01 00000000`00000000 : dokan2!DokanDispatchRequest+0x4a4 [C:\Project\dokany\sys\dispatch.c @ 207] 
ffffa18b`c866f7e0 fffff803`212373b9 : ffffd301`93c7db10 ffffd301`953e9e50 ffffd301`953e9e50 00000000`00000000 : dokan2!DokanBuildRequest+0x51 [C:\Project\dokany\sys\dispatch.c @ 38] 
ffffa18b`c866f820 fffff803`217f30e5 : ffffd301`953e9e50 00000000`00000000 00000000`00000000 ffffd301`98d72940 : nt!IofCallDriver+0x59
ffffa18b`c866f860 fffff803`217f2ef0 : 00000000`00000000 ffffa18b`c866fb80 ffffd301`953e9e50 ffffa18b`c866fb80 : nt!IopSynchronousServiceTail+0x1a5
ffffa18b`c866f900 fffff803`218c3cc6 : 00000149`0af3301a 00000000`00000000 00000000`00000000 00000000`00000000 : nt!IopXxxControlFile+0xc10
ffffa18b`c866fa20 fffff803`213d5d58 : 00000000`00000000 00000000`00000000 00000000`00000000 00000149`03764600 : nt!NtFsControlFile+0x56
ffffa18b`c866fa90 00007ff9`83b5d5c4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x28
00000016`044bf998 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ff9`83b5d5c4

THREAD_SHA1_HASH_MOD_FUNC:  9aaa22c0bff58e9ccd5382f955354a673869a952

THREAD_SHA1_HASH_MOD_FUNC_OFFSET:  7710d48135563bda6864291f35fdf3fbf4767d5a

THREAD_SHA1_HASH_MOD:  3f24b1cef27cbf2545f7c5b929548ba1baf5f1c1

FOLLOWUP_IP: 
dokan2!DokanProcessAndPullEvents+159 [C:\Project\dokany\sys\fscontrol.c @ 532]
fffff803`1fcef021 c6802402000001  mov     byte ptr [rax+224h],1

FAULT_INSTR_CODE:  22480c6

FAULTING_SOURCE_LINE:  C:\Project\dokany\sys\fscontrol.c

FAULTING_SOURCE_FILE:  C:\Project\dokany\sys\fscontrol.c

FAULTING_SOURCE_LINE_NUMBER:  532

FAULTING_SOURCE_CODE:  
   528:     DOKAN_LOG_FINE_IRP(RequestContext, "No output buffer provided");
   529:     return status;
   530:   }
   531:   // 3 - Flag the device as having workers starting to pull events.
>  532:   RequestContext->Vcb->HasEventWait = TRUE;
   533: 
   534:   PEVENT_INFORMATION eventInfo =
   535:       (PEVENT_INFORMATION)(RequestContext->Irp->AssociatedIrp.SystemBuffer);
   536:   ULONG waitTimeoutMs =
   537:       status == STATUS_BUFFER_TOO_SMALL ? 0 : eventInfo->PullEventTimeoutMs;

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  dokan2!DokanProcessAndPullEvents+159

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: dokan2

IMAGE_NAME:  dokan2.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  6209cee2

STACK_COMMAND:  .cxr 0xffffa18bc866e9f0 ; kb

BUCKET_ID_FUNC_OFFSET:  159

FAILURE_BUCKET_ID:  0x3B_c0000005_dokan2!DokanProcessAndPullEvents

BUCKET_ID:  0x3B_c0000005_dokan2!DokanProcessAndPullEvents

PRIMARY_PROBLEM_CLASS:  0x3B_c0000005_dokan2!DokanProcessAndPullEvents

TARGET_TIME:  2022-05-13T17:44:27.000Z

OSBUILD:  18362

OSSERVICEPACK:  0

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK:  272

PRODUCT_TYPE:  1

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

OSEDITION:  Windows 10 WinNt TerminalServer SingleUserTS

OS_LOCALE:  

USER_LCID:  0

OSBUILD_TIMESTAMP:  2036-04-10 22:56:32

BUILDDATESTAMP_STR:  190318-1202

BUILDLAB_STR:  19h1_release

BUILDOSVER_STR:  10.0.18362.1.amd64fre.19h1_release.190318-1202

ANALYSIS_SESSION_ELAPSED_TIME:  28cc

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:0x3b_c0000005_dokan2!dokanprocessandpullevents

FAILURE_ID_HASH:  {24affc50-8171-b5b4-05d2-8272df37b4ff}

Followup:     MachineOwner
---------
Liryna commented 2 years ago

Thanks @LTRData for the report! The device must have been partially unmounting while the library pulled events. The fix is clearly not enough but might fix it. As said in the commit, the Todo in that function should be implemented and the VCB usage to be removed.

LTRData commented 6 months ago

Got another report now with exact same BSOD (but now with latest driver version). I think I realize now how this could still happen. The fix that calls IsUnmountPendingVcb(RequestContext->Vcb) before the crashing code does not help in this case. In IsUnmountPendingVcb function, it checks whether Vcb is NULL and in that case, it returns FALSE. This means that it will continue beyond this if statement when RequestContext->Vcb is NULL, and then crash at RequestContext->Vcb->HasEventWait = TRUE; because RequestContext->Vcb is NULL.

The question here is what exactly is wrong. Either IsUnmountPendingVcb function needs to be changed to return TRUE when Vcb is NULL, or that if statement needs to be changed to something like if (RequestContext->Vcb == NULL || IsUnmountPendingVcb(RequestContext->Vcb)). Could there be other unexpected side effects with changing the IsUnmountPendingVcb function?

Liryna commented 6 months ago

I would agree to do if (RequestContext->Vcb == NULL || IsUnmountPendingVcb(RequestContext->Vcb)) but I would like to understand why we have a Dcb without Vcb here. We don't have more context when this happens ?

LTRData commented 6 months ago

I would agree to do if (RequestContext->Vcb == NULL || IsUnmountPendingVcb(RequestContext->Vcb)) but I would like to understand why we have a Dcb without Vcb here. We don't have more context when this happens ?

I do not know much about the actual origin of the request that ends up like this. In both cases, I only got minidumps with very limited user process information available. But I am certain that it happens very early in the lifetime of the file system device. Right after mount, I would say.

Liryna commented 6 months ago

That can somewhat make sense. The Vcb is only set to the Dcb after it was mounted. We start to pull when DokanEventStart returns from the kernel. During that call IoVerifyVolume should be forcing the Dcb to get its Vcb through IRP_MN_MOUNT_VOLUME but maybe here it is not happening and we are still wrongly? returning a success to userland which then starts to pull the events on a null Vcb.

What I expect is that https://github.com/dokan-dev/dokany/commit/b74b6484b3e6f872d7a7eeac2b9ecf683b653b71 will stop the crash but will make the mount fail for the user due to pull event failing. Then we will be able to better understand what is going on from the logs.

Does that sound correct ?

LTRData commented 6 months ago

That sounds good! 👍

Liryna commented 6 months ago

Dokan v1 which had event wait to pull the event, also had that check with a comment that was lost https://github.com/dokan-dev/dokany/blob/v1.5.1.1000/sys/event.c#L347-L351