Closed don-Pardon closed 9 years ago
Hi! I have already had this kind of BSOD: the device is not unmounted and nobody controls it, so the system becomes unstable until the OS crashes.
I think the service should force an unmount when it is killed or restarted, and also notify mirrorfs that the device has been killed.
We should probably even add a ping system between the service and mirrorfs in case something goes wrong with one of them.
This is only a suggestion because I do not know whether the driver already has such a safeguard, but it seems not.
(I hope I have understood your ticket properly.)
Yep, you got it right. I've also thought about forcing an unmount when the service is being restarted; it would solve the problem. Moreover, is there any use case for the DokanMounter service being restarted while a mounted FS keeps running? On the other hand, a service restart doesn't affect the mounted FS: it is still mounted and working properly, so why turn it off?
I think we should see the service as a guard. If someone asks it to restart or stop, it should clean everything up before leaving.
> On the other hand service restart doesn't affect mounted FS, it is still mounted and is working properly, so why turn it off?..

I have never tested such a case :smiley: but as there will be nobody to clean up the device... we will get a BSOD. It is better to give the mirror the possibility of handling it than to break the OS.
For me, if the service restarts, it means that something went wrong or that someone forgot to clean up their running "mirror" properly, and neither of these is a normal use case.
Alright, I agree, cleaning up is the preferable solution. The other problem is the service being killed, or the service simply crashing. The ping system sounds good (and I haven't found anything similar in dokan yet), but we need to take into account that the FS can perform heavy operations, like downloading a chunk of data for a ReadFile request, so the FS will be unresponsive to ping requests for some time.
> I have never tested such case

You can try =) but don't forget to run dokan_control.exe /u Z: /f when you have finished =)
Have you already used DOKAN_OPTION_KEEP_ALIVE? It says that it is for auto unmount. When enabled, it sends IOCTL_KEEPALIVE to the kernel driver to update an internal timer. If the timer expires, the device is unmounted. https://github.com/dokan-dev/dokany/blob/22cf77c32594ec74e12038e9bc470e57524b8e9e/sys/timeout.c#L80 If this really works, killing the service with the option enabled should work. Can you test with this option?
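The keep-alive idea described above can be sketched in plain C. This is a minimal user-mode model and not the dokan.sys code: the `keepalive_timer` struct, the three functions, and any timeout value passed in are hypothetical names and numbers chosen for illustration. Each keep-alive request pushes a deadline forward, and a watchdog treats the device as abandoned only once a full timeout passes without a tick, which leaves some room for a filesystem that is busy with a long request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a keep-alive timer; not the real driver structures. */
typedef struct {
    uint64_t timeout_ms;   /* how long the device may stay silent */
    uint64_t deadline_ms;  /* absolute time at which the device counts as dead */
} keepalive_timer;

/* Arm the timer at time now_ms. */
void keepalive_init(keepalive_timer *t, uint64_t now_ms, uint64_t timeout_ms) {
    assert(t != NULL && timeout_ms > 0);
    t->timeout_ms = timeout_ms;
    t->deadline_ms = now_ms + timeout_ms;
}

/* Called whenever a keep-alive request arrives: push the deadline forward. */
void keepalive_tick(keepalive_timer *t, uint64_t now_ms) {
    assert(t != NULL);
    t->deadline_ms = now_ms + t->timeout_ms;
}

/* Polled by a watchdog: true means no keep-alive arrived in time. */
bool keepalive_expired(const keepalive_timer *t, uint64_t now_ms) {
    assert(t != NULL);
    return now_ms >= t->deadline_ms;
}
```

A watchdog built on this would call `keepalive_expired` periodically and start the unmount when it returns true. Choosing the timeout is the hard part: it has to be longer than the slowest legitimate operation of the filesystem.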
I've been testing with that option turned on. The thing is, when auto unmount is triggered, DokanService searches for the mount entry, and if the service was restarted, the appropriate record won't be found and DokanControlUnmount won't be called. It would be called if the DOKAN_CONTROL_OPTION_FORCE_UNMOUNT option was specified (like in the dokan_control /u Z: /f call).
So, another thought comes up: if KEEP_ALIVE was specified, use the FORCE_UNMOUNT option. But I'm not sure that it wouldn't lead to other issues, and moreover "KEEP_ALIVE" and "FORCE_UNMOUNT" are kind of opposite ideas.
@don-Pardon Oh ok I see ><
Looks like @marinkobabic's fixes have resolved this issue. I cannot reproduce your BSOD with these changes, whereas I could with previous versions. Could you try this pre-release and let me know if you still get the BSOD? https://github.com/dokan-dev/dokany/releases/tag/0.7.3-RC2 Thanks.
@don-Pardon could you give a try with https://github.com/dokan-dev/dokany/releases/tag/0.7.3-RC3? Thanks.
don-Pardon's results notwithstanding I can report that dokany 0.7.3-RC3 greatly reduces BSODs on my machine. I still get crashes whenever I leave a mount point open after aborting a DokanNet.Dokan instance in the debugger and try to mount for a second time - but that's fair enough. With 0.7.3-RC2 the BSOD appeared right after aborting the debug session.
@Maxhy There are a few changes which you should merge as well.
@viciousviper If you could provide more details using WinDbg and the command !analyze -v when you have opened your crash dump, that would be great.
There is no reason for a BSOD. We must identify the problem and solve it.
Big thank you @marinkobabic! Your contributions are always welcome! I made a pre-release with your changes.
@viciousviper could you test with this version ? and make a report using WinDbg as marinkobabic explained ? https://github.com/dokan-dev/dokany/releases/tag/v0.7.3-RC4
There you go. This happened last night with v0.7.3-RC4 while debugging a Dokan.Net-application in VS2015RC:
-- removed misleading .dmp from devenv.exe
Maybe I am missing something but the crash report is from devenv.exe
PROCESS_NAME: devenv.exe
Are you debugging dokan with VS when you run it ? If yes, could you run dokan without VS and make a new crash report ?
Well, yes, just as I wrote above. So far I've only witnessed BSODs after I aborted a VS debugging session on my still very incomplete Dokan.Net application. I'll see if I can make my machine crash without the help of VS :-)
Oh sorry! I missed this information :smile: haha
So for now, you have never been able to make a crash without VS ? For me, the crash with VS is much more related to the current VS 2015 RC stability.
You certainly have a point there. I'll probably get around to upgrading my VS to 2015 final in the next couple of days. However, while I did see my share of exceptions and crashes inside VS 2015 RC I've never had regular BSODs until I started to fiddle with dokan. No offense intended - I'm just trying to help nail down the BSOD source.
In the meantime, how's this:
Microsoft (R) Windows Debugger Version 6.3.9600.17336 AMD64 Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\Windows\MEMORY.DMP] Kernel Bitmap Dump File: Only kernel address space is available
***** Symbol Path validation summary **
Response Time (ms) Location
Deferred SRV*C:\Windows\symbol_cache*http://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*C:\Windows\symbol_cache*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 8 Kernel Version 9600 MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 9600.17736.amd64fre.winblue_r9.150322-1500
Machine Name:
Kernel base = 0xfffff803`7b210000 PsLoadedModuleList = 0xfffff803`7b4e9850
Debug session time: Thu Jul 30 22:35:28.516 2015 (UTC + 2:00)
System Uptime: 0 days 0:39:30.251
Loading Kernel Symbols
...............................................................
................................................................
................................................................
.............
Loading User Symbols
Loading unloaded module list
............
* Bugcheck Analysis
Use !analyze -v to get detailed debugging information.
BugCheck 7E, {ffffffffc0000005, fffff8037b2c896b, ffffd001687af758, ffffd001687aef60}
*** ERROR: Module load completed but symbols could not be loaded for dokan.sys
Probably caused by : dokan.sys ( dokan+2523 )
Followup: MachineOwner
2: kd> !analyze -v
* Bugcheck Analysis
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints the driver/function that caused the problem. Always note this address as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc0000005, The exception code that was not handled
Arg2: fffff8037b2c896b, The address that the exception occurred at
Arg3: ffffd001687af758, Exception Record Address
Arg4: ffffd001687aef60, Context Record Address
Debugging Details:
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08l referenced memory at 0x%08lx. The memory could not be "%s".
FAULTING_IP:
nt!IopfCompleteRequest+c1b
fffff803`7b2c896b 488b4018 mov rax,qword ptr [rax+18h]
EXCEPTION_RECORD: ffffd001687af758 -- (.exr 0xffffd001687af758)
ExceptionAddress: fffff8037b2c896b (nt!IopfCompleteRequest+0x0000000000000c1b)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 0000000000000019
Attempt to read from address 0000000000000019
CONTEXT: ffffd001687aef60 -- (.cxr 0xffffd001687aef60;r)
rax=0000000000000001 rbx=ffffe00107d5d910 rcx=0000000000000884
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000001
rip=fffff8037b2c896b rsp=ffffd001687af990 rbp=ffffd001687afa90
r8=0000000000000001 r9=ffffe00107cb2650 r10=0000000000000000
r11=ffffd001687afac8 r12=00000000a000000c r13=0000000000000000
r14=ffffe00108a19200 r15=00000000a0000003
iopl=0 nv up ei pl nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010202
nt!IopfCompleteRequest+0xc1b:
fffff803`7b2c896b 488b4018 mov rax,qword ptr [rax+18h] ds:002b:00000000`00000019=????????????????
Last set context:
rax=0000000000000001 rbx=ffffe00107d5d910 rcx=0000000000000884
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000001
rip=fffff8037b2c896b rsp=ffffd001687af990 rbp=ffffd001687afa90
r8=0000000000000001 r9=ffffe00107cb2650 r10=0000000000000000
r11=ffffd001687afac8 r12=00000000a000000c r13=0000000000000000
r14=ffffe00108a19200 r15=00000000a0000003
iopl=0 nv up ei pl nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010202
nt!IopfCompleteRequest+0xc1b:
fffff803`7b2c896b 488b4018 mov rax,qword ptr [rax+18h] ds:002b:00000000`00000019=????????????????
Resetting default scope
PROCESS_NAME: System
CURRENT_IRQL: 0
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08l referenced memory at 0x%08lx. The memory could not be "%s".
EXCEPTION_PARAMETER1: 0000000000000000
EXCEPTION_PARAMETER2: 0000000000000019
READ_ADDRESS: unable to get nt!MmNonPagedPoolStart
unable to get nt!MmSizeOfNonPagedPoolInBytes
0000000000000019
FOLLOWUP_IP:
dokan+2523
fffff800`de02e523 4883c428 add rsp,28h
BUGCHECK_STR: AV
DEFAULT_BUCKET_ID: NULL_CLASS_PTR_DEREFERENCE
ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre
LAST_CONTROL_TRANSFER: from fffff800de02e523 to fffff8037b2c896b
STACK_TEXT:
ffffd001`687af990 fffff800`de02e523 : ffffe001`1026f080 ffffe001`08a19200 ffffe001`08a19200 ffffd001`00000000 : nt!IopfCompleteRequest+0xc1b
ffffd001`687afad0 fffff800`de031f7c : 00000000`00025090 ffffe001`08a19260 fffffff6`00000002 fffff800`00000006 : dokan+0x2523
ffffd001`687afb00 fffff800`de031c3d : ffffe001`08a191d0 00000000`00000000 ffffe001`07177501 00007ff8`00000000 : dokan+0x5f7c
ffffd001`687afb60 fffff803`7b31036c : 00000000`00000000 ffffe001`0b985880 ffffe001`0b985880 fffff901`40819b10 : dokan+0x5c3d
ffffd001`687afc00 fffff803`7b3672c6 : ffffd001`657e4180 ffffe001`0b985880 ffffd001`657f03c0 00000000`00000000 : nt!PspSystemThreadStartup+0x58
ffffd001`687afc60 00000000`00000000 : ffffd001`687b0000 ffffd001`687aa000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: dokan+2523
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: dokan
IMAGE_NAME: dokan.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 55b8993b
STACK_COMMAND: .cxr 0xffffd001687aef60 ; kb
FAILURE_BUCKET_ID: AV_dokan+2523
BUCKET_ID: AV_dokan+2523
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:av_dokan+2523
FAILURE_ID_HASH: {9d91f95c-aa94-4f23-2d77-0f802f2a29b4}
Followup: MachineOwner
The last crash is interesting. You should get the actual symbol file and then execute the analyze command. So we would get a clear stack trace.
I'd be happy to help if someone (Maxhy?) could provide me with the .pdb for dokan.sys 0.7.3-RC4.
Sorry @viciousviper, the pdb files have been erased by a new build :cry: . Could you install this version of 0.7.3-RC4 and reproduce the crash? http://download.islog.com/dokan/
The pdb files for the sys driver of this build are in x64.rar.
For the next releases, I will add the pdb files next to the installer on the download page.
Ok, updated to your special build. Now I'll have to see if I can crash nicely again #-)
Another one bites the dust ...
Microsoft (R) Windows Debugger Version 6.3.9600.17336 AMD64 Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\Windows\MEMORY.DMP] Kernel Bitmap Dump File: Only kernel address space is available
***** Symbol Path validation summary **
Response Time (ms) Location
OK D:\Temp\Source
Deferred SRV*C:\Windows\symbol_cache*http://msdl.microsoft.com/download/symbols
Symbol search path is: D:\Temp\Source;SRV*C:\Windows\symbol_cache*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 8 Kernel Version 9600 MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 9600.17736.amd64fre.winblue_r9.150322-1500
Machine Name:
Kernel base = 0xfffff800`82e19000 PsLoadedModuleList = 0xfffff800`830f2850
Debug session time: Wed Aug 5 00:34:07.875 2015 (UTC + 2:00)
System Uptime: 0 days 0:23:19.610
Loading Kernel Symbols
...............................................................
................................................................
................................................................
..............
Loading User Symbols
Loading unloaded module list
.............
* Bugcheck Analysis
Use !analyze -v to get detailed debugging information.
BugCheck 7E, {ffffffffc0000005, fffff80082ed196b, ffffd00024215758, ffffd00024214f60}
Probably caused by : dokan.sys ( dokan!DokanCompleteIrpRequest+2b )
Followup: MachineOwner
2: kd> !analyze -v
* Bugcheck Analysis
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints the driver/function that caused the problem. Always note this address as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc0000005, The exception code that was not handled
Arg2: fffff80082ed196b, The address that the exception occurred at
Arg3: ffffd00024215758, Exception Record Address
Arg4: ffffd00024214f60, Context Record Address
Debugging Details:
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be "%s".
FAULTING_IP:
nt!IopfCompleteRequest+c1b
fffff800`82ed196b 488b4018 mov rax,qword ptr [rax+18h]
EXCEPTION_RECORD: ffffd00024215758 -- (.exr 0xffffd00024215758)
ExceptionAddress: fffff80082ed196b (nt!IopfCompleteRequest+0x0000000000000c1b)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 0000000000000019
Attempt to read from address 0000000000000019
CONTEXT: ffffd00024214f60 -- (.cxr 0xffffd00024214f60;r)
rax=0000000000000001 rbx=ffffe00194206ee0 rcx=0000000000000884
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000001
rip=fffff80082ed196b rsp=ffffd00024215990 rbp=ffffd00024215a90
r8=0000000000000001 r9=ffffe001943dc820 r10=0000000000000000
r11=ffffd00024215ac8 r12=00000000a000000c r13=0000000000000000
r14=ffffe0019254f700 r15=00000000a0000003
iopl=0 nv up ei pl nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010202
nt!IopfCompleteRequest+0xc1b:
fffff800`82ed196b 488b4018 mov rax,qword ptr [rax+18h] ds:002b:00000000`00000019=????????????????
Last set context:
rax=0000000000000001 rbx=ffffe00194206ee0 rcx=0000000000000884
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000001
rip=fffff80082ed196b rsp=ffffd00024215990 rbp=ffffd00024215a90
r8=0000000000000001 r9=ffffe001943dc820 r10=0000000000000000
r11=ffffd00024215ac8 r12=00000000a000000c r13=0000000000000000
r14=ffffe0019254f700 r15=00000000a0000003
iopl=0 nv up ei pl nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010202
nt!IopfCompleteRequest+0xc1b:
fffff800`82ed196b 488b4018 mov rax,qword ptr [rax+18h] ds:002b:00000000`00000019=????????????????
Resetting default scope
PROCESS_NAME: System
CURRENT_IRQL: 0
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be "%s".
EXCEPTION_PARAMETER1: 0000000000000000
EXCEPTION_PARAMETER2: 0000000000000019
READ_ADDRESS: unable to get nt!MmNonPagedPoolStart
unable to get nt!MmSizeOfNonPagedPoolInBytes
0000000000000019
FOLLOWUP_IP:
dokan!DokanCompleteIrpRequest+2b [d:\islog\dev\app\tmp\dokany\sys\dokan.c @ 487]
fffff801`47878523 4883c428 add rsp,28h
BUGCHECK_STR: AV
DEFAULT_BUCKET_ID: NULL_CLASS_PTR_DEREFERENCE
ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre
LAST_CONTROL_TRANSFER: from fffff80147878523 to fffff80082ed196b
STACK_TEXT:
ffffd000`24215990 fffff801`47878523 : 00000000`00000000 fffff800`82edb800 00000000`00000000 00000000`00000000 : nt!IopfCompleteRequest+0xc1b
ffffd000`24215ad0 fffff801`4787bf7c : 00000000`00015de7 ffffe001`9254f7f0 00000000`00000000 fffff801`4787bba8 : dokan!DokanCompleteIrpRequest+0x2b [d:\islog\dev\app\tmp\dokany\sys\dokan.c @ 487]
ffffd000`24215b00 fffff801`4787bc3d : ffffe001`9254f760 00000000`00000000 ffffe001`88196001 00007ff8`00000000 : dokan!ReleaseTimeoutPendingIrp+0x1b0 [d:\islog\dev\app\tmp\dokany\sys\timeout.c @ 202]
ffffd000`24215b60 fffff800`82f1936c : 00000000`00000000 ffffe001`94c93300 ffffe001`94c93300 fffff901`412216b0 : dokan!DokanTimeoutThread+0x95 [d:\islog\dev\app\tmp\dokany\sys\timeout.c @ 300]
ffffd000`24215c00 fffff800`82f702c6 : ffffd000`3d7ea180 ffffe001`94c93300 ffffd000`3d7f63c0 00000000`00000000 : nt!PspSystemThreadStartup+0x58
ffffd000`24215c60 00000000`00000000 : ffffd000`24216000 ffffd000`24210000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
FAULTING_SOURCE_LINE: d:\islog\dev\app\tmp\dokany\sys\dokan.c
FAULTING_SOURCE_FILE: d:\islog\dev\app\tmp\dokany\sys\dokan.c
FAULTING_SOURCE_LINE_NUMBER: 487
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: dokan!DokanCompleteIrpRequest+2b
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: dokan
IMAGE_NAME: dokan.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 55bb26f2
STACK_COMMAND: .cxr 0xffffd00024214f60 ; kb
BUCKET_ID_FUNC_OFFSET: 2b
FAILURE_BUCKET_ID: AV_dokan!DokanCompleteIrpRequest
BUCKET_ID: AV_dokan!DokanCompleteIrpRequest
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:av_dokan!dokancompleteirprequest
FAILURE_ID_HASH: {a0df1a10-978f-2e6d-4f52-b253bcbf29f6}
Followup: MachineOwner
That's a good report! Do you know exactly how to reproduce it?
It seems that one IRP in the pending IRP list is corrupted, which causes the crash. Every access to PendingIrp is protected by KeAcquireSpinLock, but at this point in the code a new list of pending IRPs to complete is created and left unprotected.
@marinkobabic I would like your advice on this since you seem to know more than me :yum: Should we move the KeReleaseSpinLock to right after the while loop? (L208) https://github.com/marinkobabic/dokanx/blob/Windows8DeleteIssue/sys/timeout.c#L199 This could protect the IRP pointer used by IoCompleteRequest.
The documentation for IoCompleteRequest says: "Never call IoCompleteRequest while holding a spin lock. Attempting to complete an IRP while holding a spin lock can cause deadlocks." Are they talking about every spin lock or only the spin lock protecting the IRP?
EDIT: I just found that RemoveTailList is never used :O Does that mean PendingIrp is never really cleaned of completed IRPs? If we clean it, the source of this BSOD will be removed.
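The pattern under discussion (unlink the timed-out entries while the lock is held, then complete them after it is released) can be sketched in portable C. This is only an illustration of the pattern, not the code in sys/timeout.c: `request`, `pending_queue`, and every function here are hypothetical, and the "lock" is a flag that merely records whether it is held so that the rule "never complete while holding the lock" can be checked with an assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending request with a deadline. */
typedef struct request {
    struct request *next;
    unsigned long deadline;
    int completed;
} request;

/* Hypothetical queue; `held` only records whether the lock is taken. */
typedef struct {
    int held;
    request *pending;   /* shared list: touch only while held != 0 */
} pending_queue;

void queue_lock(pending_queue *q)   { assert(!q->held); q->held = 1; }
void queue_unlock(pending_queue *q) { assert(q->held);  q->held = 0; }

/* Completion may call back into other code, so the lock must be released. */
void complete_request(const pending_queue *q, request *r) {
    assert(!q->held);
    r->completed = 1;
}

/* Unlink every request whose deadline has passed, then complete them
   once the lock has been dropped. Returns the number completed. */
int release_timed_out(pending_queue *q, unsigned long now) {
    request *expired = NULL;   /* private list, visible to this call only */
    request **link;
    int count = 0;

    queue_lock(q);
    link = &q->pending;
    while (*link != NULL) {
        request *r = *link;
        if (r->deadline <= now) {
            *link = r->next;      /* unlink from the shared list */
            r->next = expired;    /* push onto the private list */
            expired = r;
        } else {
            link = &r->next;
        }
    }
    queue_unlock(q);

    while (expired != NULL) {
        request *r = expired;
        expired = r->next;
        r->next = NULL;
        complete_request(q, r);
        count++;
    }
    return count;
}
```

The private list is safe from other threads only for as long as nothing else frees or reuses the objects it points to. Holding the lock longer does not help with that, and the quoted documentation rules it out anyway.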
No, I cannot reliably reproduce the crash. What I can say is that the BSOD appears several seconds after I terminate the thread that my .NET application gets called on via Dokan.NET.
Usually the method being called is one of CreateFile(), GetFileInformation() or GetVolumeInformation() in DokanNet.Dokan.
Prior to the BSOD, Windows Explorer starts to lag when I point it to the filesystem root ("This PC"), probably due to a leftover "disconnected network drive" at my mount point. My application then repeatedly runs into the case DOKAN_MOUNT_ERROR: throw new DokanException(status, "Can't assign a drive letter or mount point" error when trying to unmount and re-mount the Dokan drive and may or may not succeed eventually.
As you may have concluded from my memory dump, my dev environment is running on Windows 8.1 Pro x64 in a physical box, which I'll probably move to Hyper-V unless the BSODs disappear.
And finally, I'm still on VS 2015 RC/.NET 4.6 RC, which could also be a factor, although I've never had a BSOD with this setup outside of my Dokan experiments.
@Liryna Just a few details for you, to make things clear so that we can investigate the issue together. Inside the method DokanCompleteIrp you have the following line https://github.com/dokan-dev/dokany/blob/master/sys/event.c#L319 so the entry is removed. As for your previous question, the completeList is local to the method, so protecting this list doesn't help.
@viciousviper Is there a way to get the memory.dmp from you?
@marinkobabic Do you need the full MEMORY.dmp (1.3Gb raw, 170Mb as a .7z) or will the associated minidump be sufficient (330kb raw, 140kb as a .7z)? I could easily mail you the minidump while the full dump would require uploading to a filehoster.
Full would be great if possible :-)
@marinkobabic Please let me know your email address in a mail to j_h at mail.org so I can provide you with a download location for the dump.
Did the analysis of the memory dump give any more information?
I think I also got this crash. Is there any quick change to Dokan that I can make to make this cause an error rather than a BSOD?
http://www.voltagex.org/081615-16812-01.dmp but I may need to move this file. No full dump captured by the look of it.
I caused this BSOD by running the project in http://voltagex.org/DokanTest.7z a couple of times.
@voltagex Thank you for the dump. Unfortunately, the crash happens in ntoskrnl.exe. WinDbg can tell you which software it comes from. https://github.com/dokan-dev/dokany/wiki/How-to-Debug-Dokan#crash-report-bsod If you manage to get the crash in dokan, I would be glad to look at the report.
I have tried to open DokanTest.7z but SevenZippedFile.cs is full of '\0'.
Looks like the crash corrupted some files.
I'm away from my computer today, but I'll get back to you tomorrow.
Download DokanTest.7z again, I've fixed that file. I don't think the current version will crash.
Try this: mount a drive as Z:, mount it again (fails), unmount Z: and mount again (crash)
But @voltagex ... you have implemented nothing :neutral_face:. Please provide a crash report of dokan when you have finished your implementation.
Even with only a few things implemented, Dokan shouldn't cause a BSOD, right?
@Liryna Are you able to reproduce the issue? I have started DokanTest at least 25 times without a crash.
@voltagex When you test DokanTest.exe without a debugger attached, are you able to reproduce the crash? When you are debugging, in which of the callback methods does it happen?
I'm sorry for causing trouble here. I can't cause the crash with my exe, but I can cause it by repeatedly unmounting and mounting the drive.
@marinkobabic I have made the same test as you. DokanTest more than 25 times with CTRL + C very fast or slowly. I got no crash and no zombie device. System was still stable after.
@voltagex If you can make a crash, please use WinDbg to see in which software it crashed.
Was the volume active while it was being dismounted?
The fact is that a lot of people have the BSOD while debugging in Visual Studio. What is the difference when the debugger is used, compared to running the user mode application without attached debugger:
If debugger is used we slow down the request and other requests are not processed fast enough which results in more timed out Irp requests. If the developer detaches the debugger then the process terminates. The timeout thread realizes that the user mode application is no longer there and starts to clean up everything. Imagine that the timeout thread is collecting the timed out Irp requests and those are now in a list to complete. At the same time the driver removes all the other pending Irp requests and deletes the device and symbolic link. The timeout thread completes the Irp requests in a loop for a device which no longer exists and here we have an invalid Irp which is completed.
To simulate the timeout we could delay the response from user mode methods randomized to cause some of the Irp requests to time out. Then at some point we should exit the application. After several tries the application should crash. It’s actually just a theory but we can try to reproduce the issue this way, if the theory is correct.
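The randomized delay proposed above could be driven by a small helper like the one below. It is a sketch under stated assumptions: `next_random` and `pick_delay_ms` are hypothetical names, the generator is a plain 32-bit LCG chosen so that a crashing run can be replayed from its seed, and the timeout passed in is whatever the test author believes the driver uses (no real Dokan value is assumed here). A user-mode callback would sleep for the returned number of milliseconds before answering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small deterministic generator (32-bit LCG) so a run can be replayed from its seed. */
uint32_t next_random(uint32_t *state) {
    assert(state != NULL);
    *state = *state * 1664525u + 1013904223u;
    return *state >> 8;  /* drop the weakest low bits */
}

/* Decide how long a callback should stall before answering.
   percent_slow is the share (0..100) of requests that should outlive timeout_ms. */
uint32_t pick_delay_ms(uint32_t *state, uint32_t timeout_ms, uint32_t percent_slow) {
    assert(timeout_ms > 0 && percent_slow <= 100);
    if (next_random(state) % 100 < percent_slow) {
        /* Slow path: strictly longer than the timeout, at most twice as long. */
        return timeout_ms + 1 + next_random(state) % timeout_ms;
    }
    /* Fast path: strictly below the timeout. */
    return next_random(state) % timeout_ms;
}
```

With a moderate `percent_slow`, most requests still complete normally while a few time out, which is the mix the theory needs before the process is killed.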
Reproduced without the debugger. (updated this note to add unmounting step)
PROCESS_NAME: cmd.exe
CURRENT_IRQL: 0
ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre
LAST_CONTROL_TRANSFER: from fffff801a0b19f5c to fffff801a09c59a0
STACK_TEXT:
ffffd001`730cd368 fffff801`a0b19f5c : 00000000`000000c2 00000000`00000007 00000000`00001200 00000000`04070081 : nt!KeBugCheckEx
ffffd001`730cd370 fffff801`a0a10fbb : ffffe001`85722990 ffffe001`8406b6f0 00000000`c0000120 ffffd001`00000007 : nt!ExDeferredFreePool+0x6ec
ffffd001`730cd460 fffff801`a0cb508f : 00000000`00000085 ffffd001`730cd7b0 00000000`c0000120 ffffe001`8406b6f0 : nt! ?? ::FNODOBFM::`string'+0x3b10b
ffffd001`730cd490 fffff801`a0c60c39 : ffffc000`4a8146b8 ffffc000`4a8146b8 ffffc000`4bc110f0 ffffe001`8503fc20 : nt!IopParseDevice+0xbbf
ffffd001`730cd6b0 fffff801`a0c5ea63 : 00000000`00000000 ffffd001`730cd8a8 ffffe001`00000040 ffffe001`840aef20 : nt!ObpLookupObjectName+0x6b9
ffffd001`730cd830 fffff801`a0cd77ab : ffffe001`00000001 ffffe001`853810a8 00000000`00000001 00000000`00000020 : nt!ObOpenObjectByName+0x1e3
ffffd001`730cd960 fffff801`a0cd73b8 : 000000a4`1f5ff508 000000a4`00100020 000000a4`1f5ff488 fffff801`00001000 : nt!IopCreateFile+0x36b
ffffd001`730cda00 fffff801`a09d11b3 : fffff6fb`7dbed7f8 fffff6fb`7daffed0 fffff6fb`5ffdaa98 fffff6bf`fb5539f0 : nt!NtOpenFile+0x58
ffffd001`730cda90 00007ffa`bd750f7a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
000000a4`1f5ff418 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffa`bd750f7a
@voltagex Perfect report! I have been able to reproduce the issue (Win 8.1) but not the crash.
I followed your steps, and after "Make your application unmount the drive":
M:\>dir
The parameter is incorrect.
M:\>c:
C:\Users\Liryna\Desktop\dokan>m:
The system cannot find the path specified.
BUT I still see the M: drive in Explorer.
Edit: After a restart of the service, the device is still alive and the mirror is still connected to the device. The issue happens when the mirror is killed: DokanRemoveMountPoint failed. This could be fixed if the service forced an unmount when it stops.
@marinkobabic I totally agree with you. It is worth trying.

> The timeout thread completes the Irp requests in a loop for a device which no longer exists and here we have an invalid Irp which is completed.

In which part of the code do you think this happens?
@marinkobabic Thank you. Otherwise, I have tried to force an unmount when the service is stopped by adding:
// Force unmount of every device known to the service
EnterCriticalSection(&g_CriticalSection);
PLIST_ENTRY listEntry;
PMOUNT_ENTRY mountEntry = NULL;
for (listEntry = g_MountList.Flink; listEntry != &g_MountList; listEntry = listEntry->Flink) {
    mountEntry = CONTAINING_RECORD(listEntry, MOUNT_ENTRY, ListEntry);
    DbgPrintW(mountEntry->MountControl.MountPoint);
    // Build an unmount request that identifies the mount by its device name
    ZeroMemory(&unmount, sizeof(DOKAN_CONTROL));
    unmount.Type = DOKAN_CONTROL_UNMOUNT;
    wcscpy_s(unmount.DeviceName, sizeof(unmount.DeviceName) / sizeof(WCHAR),
             mountEntry->MountControl.DeviceName);
    DokanControl(&unmount);
}
LeaveCriticalSection(&g_CriticalSection);
The device is unmounted correctly, but the mirror is not notified and keeps running. Do you have any idea why?
@Liryna It's just a question of time until we remove the service. In my opinion the service is not needed if the Mount Manager of Windows is used.
I would not create too many dependencies on the service. One option you have is to let the driver unmount the drive with the force flag. What you need to do is extend UNMOUNT_CONTEXT with Flags and set the force unmount flag here https://github.com/dokan-dev/dokany/blob/master/sys/timeout.c#L64. The mounter service will take the EVENT_CONTEXT and set the flag after this line. https://github.com/dokan-dev/dokany/blob/master/dokan_mount/mounter.c#L391
@marinkobabic I have tried what you suggested, but since the service lost all mount information during the restart, FindMountEntry cannot retrieve the MountPoint from the DeviceName, so the unmount fails, even with the force flag. https://github.com/dokan-dev/dokany/blob/master/dokan_mount/mounter.c#L74
I totally agree with you, the service seems to create more issues than it is worth. Have you already used the Mount Manager?
@Liryna What you have described is not possible. Please check the following lines https://github.com/dokan-dev/dokany/blob/bdc64b0cdce4f6a2b7b4a046c7eca73818d378ed/dokan_mount/mounter.c#L179-L181 If the entry is not found and the force flag is set, an unmount will be performed.
The Mount Manager requires a Plug & Play implementation. By the way, can you open pdf files on Windows 8.1 using the native pdf viewer and not Adobe Reader?
@marinkobabic In DokanControlUnmount(Control->MountPoint), MountPoint is empty :cry: only DeviceName is set by the sys driver.
(Sorry, my first description was misleading.)
Can we get the MountPoint at this point in the code? https://github.com/dokan-dev/dokany/blob/master/sys/timeout.c#L61
I just tested with the native pdf viewer: "There is a problem with the file format."
@Liryna You are right :disappointed_relieved: so this way an unmount is not possible. A device can have multiple mountpoints and the service has lost all information after the restart. In this case you can just loop over all existing drives and perform a QueryDosDevice until you find the target device and then you can set the MountPoint and perform the unmount.
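The lookup suggested above (try each drive letter until one resolves to the target device) might look like the sketch below. The names `query_device_fn`, `find_drive_for_device`, `fake_query`, and the `\Device\ExampleVolume...` paths are all hypothetical. The resolver is passed in as a callback so that the loop can be compiled and tested anywhere; on Windows that role would be played by a thin wrapper around QueryDosDevice, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Resolves a drive name such as "Z:" to its target device path.
   Returns nonzero on success. Hypothetical interface for this sketch. */
typedef int (*query_device_fn)(const char *drive, char *target, size_t target_len);

/* Try every drive letter and return the first one whose target equals
   device_name, or 0 when no drive maps to the device. */
char find_drive_for_device(const char *device_name, query_device_fn query) {
    char drive[3] = { 'A', ':', '\0' };
    char target[260];

    assert(device_name != NULL && query != NULL);
    for (char letter = 'A'; letter <= 'Z'; letter++) {
        drive[0] = letter;
        if (query(drive, target, sizeof target) && strcmp(target, device_name) == 0) {
            return letter;
        }
    }
    return 0;
}

/* Fake resolver used for illustration: only "M:" and "Z:" exist. */
int fake_query(const char *drive, char *target, size_t target_len) {
    const char *path = NULL;
    if (strcmp(drive, "M:") == 0) path = "\\Device\\ExampleVolumeA";
    if (strcmp(drive, "Z:") == 0) path = "\\Device\\ExampleVolumeB";
    if (path == NULL || strlen(path) >= target_len) return 0;
    strcpy(target, path);
    return 1;
}
```

Because a device can have several mount points, a real implementation would probably need to collect every match instead of stopping at the first one, and it may need a case-insensitive comparison. It also only covers drive letters, not directory mount points.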
The reason you can't open the pdf file is that since Windows 8 a lot of programs rely on OpLocks https://msdn.microsoft.com/en-us/library/windows/hardware/ff551007(v=vs.85).aspx . The implementation is no longer optional for File System Drivers. Keep this in mind when somebody is not able to play or open some formats. To check which requests are sent, you can use Process Monitor or FileSpy https://www.osronline.com/article.cfm?article=370
As you can see, there is a lot to do. The first thing that should be done is to change DriverEntry to follow the fastfat example and to catch all exceptions in one place.
@marinkobabic OK, I will add QueryDosDevice to FindMountEntry later.
What do you mean by catching all exceptions in one place?
I agree, we are discovering that dokan needs a lot of changes to achieve its goal. I have created a new issue with a TODO list ordered by priority. I will add every change we find interesting and keep it updated. Feel free to suggest any ideas. https://github.com/dokan-dev/dokany/issues/45
Hi guys! I've been investigating a BSOD issue in the dokanx fork, then I found out about the dokany fork, checked whether the issue is reproducible here, and it is, so I think you guys should know about it too. Here is the summary: when you mount your FS on some drive letter, restart the DokanMounter service and then try to kill that FS app, the BSOD occurs. Here is the full description (many letters): https://github.com/BenjaminKim/dokanx/issues/47 Hope to get any comments and/or suggestions. Thanks.