dokan-dev / dokany

User mode file system library for windows with FUSE Wrapper
http://dokan-dev.github.io
5.21k stars 661 forks source link

insufficient system resources exist to complete the requested service #1058

Closed onexzero closed 2 years ago

onexzero commented 2 years ago

Feature request can skip this form. Bug report must complete it. Check List must be 100% match or it will be automatically closed without further discussion. Please remove this line.

Environment

Check List

Description

"insufficient system resources exist to complete the requested service" appears when opening .txt file with notepad. I ran mirror.exe without /n switch as follow. mirror.exe /r C:\Users\raccoon\Desktop\temp /l r:

And that message appeared low probability(about 1/15) when I did open /close repeatedly. So i tested again with dokany 1.5.1 but It worked well even on more slow my own driver with /n switch. No such message poped up. I tested dokany latest version various way. and found that it's more faster than 1.5.x. So I planed to migrate to latest version but this problem occured again. And the ms-office crashing problem a few days ago I issued is solved. Root cause is that dokanOperations.FindStreams returns STATUS_SUCCESS when did not implemented, this makes powerpoint-crash on save or save-as. I think it should be fixed. And now my driver works well on dokany 2.0.1.2000 except this problem.

Logs

Please attach in separate files: mirror output, library logs and kernel logs. In case of BSOD, please attach minidump or dump analyze output.

Liryna commented 2 years ago

@onexzero can you try with the new release https://github.com/dokan-dev/dokany/releases/tag/v2.0.2.1000 ?

onexzero commented 2 years ago

Sure.

onexzero commented 2 years ago

@Liryna

The newly released version of the test ended without any errors. However, the #1063 performance degradation problem actually appeared. When opening a 10MB ppt file, v2.x is about 2 seconds slower on average than v1.5 (actually, this is the result obtained by testing more than 50 times). There is also a big difference in file copy speed. The attached video is the result of copying the size of about 100MB from my own driver. I repeated the tests several times and got similar results.

https://user-images.githubusercontent.com/31300523/152760585-731ccf3a-02f6-4d63-be86-efa74d04475d.mp4

https://user-images.githubusercontent.com/31300523/152760608-78b8e9c6-a5ce-49af-bf80-e917cc780897.mp4

Liryna commented 2 years ago

@onexzero Are you able to do the same test with you VM having 4 core ? Please use the mirror sample like that we use the same base to compare.

Liryna commented 2 years ago

I am using a VM with two cores. Copy a 1 gb file using V1 mirror gives me 500mb/s but V2 is about 600mb/s

onexzero commented 2 years ago

I'll try it tomorrow.

onexzero commented 2 years ago

4-Core test results are here. I copied vm and reinstalled dokan on each vm. As you can see, there is actual performance drop.

https://user-images.githubusercontent.com/31300523/152902475-43c2cac0-fd93-4e70-b5c2-2573096680ca.mp4

https://user-images.githubusercontent.com/31300523/152902501-5111ade5-6b0c-4688-9fc3-72a72881d7eb.mp4

Liryna commented 2 years ago

I guess the first video is dokan v1 and the second v2 ? Is it always the first copy after mount that is faster on v2 ? Would you be able to debug and find out what is the difference ? Like how are the threads and what are they doing ?

onexzero commented 2 years ago

Dokan1.mp4 is dokan v1. Dokan2.mp4 is dokan v2. Dokan1 is fast in almost all cases.

I'll try to find diff. What i have felt from the test so far is that Dokan1 is more responsive.

Liryna commented 2 years ago

@onexzero Try to play with the value of https://github.com/dokan-dev/dokany/commit/fb8aee08dd56053dfb39e5d219945ce31db423ae Normally I would say 2 might be the best value for a vm of 4 cores

onexzero commented 2 years ago

The performance seems to have improved a little, but it is much slower than Dokan1. As a result of testing with my own driver, there is not much difference.

https://user-images.githubusercontent.com/31300523/152917949-ef074a7c-1635-4489-8cb0-75ad46bc4d4a.mp4

onexzero commented 2 years ago

I tested it several times over and over again. The test conducted with Dokan samples has a performance difference of about 5-20%, but when tested with my driver, the speed difference is more than 10 times.

onexzero commented 2 years ago

I monitored the file copying process in my driver with Procmon. If you look at ReadFile's Duration, you can see that the results are very different in Dokan1 and Dokan2. Except for the Dokan version, all environments are in the same state. ProcmonLog.zip

Liryna commented 2 years ago

@onexzero could you please redo your test with this change https://github.com/dokan-dev/dokany/commit/231fc23f5e0581aa5b401c838bebf72bef868e9d ? (it is not in the branch master yet)

onexzero commented 2 years ago

No problem.

onexzero commented 2 years ago

It may be a feeling, but it seems to have improved performance a little.

https://user-images.githubusercontent.com/31300523/153103612-d039fa7c-3089-4ce2-a7d3-ce6bb948dfe8.mp4

https://user-images.githubusercontent.com/31300523/153103646-6b8aaff1-f10b-464a-86ec-2e9aecbd8a23.mp4

Liryna commented 2 years ago

@onexzero are you able to start the mirror in debug from Visual Studio and find out what is taking time ?

onexzero commented 2 years ago

Refer to PorcmonLog above. It's my driver log, but it's not much different from the Dokan sample. According to my analysis, each API takes longer to perform than Dokan1. Rather than taking a lot of time in a particular part, each API execution time seems to be slightly longer, so there seems to be a difference. It's hard right now, but if I have time, I'll try profiling.

onexzero commented 2 years ago

In real world, opening and saving speed are more important than copying speed. The mirror sample is a good guide, but it is not realistic. Because what people are mainly trying to do with Dokan is to map the file systems of Web, DB, and Cloud, so I think it is more important to reduce the latency of each API.

onexzero commented 2 years ago

Profiling report collected on real machine. Report20220209-1643.zip

onexzero commented 2 years ago

Profiling result on VM vmtest.zip cpu.zip

Liryna commented 2 years ago

@onexzero I pushed new changes that overall improves the read and write scenario.

onexzero commented 2 years ago

Ok. I'll test it.

Liryna commented 2 years ago

@onexzero There is no version. All changes are in the library. You can just pull the new changes and build if you want to test them.

onexzero commented 2 years ago

Ok. I'll test it.

Liryna commented 2 years ago

See https://github.com/dokan-dev/dokany/releases/tag/v2.0.3.1000

onexzero commented 2 years ago

Excellent work! Performance has improved remarkably. I've tested several things and haven't found any issues yet.

Liryna commented 2 years ago

Thanks for the feedback @onexzero 👍 Glad we got it solved!

onexzero commented 2 years ago

@Liryna hi~ I tested the latest version (2.0.4.1) module today, but it seems that this issue has not been completely resolved. I've been repeatedly opening mp4 video files on my drive, and found that in some cases the video takes significantly longer than usual to play. Additionally, i tested other types of files and found that there were occasional, time-consuming cases. A file that normally opened in 2-3 seconds took more than 20-30 seconds. When I checked the process monitor, "Insuffient resource..." is also occurring. I have attached the relevant log and PNG file. If you look at the PNG file, you will find some strange parts. This issue doesn't seem to be caused by the latest version of the module. I've tried with the previous version, but the same thing happens.

Strange.zip

Liryna commented 2 years ago

The driver logs looks fine. The Create request around before the gap between 26 and 38 are completed at 26 and we see the library logs. The Create failing with INSUFFICIENT_RESOURCES is also happening quickly.

Can this be repro with samples ? on any computers ?

onexzero commented 2 years ago

It's very difficult to reproduce with a sample, but I'll give it a try. However, this problem appears to be the same as the one discussed above. https://github.com/dokan-dev/dokany/issues/1058#issuecomment-1024942215

Liryna commented 2 years ago

Even if the result looks the same, I am not able to see the same trace/events in the driver logs.

onexzero commented 2 years ago

All operations before and after INSUFFICIENT_RESOURCES were taking a long time to complete. CloseFile also took 18 seconds. What was the cause? I checked the server side logs, but couldn't find the part that took that long. It looks like there is some kind of problem with LOCK.

Liryna commented 2 years ago

From the logs you provided, the close happens in the library at the right time. When you say lock issue it is a guess or you located it?

onexzero commented 2 years ago

C:\Users\raccoon\Desktop\mirror>mirror /r C:\Users\raccoon\Desktop\Temp /l r /e /d /n \localhost\dokan

Contrary to what I thought, it was reproduced well with the Dokan sample. If you look around 11:37:42, CloseFile takes a strangely long time and you can see that the error occurs as in my driver.

dokansample.LOG dokansample.zip

onexzero commented 2 years ago

Regarding LOCK, it's just speculation. It's just a feeling, so don't worry too much.

onexzero commented 2 years ago

Same result on local drive emulation mode. mirror /r C:\Users\raccoon\Desktop\Temp /l r /e /d

localdrive.zip

onexzero commented 2 years ago

I think I've found a pattern for this problem. It was strange that SendAndPullEventInformation() appeared in two lines in the log whenever INSUFFICIENT_RESOURCE occurred during the test. Please see the attached PNG. It's a guess, but there seems to be some racing problem. pattern.zip

onexzero commented 2 years ago

In programs that retry when an error occurs, such as Windows movie player, responsiveness can be improved by adjusting the timeout value of DokanOption and the value of dokan_check_interval, but it is still a problem in programs that immediately popup errors such as Notepad.exe. It seems that we need to find the root cause of the INSUFFICIENT_RESOURCE problem. I don't know why I get INSUFFICIENT_RESOURCE so often on 4-core machines.

onexzero commented 2 years ago

Finally found the root cause. The pull thread count is set too low. When I edit the source in 4-Core VM and take a log, it comes out as 2. I don't think this was originally intended. I forced it to 8 and the problem went away.

Liryna commented 2 years ago

Normally we should detect your 4 Core and set 4 threads https://github.com/dokan-dev/dokany/blob/master/dokan/dokan.c#L778-L783 2 threads would mean whether you have 1 Core or 2 Core detected and we enforce DOKAN_MAIN_PULL_THREAD_COUNT_MIN

onexzero commented 2 years ago

Matching the number of threads to the number of cores may seem ideal, but according to actual test results, it is not enough. That is, the error occurs intermittently even when set to 4-thread. I think several factors such as core speed are related to this problem. How about making it configurable as in the previous version?

Liryna commented 2 years ago

@onexzero I think the issue might be different. Could you reset all your local changes and give a try to https://github.com/dokan-dev/dokany/commit/a31981dae12f22a39ea13b39bb02eb5d79e5f0b3 ? The create request being canceled should not wait for that interval that you also manually changed. FYI I wasn't able to find the issue in https://github.com/dokan-dev/dokany/issues/1058#issuecomment-1132498906 logs, please continue to test on network drive for now and lets see after the local type

onexzero commented 2 years ago

This is the result of applying the patch and testing it. I don't know much of a difference. As you may know, the reason I changed DOKAN_CHECK_TIME when the root cause was unknown was to reduce recovery time when an error occurred. result

onexzero commented 2 years ago

"The create request being canceled should not wait for that interval that you also manually changed. "

What does this sentence mean? I know that if the pending IRP is not processed within the timeout time, it is canceled and returned as an INSUFFICIENT error, did I misunderstand? I analyzed the log for the case of emulation with a local drive, but this case is different. In this case, it seems that the INSUFFICIENT_BUFFER error caused the delay. From the test results, even in this case, as you said, it looks like the process is waiting instead of returning immediately. Are you talking about this case?

I tested the patch in local drive mode using the Dokan sample. There was no lag when running the player. However, the process monitor has the following logs: cancelled

Liryna commented 2 years ago

Could you provide the exact app, windows version and steps so I can reproduce the issue on my side ?

It would be interesting to know whether it is canceled by https://github.com/dokan-dev/dokany/blob/6bd609be33ae58196e56641095a58a03dd88f682/sys/event.c#L32 or https://github.com/dokan-dev/dokany/blob/e9f4bef38cae47e5a788044a4aa88b0703e5ec54/sys/create.c#L329 because is does not looks like it is a timeout if you can make the create return faster by changing the DOKAN_CHECK_INTERVAL Unfortunately there is no logs in those functions 😢 (Also if you add logs there, they might not be forward to userland, it would need to have the "real" debugger attach to see them)

Do you see this log in your system EventLog https://github.com/dokan-dev/dokany/blob/2352816c996537e90a5786b84219b708f6cfeb02/sys/timeout.c#L255 ?

onexzero commented 2 years ago

Windows 10 pro 32bit 19044 (run on 4-Core VM)

  1. mirror /r C:\Users\raccoon\Desktop\Temp /l r
  2. Double-click the mp4 file in r: -> Movies & TV app launch That's all.

"Wake from sleep detected." was not found in the event log.

The attempt to improve response time by reducing DOKAN_CHECK_TIME and Timeout was based on the following log. Results are already at https://github.com/dokan-dev/dokany/issues/1058#issuecomment-1134061789

00048074    533.45446777    [1656] DriverLog: [11:45:13.848][DokanDispatchRequest][FFFF96831B6BC8A0][VCB][IRP_MJ_CREATE][]: Begin ProcessId=6236    
00048075    533.45544434    [1656] DriverLog: [11:45:13.848][DokanDispatchCreate][FFFF96831B6BC8A0]: FileObject=FFFF968319F24590 RelatedFileObject=0000000000000000 FileName="\" Flags=0 DesiredAccess=80 Options=1200000 FileAttributes=0 ShareAccess=7    
00048076    533.45642090    [1656] DriverLog: [11:45:13.848][DokanCompareFcb]: First: FFFF96831BEC9D50 \ Second: FFFF96831A80A010 \drive_fs_notification - Result: -21  
00048077    533.45782471    [1656] DriverLog: [11:45:13.848][DokanGetFCB][FFFF96831B6BC8A0]: New FCB FFFF96831BEC9D50 allocated for \   
00048078    533.45886230    [1656] DriverLog: [11:45:13.848][DokanDispatchCreate][FFFF96831B6BC8A0]: Use FCB=FFFF96831BEC9D50   
00048079    533.45977783    [1656] DriverLog: [11:45:13.848][DokanAllocateCCB][FFFF96831B6BC8A0]: Allocated CCB=FFFF96831DF03E00    
00048080    533.46081543    [1656] DriverLog: [11:45:13.848][DokanRegisterPendingIrp][FFFF96831B6BC8A0]: Register the IRP pending   
00048081    533.46173096    [1656] ###Create file handle = 0x00000238846518B0, eventID = 0452, event Info = 0x00000238845EFF40  
00048082    533.46276855    [1656] DriverLog: [11:45:13.848][DokanRegisterPendingIrp][FFFF96831B6BC8A0]: Pending Registration: STATUS_PENDING   
00048083    533.46386719    [1656] CreateFile : C:\Users\raccoon\Desktop\Temp\  

> 00048084  533.46484375    [1656] DriverLog: [11:45:13.848][DokanDispatchRequest][FFFF96831B6BC8A0]: End - Irp is marked pending   

00048085    533.46588135    [1656] DokanError: Ioctl 0x92030 failed with code 87 on Device \\.\DokanRedirector2{d6cc17c5-1734-4085-bce7-964f1e9f5de9}   

> 00048086  533.46691895    [1656] DriverLog: [11:45:18.754][ReleaseTimeoutPendingIrp]: Start   
> 00048087  533.84246826    [1656] DriverLog: [11:45:18.754][ReleaseTimeoutPendingIrp]: End

00048088    533.84429932    [1656]   DokanOpenRequestorToken failed 

> 00048089  533.84539795    [1656] DriverLog: [11:45:23.739][ReleaseTimeoutPendingIrp]: Start   
> 00048090  533.84637451    [1656]  ShareMode = 0x7 
> 00048091  533.84747314    [1656] DriverLog: [11:45:23.739][ReleaseTimeoutPendingIrp]: End

00048092    533.84844971    [1656]  FILE_SHARE_READ 
00048093    533.84942627    [1656] DriverLog: [11:45:23.848][DokanDispatchRequest][FFFF96831A2F02C0][DGL][IRP_MJ_CREATE][]: Begin ProcessId=4724    
00048094    533.85052490    [1656]  FILE_SHARE_WRITE    
00048095    533.85162354    [1656] DriverLog: [11:45:23.848][DokanDispatchCreate][FFFF96831A2F02C0]: FileObject=FFFF96831D651DD0 RelatedFileObject=0000000000000000 FileName="(null)" Flags=0 DesiredAccess=100080 Options=1000060 FileAttributes=0 ShareAccess=3   
00048096    533.85260010    [1656]  FILE_SHARE_DELETE   
00048097    533.85400391    [1656] DriverLog: [11:45:23.848][DokanDispatchRequest][FFFF96831A2F02C0]: End - STATUS_SUCCESS Information=0    
00048098    533.85498047    [1656]  DesiredAccess = 0x80    
00048099    533.85595703    [1656] DriverLog: [11:45:23.848][DokanDispatchRequest][FFFF96831A2F02C0][DGL][IRP_MJ_FILE_SYSTEM_CONTROL][IRP_MN_USER_FS_REQUEST]: Begin ProcessId=4724 
00048100    533.85693359    [1656]  FILE_READ_ATTRIBUTES    
00048101    533.85797119    [1656] DriverLog: [11:45:23.848][DokanUserFsRequest][FFFF96831A2F02C0][FSCTL_EVENT_MOUNTPOINT_LIST]: FileObject=FFFF96831D651DD0    
00048102    533.85913086    [1656]  FlagsAndAttributes = 0x200000   
00048103    533.86016846    [1656] DriverLog: [11:45:23.848][DokanDispatchRequest][FFFF96831A2F02C0]: End - STATUS_SUCCESS Information=314  
00048104    533.86114502    [1656]  FILE_FLAG_OPEN_REPARSE_POINT    
00048105    533.86218262    [1656] DriverLog: [11:45:23.848][DokanDispatchRequest][FFFF96831A2F02C0][DGL][IRP_MJ_CLEANUP][]: Begin ProcessId=4724   
00048106    533.86315918    [1656]  OPEN_EXISTING   
00048107    533.86413574    [1656] DriverLog: [11:45:23.848][DokanDispatchCleanup][FFFF96831A2F02C0]: FileObject=FFFF96831D651DD0   
00048108    533.86511230    [1656]  
00048109    533.86614990    [1656] DriverLog: [11:45:23.848][DokanDispatchRequest][FFFF96831A2F02C0]: End - STATUS_SUCCESS Information=0    
00048110    533.86712646    [1656] Dokan Information: DokanEndDispatchCreate() status = 0, file handle = 0x00000238846518B0, eventID = 0452, result = 0x1   
00048111    533.86810303    [1656] DriverLog: [11:45:23.848][DokanDispatchRequest][FFFF96831A2F02C0][DGL][IRP_MJ_CLOSE][]: Begin ProcessId=4724 
00048112    533.86914063    [1656] Dokan Information: SendAndPullEventInformation() with NTSTATUS 0x0, context 0x846518b0, and result object 0x000002388464D490 with size 48    
00048113    533.87011719    [1656] DriverLog: [11:45:23.848][DokanDispatchClose][FFFF96831A2F02C0]: FileObject=FFFF96831D651DD0 
00048114    533.87109375    [1656] DriverLog: [11:45:23.848][DokanDispatchRequest][FFFF96831A2F02C0]: End - STATUS_SUCCESS Information=0    

> 00048115  533.87213135    [1656] DriverLog: [11:45:28.739][ReleaseTimeoutPendingIrp]: Start   
> 00048116  533.87310791    [1656] DriverLog: [11:45:28.739][ReleaseTimeoutPendingIrp]: End 

> 00048117  533.87438965    [1656] DriverLog: [11:45:33.741][ReleaseTimeoutPendingIrp]: Start   
> 00048118  533.87542725    [1656] DriverLog: [11:45:33.741][DokanCompleteCreate][FFFF96831B6BC8A0]: FileObject=FFFF968319F24590 CreateInformation=FILE_SUPERSEDED  
> 00048119  533.87634277    [1656] DriverLog: [11:45:33.741][DokanFreeCCB][FFFF96831B6BC8A0]: Free CCB=FFFF96831DF03E00 
> 00048120  533.87738037    [1656] DriverLog: [11:45:33.741][DokanCancelCreateIrp][FFFF96831B6BC8A0]: End - STATUS_INSUFFICIENT_RESOURCES Information=0 
> 00048121  533.87835693    [1656] DriverLog: [11:45:33.741][ReleaseTimeoutPendingIrp]: End