dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

support file block copy on write on Windows #86681

Open danmoseley opened 1 year ago

danmoseley commented 1 year ago

On ReFS, Windows supports a block copy operation, like copy on write but at a sub-file level.

https://learn.microsoft.com/en-us/windows/win32/fileio/block-cloning https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ni-winioctl-fsctl_duplicate_extents_to_file

This has existed for several years, but is suddenly more interesting now because of the announcement of Dev Drive (https://learn.microsoft.com/en-us/windows/dev-drive/).

We should enlighten File.Copy to use this on Windows when possible, so that MSBuild, NuGet, etc. work extra fast on Dev Drive.
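
To make the idea of block cloning concrete, here is a toy in-memory model, not the ReFS implementation and not the Win32 API. It assumes a volume where files are lists of reference-counted clusters. The `Volume` class and all of its methods are hypothetical names invented for this illustration. Cloning a range only updates metadata, and a later write to a shared cluster allocates a private copy.

```python
class Volume:
    """Toy volume: each file is a list of cluster ids; clusters are refcounted."""

    def __init__(self):
        self.clusters = {}  # cluster id -> bytes stored in that cluster
        self.refs = {}      # cluster id -> number of file slots pointing at it
        self.files = {}     # file name -> ordered list of cluster ids
        self._next_id = 0

    def _alloc(self, data):
        cid = self._next_id
        self._next_id += 1
        self.clusters[cid] = data
        self.refs[cid] = 1
        return cid

    def _release(self, cid):
        self.refs[cid] -= 1
        if self.refs[cid] == 0:
            del self.refs[cid], self.clusters[cid]

    def create(self, name, chunks):
        self.files[name] = [self._alloc(c) for c in chunks]

    def clone_range(self, src, src_start, dst, dst_start, count):
        """Point `count` slots of dst at src's clusters; no data is copied."""
        for i in range(count):
            cid = self.files[src][src_start + i]
            old = self.files[dst][dst_start + i]
            self.refs[cid] += 1
            self.files[dst][dst_start + i] = cid
            self._release(old)

    def write(self, name, index, data):
        cid = self.files[name][index]
        if self.refs[cid] > 1:
            # Cluster is shared: give this file a private copy (copy on write).
            self.refs[cid] -= 1
            self.files[name][index] = self._alloc(data)
        else:
            self.clusters[cid] = data

    def read(self, name):
        return b"".join(self.clusters[c] for c in self.files[name])


vol = Volume()
vol.create("a.dll", [b"AAAA", b"BBBB", b"CCCC"])
vol.create("b.dll", [b"....", b"....", b"...."])
vol.clone_range("a.dll", 1, "b.dll", 1, 2)  # share the last two clusters
shared_before_write = len(vol.clusters)
vol.write("b.dll", 1, b"XXXX")              # triggers the private copy
```

In this model the clone touches only the cluster lists, which is why a clone can be much cheaper than reading and rewriting the data, and the source file is unaffected by later writes to the clone.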

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.

danmoseley commented 1 year ago

@hamarb123 if you comment on this issue, I'll be able to assign it to you.

hamarb123 commented 1 year ago

Comment :)

danmoseley commented 1 year ago

cc @rainersigwald this should speed up MSBuild if it isn't already doing hardlinks. cc @aortiz-msft this should help NuGet, but only where it's doing File.Copy or equivalent -- it looks like you do FileStream copy in some places, which wouldn't benefit. We might want to take a look at that at some point to see whether it's necessary.

rainersigwald commented 1 year ago

MSBuild has an option to do hardlinks (the Copy task's UseHardlinksIfPossible parameter, exposed in common targets as $(CreateHardLinksForCopyFilesToOutputDirectoryIfPossible)). But you have to be very careful with them: modifying (what looks like) a file in your output can silently corrupt your NuGet cache or other files, so they're not widely used.
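
The hazard described above can be reproduced with a small sketch. It assumes a filesystem that supports hard links, and the file names are hypothetical stand-ins for a NuGet cache entry and a build output. Because a hard link is a second name for the same file, an in-place edit through the "output" name changes what the "cache" name reads.

```python
import os
import tempfile


def demo_hardlink_hazard():
    """Return the cache file's content after editing it via a hard link."""
    with tempfile.TemporaryDirectory() as root:
        cached = os.path.join(root, "cache_lib.dll")   # stand-in for a cached package file
        output = os.path.join(root, "output_lib.dll")  # stand-in for a build output file

        with open(cached, "wb") as f:
            f.write(b"original")

        # "Copy" by hard link: two directory entries, one underlying file.
        os.link(cached, output)

        # Modify what looks like the output file, in place.
        with open(output, "r+b") as f:
            f.write(b"PATCHED!")

        # The cached name now sees the modified bytes.
        with open(cached, "rb") as f:
            return f.read()


cache_content = demo_hardlink_hazard()
```

A copy-on-write clone avoids this because the two files only share storage until one of them is written, at which point the writer gets its own blocks.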

danmoseley commented 1 year ago

sharing today's post for context -- https://devblogs.microsoft.com/engineering-at-microsoft/dev-drive-and-copy-on-write-for-developer-performance/

jkoritzinsky commented 1 year ago

There's a Copy On Write MSBuild SDK that utilizes the ReFS CloneFile API for a custom implementation of the Copy MSBuild task. Might be useful.

danmoseley commented 1 year ago

Yep, I think @hamarb123 is going to crib from that (https://github.com/microsoft/CopyOnWrite)

hamarb123 commented 1 year ago

I have been delayed in making a PR for this due to BSODs caused by my code/Windows, which were hard to track down. I've now found the root cause, and documented it at: https://github.com/microsoft/CopyOnWrite/issues/24. Hopefully I can make a PR with a workaround soon.

danmoseley commented 1 year ago

@hamarb123 how's it looking - I see they're fixing the BSOD (nice find)

hamarb123 commented 1 year ago

@danmoseley I've been working on some other projects this last week. I plan to have a go at getting a working PR ready tomorrow. Assuming there are no other issues like this one, it should be doable. Thanks

danmoseley commented 1 year ago

@hamarb123 Sounds good - I actually didn't intend to hurry you, I was just curious.

danmoseley commented 1 year ago

@hamarb123 actually I'm now starting to have my eye on platform shutdown date for .NET 8. I believe this repo is about to stop feature work, but historically we have allowed community contributions into main at all times, occasionally holding risky commits until after branching. Do you expect to have a PR soon? If not, that's totally fine, but I may ask around to see whether there's someone who can look nearer term.

hamarb123 commented 1 year ago

@danmoseley sorry, it slipped my mind. I have mostly functional code, will make a PR after confirming the code still currently works locally.

huoyaoyuan commented 1 year ago

ReFS filesystem Block Cloning Support is now available in the Windows copy engine. (https://blogs.windows.com/windows-insider/2023/10/25/announcing-windows-11-insider-preview-build-25982-canary-channel/)

What does this mean for the copy action in Explorer (likely not), or for the CopyFile syscall? Do we still need explicit support in .NET?

hamarb123 commented 1 year ago

@erikmav do you know? ^

huoyaoyuan commented 1 year ago

so this feature adds native support to copy actions and APIs on Windows.

It sounds like a new syscall is available?

erikmav commented 1 year ago

Not a new syscall. The Windows team liked the CoW approach and numbers from the .NET approach (https://aka.ms/EngMSDevDrive), and I've been working with them to turn it on by default in the CopyFile(Ex) API calls. Unless something goes wrong between now and release of this Canary feature to cause the code to change, on Dev Drive (and ReFS) copying a file will just clone instead.

I'm partway through perf testing the canary bits on internal repos with and without the .NET package turned on. Keep an eye on the Eng@MS blog for more info as soon as next week.

erikmav commented 1 year ago

Assuming the feature makes it to the February release without semantic changes, it would resolve this issue. Better to wait to resolve this until it's known that clone is on by default in the final release.

erikmav commented 1 year ago

FYI latest in blog series (encapsulates what's already here): https://devblogs.microsoft.com/engineering-at-microsoft/copy-on-write-in-win32-api-early-access/

huoyaoyuan commented 1 month ago

I updated my machine to 24H2 recently, and the plain copy command seems to be CoW cloning on a regular ReFS drive (not Dev Drive). Doing it in the syscall itself would probably have better performance.

Do we still need this functionality? It will only benefit a limited set of configurations, and is not required for "the future".

hamarb123 commented 1 month ago

No, I don't think so - I assumed the conclusion was that we will just benefit from OS support when available, and not worry about the other cases, since they're rare anyway.

If it's still wanted, I could have a go at implementing it some time, but I had just assumed it wasn't needed anymore.