dotnet / runtime


[API Proposal]: Extend Memory Mapped Files support #59776

Open msedi opened 3 years ago

msedi commented 3 years ago

Background and motivation

Currently, the MemoryMappedFile API is great but doesn't expose some features that are available in the WinAPI. I understand that other platforms are supported as well, so it would be nice to find a common way to extend the API.

The suggestions are:

Enable FileOptions: Currently, MemoryMappedFile.CreateFromFile does not accept a FileOptions argument. Some of the FileOptions values may be useless here, but others are not (e.g. FileOptions.DeleteOnClose). The existing MemoryMappedFileOptions enum could be extended so that the options are abstracted rather than using FileOptions directly.
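For reference, a minimal sketch of the workaround that is possible today: open the backing file yourself with FileOptions.DeleteOnClose and pass the stream to the existing CreateFromFile(FileStream, ...) overload. The class and method names (`DeleteOnCloseMapping.Create`) are made up for illustration; only the FileStream and MemoryMappedFile calls are existing APIs.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class DeleteOnCloseMapping
{
    // Hypothetical helper: maps a new backing file that is removed
    // once the underlying stream is closed.
    public static MemoryMappedFile Create(string path, long capacity)
    {
        var stream = new FileStream(
            path,
            FileMode.CreateNew,
            FileAccess.ReadWrite,
            FileShare.None,
            4096,                       // buffer size
            FileOptions.DeleteOnClose);

        try
        {
            // leaveOpen = false: the mapping owns the stream, so disposing
            // the mapping closes the stream and the file is deleted.
            return MemoryMappedFile.CreateFromFile(
                stream,
                null,                   // no map name
                capacity,
                MemoryMappedFileAccess.ReadWrite,
                HandleInheritability.None,
                false);
        }
        catch
        {
            stream.Dispose();
            throw;
        }
    }
}
```

This covers the deletion part only. The flush on disposal described further down still happens, which is why an option on the mapping itself would be useful.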

Large Page Support: It is not possible to use Large_Pages. From the WinAPI documentation it looks like large pages are only possible for mappings backed by the system paging file and not for file-backed mappings, but I don't have much experience with that.

More control over memory: It's currently not possible to invalidate memory (DiscardVirtualMemory) so that the memory manager can ignore these areas and not write them back to the paging file. Additionally, it is currently not possible to mark pages as "not in use" (VirtualUnlock) so that the memory manager is able to page them out earlier. It is also not possible to prefetch pages (PrefetchVirtualMemory).

Better backing file control: It seems that some options allow for better performance, although I do not have good benchmarks. SetFileValidData and sparse file support are the keywords. In my current tests, using sparse files reduced the run time from 150s to 100s.

Flush areas: FlushViewOfFile allows more control over which areas are flushed, and this should be available in MemoryMappedViewAccessor.Flush.

Do not flush the view on disposal: Currently the file stream is flushed when disposing the memory-mapped structures, which causes an enormous performance drop when the file is opened with FileOptions.DeleteOnClose. Since the file is deleted on disposal, it doesn't really make sense to flush it to the backing file.

Control over the working set: While working with memory-mapped files I allocated a tremendous amount of memory and very quickly reached the point where memory was exhausted and the memory manager started to page data out to the pagefile. This behavior is OK and depends on the OS (I was told that Linux handles memory-mapped files better?), but it started way too late, so I built my own "unmanaged GC" that watches over the memory-mapped files in a thread. The problem, though, was that even after I unlocked the memory region (VirtualUnlock) the working set was still high, so I needed to enforce a flush of the working set. All of the Virtual* methods are non-blocking, though, and I haven't found a good way to wait until the working set reaches a better condition. Calling EmptyWorkingSet is maybe not the best solution here (and it's also non-blocking). Other approaches would be welcome. I also assume that putting EmptyWorkingSet in the API proposal would cause unwanted effects, but I have put it here for discussion.
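To make the VirtualUnlock approach concrete, here is a rough sketch of how it can be called on a view today via P/Invoke. It relies on the documented Windows behavior that calling VirtualUnlock on pages that are not locked fails with ERROR_NOT_LOCKED but still removes them from the working set. `WindowsWorkingSet.TryTrim` is a hypothetical name, the sketch does nothing on other platforms, and it does not wait for the working set to shrink.

```csharp
using System;
using System.ComponentModel;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

public static class WindowsWorkingSet
{
    private const int ERROR_NOT_LOCKED = 158;

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool VirtualUnlock(IntPtr lpAddress, UIntPtr dwSize);

    // Hypothetical helper: asks Windows to drop the view's pages from the
    // working set. Returns false on non-Windows platforms (no-op).
    public static bool TryTrim(MemoryMappedViewAccessor view)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            return false;

        var handle = view.SafeMemoryMappedViewHandle;
        bool addedRef = false;
        try
        {
            // Keep the view alive while its raw address is in use.
            handle.DangerousAddRef(ref addedRef);
            IntPtr address = handle.DangerousGetHandle();

            if (VirtualUnlock(address, (UIntPtr)handle.ByteLength))
                return true;

            int error = Marshal.GetLastWin32Error();
            // Expected for pages that were never locked: the call "fails",
            // but the pages have been removed from the working set.
            if (error == ERROR_NOT_LOCKED)
                return true;

            throw new Win32Exception(error);
        }
        finally
        {
            if (addedRef) handle.DangerousRelease();
        }
    }
}
```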

It would be good if the API methods were abstracted somehow so that the underlying native API is not exposed. Discussion of this proposal is of course welcome.

Also, I'm not an expert in memory-mapped files.

API Proposal

namespace System.IO.MemoryMappedFiles
{
    // With this approach nothing has to be changed on the existing factory routines (to be discussed).
    [Flags]
    public enum MemoryMappedFileOptions
    {
        None = 0,
        // Proposal to create a MemoryMappedFile with the DeleteOnClose flag so that the Dispose routines can respect it.
        DeleteOnClose = 0x2,
        // Additionally add an enum value to support large pages.
        LargePageSupport = 0x4,
        // Allocates a sparse file
        Sparse = 0x08,
        // Set the valid data size
        ValidData = 0x10,
        DelayAllocatePages = 0x4000000
    }

    public class MemoryMappedViewAccessor
    {
        // Flushes the given region (which must be a multiple of the page size).
        // By default this is only a hint to the memory manager; with flushToDisk the flush is enforced.
        // See FlushViewOfFile and FlushFileBuffers.
        public void Flush(nint start, nint length, bool flushToDisk = false);

        // Invalidates the given region (which must be a multiple of the page size) and hands it back to the memory manager.
        // See DiscardVirtualMemory.
        public void Discard(nint start, nint length);

        // Unlocks the given memory region and tells the memory manager that the region can be paged out.
        // See VirtualUnlock.
        public void Free(nint start, nint length);

        // Advises the memory manager to prefetch the given memory region.
        // See PrefetchVirtualMemory.
        public void Prefetch(nint start, nint length);

        // I'm not sure if the following two methods make sense with memory-mapped files.
        public void Offer();
        public void Reclaim();
    }
}
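The ranged methods above expect regions that line up with page boundaries. As an illustration only, this is one way a caller could round an arbitrary byte range outward to whole pages before passing it on. `PageRange.Align` is a hypothetical helper and not part of the proposal, and it uses `long` offsets (as the existing accessor methods do) instead of `nint`.

```csharp
using System;

public static class PageRange
{
    // Expands [start, start + length) outward so that both ends
    // fall on a multiple of pageSize.
    public static (long Start, long Length) Align(long start, long length, int pageSize)
    {
        if (start < 0) throw new ArgumentOutOfRangeException(nameof(start));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        long alignedStart = start - (start % pageSize);          // round down
        long end = checked(start + length);
        long alignedEnd = checked((end + pageSize - 1) / pageSize * pageSize); // round up
        return (alignedStart, alignedEnd - alignedStart);
    }
}
```

In real code the page size would come from `Environment.SystemPageSize`. With 4096-byte pages, the range starting at 1000 with length 9000 becomes start 0 and length 12288.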

API Usage


using var mm = MemoryMappedFile.CreateFromFile("file.dat", FileMode.CreateNew, null, 10000, MemoryMappedFileAccess.ReadWrite, MemoryMappedFileOptions.DeleteOnClose);

using var va = mm.CreateViewAccessor();

// Prefetch all 10000 bytes
va.Prefetch(0, 10000);

// Discard all 10000 bytes and tell the memory manager that no flush on this area is needed
va.Discard(0, 10000);

// Free 9000 bytes (starting at offset 1000) and tell the memory manager that this region is currently not in use and can be paged to disk if needed
va.Free(1000, 9000);

// Flush all 10000 bytes and wait until they are written to disk
va.Flush(0, 10000, true);

Risks

The main risk is that, since memory-mapped files need to be platform independent, proper equivalents have to be found on the other operating systems.

There are surely things I didn't get right, so please feel free to correct my mistakes ;-)

ghost commented 3 years ago

Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.

Issue Details
Author: msedi
Assignees: -
Labels: `api-suggestion`, `area-System.IO`, `untriaged`
Milestone: -

msedi commented 1 year ago

Our team would be very interested in further discussion of this topic. Would there be a chance to do so? We could even help improve this, but we would need some agreement and further discussion. Since there is some interest in this topic, I'll list related issues here for reference (the list is incomplete): #59606, #57330, #37227, #59405, #62768, #69365, #48793, #941, #24990, #24805. Many of them are still open; many others have been closed but not solved.

I can see that memory mapped files might be a niche topic, but I think there is interest.

Scooletz commented 2 months ago

@jeffhandley Would it be possible to get your insight as an area owner?

Scooletz commented 2 months ago

@msedi Would it be useful to have something for madvise "won't need" (MADV_DONTNEED), even if it were a no-op on Windows? DiscardVirtualMemory has different semantics.
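For the sake of discussion, a rough sketch of what such a call looks like today through P/Invoke on Linux. `LinuxAdvice.DontNeed` is a hypothetical name, the constant value 4 is MADV_DONTNEED from the Linux headers, and on every other platform the sketch is a no-op that returns false.

```csharp
using System;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

public static class LinuxAdvice
{
    private const int MADV_DONTNEED = 4; // Linux value; assumed for this sketch

    [DllImport("libc", SetLastError = true)]
    private static extern int madvise(IntPtr addr, UIntPtr length, int advice);

    // Hypothetical helper: tells the kernel that the given part of the view
    // is not needed for now. Returns false where the sketch is a no-op.
    public static bool DontNeed(MemoryMappedViewAccessor view, long offset, long length)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
            return false;
        if (offset < 0) throw new ArgumentOutOfRangeException(nameof(offset));
        if (length <= 0 || offset > view.Capacity - length)
            throw new ArgumentOutOfRangeException(nameof(length));

        var handle = view.SafeMemoryMappedViewHandle;
        bool addedRef = false;
        try
        {
            // Keep the view alive while its raw address is in use.
            handle.DangerousAddRef(ref addedRef);
            long address = handle.DangerousGetHandle().ToInt64() + view.PointerOffset + offset;

            // madvise requires a page-aligned start address.
            if (address % Environment.SystemPageSize != 0)
                throw new ArgumentException("Range must start on a page boundary.", nameof(offset));

            if (madvise(new IntPtr(address), (UIntPtr)(ulong)length, MADV_DONTNEED) != 0)
                throw new InvalidOperationException(
                    "madvise failed with errno " + Marshal.GetLastWin32Error());
            return true;
        }
        finally
        {
            if (addedRef) handle.DangerousRelease();
        }
    }
}
```

For a shared, file-backed mapping the data stays available after the call and is read back from the file on the next access, which is the difference from DiscardVirtualMemory mentioned above.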