dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.25k stars 4.73k forks source link

Concurrency bug with PEReader when loading from FileStream on Linux #60545

Closed PJB3005 closed 2 years ago

PJB3005 commented 3 years ago

Description

PEReader appears to not be properly thread safe if passed a FileStream (it will use memory mapping) and can blow up on Linux if used from multiple threads. We noticed this because ILVerify kept blowing up on weird race condition errors only on Linux.

Reproduction Steps

Run this code block: it will blow up on Linux after a few seconds but work fine on Windows. Needs Microsoft.ILVerification.

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Reflection;
using System.Reflection.PortableExecutable;
using System.Threading.Tasks;
using ILVerify;

namespace pereader
{
    class Program
    {
        static void Main(string[] args)
        {
            while (true)
            {
                var resolver = new Resolver();
                Parallel.For(1, 32, _ => {
                    var verifier = new Verifier(resolver);
                    verifier.SetSystemModuleName(new AssemblyName("System.Runtime"));
                });
            }
        }
    }

    class Resolver : IResolver
    {
        private readonly ConcurrentDictionary<string, PEReader> _dictionary = new();

        public static readonly string BasePath = Path.GetDirectoryName(typeof(Console).Assembly.Location); 

        private PEReader ResolveCore(string simpleName)
        {
            return new PEReader(File.OpenRead(Path.Combine(BasePath, $"{simpleName}.dll")));
        }

        public PEReader Resolve(string simpleName)
        {
            return _dictionary.GetOrAdd(simpleName, ResolveCore);
        }
    }
}

Expected behavior

PEReader to be thread safe on Linux

Actual behavior

PEReader is not thread safe on Linux.

Some examples of exceptions that occur (it's not consistent):

Unhandled exception. System.AggregateException: One or more errors occurred. (Unknown PE Magic value.)                                               ---> System.BadImageFormatException: Unknown PE Magic value.
at System.Reflection.PortableExecutable.PEHeader..ctor(PEBinaryReader& reader)
at System.Reflection.PortableExecutable.PEHeaders..ctor(Stream peStream, Int32 size, Boolean isLoadedImage)
at System.Reflection.PortableExecutable.PEReader.InitializePEHeaders()
at Internal.TypeSystem.Ecma.EcmaModule.CreateMetadataReader(TypeSystemContext context, PEReader peReader)
at Internal.TypeSystem.Ecma.EcmaModule.Create(TypeSystemContext context, PEReader peReader, IAssemblyDesc containingAssembly, IModuleResolver customModuleResolver)
at ILVerify.ILVerifyTypeSystemContext.GetModule(PEReader peReader, IAssemblyDesc containingAssembly)
at ILVerify.Verifier.SetSystemModuleName(AssemblyName name)
at pereader.Program.<>c__DisplayClass0_0.<Main>b__0(Int32 _) in /home/pj/pereader/Program.cs:line 20                                          
 ---> System.ObjectDisposedException: Cannot access a closed file.                                                                                     at System.IO.FileStream.Seek(Int64 offset, SeekOrigin origin)at System.Reflection.PortableExecutable.PEReader.InitializePEHeaders()
at System.Reflection.PortableExecutable.PEReader.get_HasMetadata()
at Internal.TypeSystem.Ecma.EcmaModule.CreateMetadataReader(TypeSystemContext context, PEReader peReader)
at Internal.TypeSystem.Ecma.EcmaModule.Create(TypeSystemContext context, PEReader peReader, IAssemblyDesc containingAssembly, IModuleResolver customModuleResolver)
at ILVerify.ILVerifyTypeSystemContext.GetModule(PEReader peReader, IAssemblyDesc containingAssembly)
at ILVerify.Verifier.SetSystemModuleName(AssemblyName name)
at pereader.Program.<>c__DisplayClass0_0.<Main>b__0(Int32 _) in /home/pj/pereader/Program.cs:line 20

Regression?

No response

Known Workarounds

Loading from a non-FileStream (e.g. copying to a MemoryStream first) works around the problem because it will use a slower path that doesn't use memory mapping.

Configuration

Other information

No response

danmoseley commented 3 years ago

PEReader is documented to be thread safe:

Remarks
The implementation is thread-safe. That is, multiple threads can read data from the reader in parallel. Disposal of the reader is not thread-safe (see Dispose()).
PJB3005 commented 3 years ago

Yes, but the point of this issue is that we seem to have run into a Linux-specific threading bug with it.

jeffhandley commented 3 years ago

Tagging @dotnet/area-system-reflection-metadata since the bot didn't.

buyaa-n commented 2 years ago

I was able to repro in 5.0, but could not repro in 6.0, looks more like Fille IO issue than Reflection to me and most importantly seems already fixed. @PJB3005 could you try out with the latest 6.0 release?

danmoseley commented 2 years ago

@adamsitnik might have a theory for how FileStream changes could be relevant.

buyaa-n commented 2 years ago

As @PJB3005 mentioned exceptions thrown here are not consistent, below exception might be helpful for @adamsitnik

Unhandled exception. System.AggregateException: One or more errors occurred. (Invalid argument)
 ---> System.IO.IOException: Invalid argument
   at System.IO.FileStream.CheckFileCall(Int64 result, Boolean ignoreNotSupported)
   at System.IO.FileStream.FlushReadBuffer()
   at System.IO.FileStream.FlushInternalBuffer()
   at System.IO.FileStream.Flush(Boolean flushToDisk)
   at System.IO.FileStream.Flush()
   at System.IO.FileStream.get_SafeFileHandle()
   at System.IO.MemoryMappedFiles.MemoryMappedView.CreateView(SafeMemoryMappedFileHandle memMappedFileHandle, MemoryMappedFileAccess access, Int64 requestedOffset, Int64 requestedSize)
   at System.IO.MemoryMappedFiles.MemoryMappedFile.CreateViewAccessor(Int64 offset, Int64 size, MemoryMappedFileAccess access)
   at System.Reflection.Internal.MemoryMapLightUp.CreateViewAccessor(Object memoryMap, Int64 start, Int32 size)
   at System.Reflection.Internal.StreamMemoryBlockProvider.TryCreateMemoryMappedFileBlock(Int64 start, Int32 size, MemoryMappedFileBlock& block)
   at System.Reflection.Internal.StreamMemoryBlockProvider.GetMemoryBlockImpl(Int32 start, Int32 size)
   at System.Reflection.Internal.MemoryBlockProvider.GetMemoryBlock(Int32 start, Int32 size)
   at System.Reflection.PortableExecutable.PEReader.GetMetadataBlock()
   at System.Reflection.Metadata.PEReaderExtensions.GetMetadataReader(PEReader peReader, MetadataReaderOptions options, MetadataStringDecoder utf8Decoder)
   at Internal.TypeSystem.Ecma.EcmaModule.CreateMetadataReader(TypeSystemContext context, PEReader peReader)
   at Internal.TypeSystem.Ecma.EcmaModule.Create(TypeSystemContext context, PEReader peReader, IAssemblyDesc containingAssembly, IModuleResolver customModuleResolver)
   at ILVerify.ILVerifyTypeSystemContext.GetModule(PEReader peReader, IAssemblyDesc containingAssembly)
   at ILVerify.Verifier.SetSystemModuleName(AssemblyName name)
   at test.Program.<>c__DisplayClass0_0.<Main>b__0(Int32 _) in /home/buyaa/test/Program.cs:line 22
   at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`1.<ForWorker>b__1(RangeWorker& currentWorker, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`1.<ForWorker>b__1(RangeWorker& currentWorker, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
   at System.Threading.Tasks.TaskReplicator.Replica`1.ExecuteAction(Boolean& yieldedBeforeCompletion)
   at System.Threading.Tasks.TaskReplicator.Replica.Execute()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.TaskReplicator.Run[TState](ReplicatableUserAction`1 action, ParallelOptions options, Boolean stopOnFirstFailure)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.For(Int32 fromInclusive, Int32 toExclusive, Action`1 body)
PJB3005 commented 2 years ago

It does appear that .NET 6 fixed this, yeah.

PJB3005 commented 2 years ago

I ran a ~10 minute stress test of it on .NET 6 and didn't get any exception. Seems good?

buyaa-n commented 2 years ago

I ran a ~10 minute stress test of it on .NET 6 and didn't get any exception. Seems good?

Great, i am gonna close the issue then, thanks!

danmoseley commented 2 years ago

I'm not comfortably relying on "not reproing" as evidence that a race condition has been fixed. Although there were major changes in FileStream, I'm not aware of any changes that intended to fixed thread safety issues - in general, the IO API are not intended to be thread safe. It's possible that the bug simply got moved around due to refactoring. Anyway not all the stacks above have FileStream on them, which strongly suggests that there is protection missing higher up in PEReader, which is after all what claims to be safe.

Ah indeed, in a quick look, this change https://github.com/dotnet/runtime/commit/c2e8973c625dfb890dab3aa86d7d5a6be636cbde#diff-6b7d6a3d9f12a289e4205e436a7a5191378b664f47e6f04f9fc3932d5a8f9fb1R68 likely hid the FlushInternalBuffer() callstack above. Getting the handle was triggering a flush; it now caches the handle in the MemoryMappedView. I don't think there was a correctness issue: it's not intended to be safe for concurrent access.

The other stacks may also be still latent.

I suggest looking at PEReader to understand how it intends to be thread safe and whether there is a hole.

danmoseley commented 2 years ago

PEReader accepts a Stream, which it wraps (in this case) in a StreamMemoryBlockProvider which it stores as a field _peImage.

StreamMemoryBlockProvider has a field _streamGuard containing an object it uses to lock access to the stream. https://github.com/dotnet/runtime/blob/03e90a540cb2dfe5cab4086ea54ad5dd1f655749/src/libraries/System.Reflection.Metadata/src/System/Reflection/Internal/MemoryBlocks/StreamMemoryBlockProvider.cs#L25-L28

However StreamMemoryBlockProvider.GetStream() will return the stream - along with a StreamConstraints object that contains the _streamGuard lock.

PEReaders.PEHeaders calls PEReader.InitializePEHeaders() which retrieves the stream through this GetStream(), and it locks to ensure the PEHeaders constructor is not run concurrently, and PEHeaders only uses the stream during construction, and none of its fields hold a reference to the stream. https://github.com/dotnet/runtime/blob/03e90a540cb2dfe5cab4086ea54ad5dd1f655749/src/libraries/System.Reflection.Metadata/src/System/Reflection/PortableExecutable/PEReader.cs#L306-L314

So far so good.

StreamMemoryBlockProvider creates the MemoryMappedFile over the stream within a lock https://github.com/dotnet/runtime/blob/03e90a540cb2dfe5cab4086ea54ad5dd1f655749/src/libraries/System.Reflection.Metadata/src/System/Reflection/Internal/MemoryBlocks/StreamMemoryBlockProvider.cs#L132

however it then calls MemoryMappedFile.CreateViewAccessor outside of the lock https://github.com/dotnet/runtime/blob/03e90a540cb2dfe5cab4086ea54ad5dd1f655749/src/libraries/System.Reflection.Metadata/src/System/Reflection/Internal/MemoryBlocks/StreamMemoryBlockProvider.cs#L163

as seen in the 3rd stack above, that is not guaranteed safe for concurrent calls, nor for concurrent access to the stream, and by refactoring it it just happens to have become less fragile.

I didn't investigate whether subsequent use of MemoryMappedFileBlock is safe, but the above issue is consistent with the above symptoms.

buyaa-n commented 2 years ago

I'm not comfortably relying on "not reproing" as evidence that a race condition has been fixed. Although there were major changes in FileStream, I'm not aware of any changes that intended to fixed thread safety issues - in general, the IO API are not intended to be thread safe. It's possible that the bug simply got moved around due to refactoring. Anyway not all the stacks above have FileStream on them, which strongly suggests that there is protection missing higher up in PEReader, which is after all what claims to be safe.

Good point

Ah indeed, in a quick look, this change c2e8973#diff-6b7d6a3d9f12a289e4205e436a7a5191378b664f47e6f04f9fc3932d5a8f9fb1R68

Thanks for finding this out, indeed that could have fixed the issue.

likely hid the FlushInternalBuffer() callstack above. Getting the handle was triggering a flush; it now caches the handle in the MemoryMappedView. I don't think there was a correctness issue: it's not intended to be safe for concurrent access.

From the doc You can create multiple views of the memory-mapped file, including views of parts of the file. You can map the same part of a file to more than one address to create concurrent memory. For two views to remain concurrent, they have to be created from the same memory-mapped file. Creating two file mappings of the same file with two views does not provide concurrency. https://docs.microsoft.com/en-us/dotnet/api/system.io.memorymappedfiles.memorymappedfile?view=net-5.0#remarks

So calls to MemoryMappedFile.CreateViewAccessor are expected to be thread-safe, apparently, it was not thread-safe on Linux, and it is possible that the PR you attached fixed that.

I think the above concludes at least the FileStream related exceptions, thank you for looking into them, I will try to find the root cause of other exceptions and if there is any concurrency bug

adamsitnik commented 2 years ago

@buyaa-n thanks for sharing the stack trace!

This is the interesting part:

Unhandled exception. System.AggregateException: One or more errors occurred. (Invalid argument)
 ---> System.IO.IOException: Invalid argument
   at System.IO.FileStream.CheckFileCall(Int64 result, Boolean ignoreNotSupported)
   at System.IO.FileStream.FlushReadBuffer()
   at System.IO.FileStream.FlushInternalBuffer()
   at System.IO.FileStream.Flush(Boolean flushToDisk)
   at System.IO.FileStream.Flush()
   at System.IO.FileStream.get_SafeFileHandle()
   at System.IO.MemoryMappedFiles.MemoryMappedView.CreateView(SafeMemoryMappedFileHandle memMappedFileHandle, MemoryMappedFileAccess access, Int64 requestedOffset, Int64 requestedSize)

It seems that Flush() method was throwing when FileStream.SafeFileHandle was accessed. This is still the case for today when buffering is enabled:

https://github.com/dotnet/runtime/blob/0666ebc475871c27f5b9d4ee8e91922f20be46e9/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/BufferedFileStreamStrategy.cs#L101-L108

And most likely the PR pointed by @danmoseley (https://github.com/dotnet/runtime/pull/53538) has fixed it by just storing the SafeFileHandle value in a field.

So most likely the FileStream rewrite has not affected it in any way. But it does not answer the question whether PEReader is guaranteed to be thread safe.

danmoseley commented 2 years ago

From the doc You can create multiple views of the memory-mapped file, including views of parts of the file. You can map the same part of a file to more than one address to create concurrent memory. For two views to remain concurrent, they have to be created from the same memory-mapped file. Creating two file mappings of the same file with two views does not provide concurrency. https://docs.microsoft.com/en-us/dotnet/api/system.io.memorymappedfiles.memorymappedfile?view=net-5.0#remarks So calls to MemoryMappedFile.CreateViewAccessor are expected to be thread-safe, apparently, it was not thread-safe on Linux, and it is possible that the PR you attached fixed that.

I believe the docs are saying that you can use safely two view accessors concurrently. They do not say that you can create a view accessor simultaneously to using the underlying handle. The recent change simply made that less fragile but it's still not supported.

I believe that the call here to CreateViewAccessor needs to be protected by the lock object protecting the stream.

buyaa-n commented 2 years ago

I believe the docs are saying that you can use safely two view accessors concurrently. They do not say that you can create a view accessor simultaneously to using the underlying handle. The recent change simply made that less fragile but it's still not supported.

I believe that the call here to CreateViewAccessor needs to be protected by the lock object protecting the stream.

Right, i misread, thanks!

The other exceptions log shows that the PEHeaders position were modified

 ---> System.BadImageFormatException: Image is either too small or contains an invalid byte offset or count.
   at System.Reflection.Throw.ImageTooSmallOrContainsInvalidOffsetOrCount()
   at System.Reflection.PortableExecutable.PEHeaders..ctor(Stream peStream, Int32 size, Boolean isLoadedImage)
   at System.Reflection.PortableExecutable.PEReader.InitializePEHeaders()

 ---> System.BadImageFormatException: Unknown PE Magic value.
   at System.Reflection.PortableExecutable.PEHeader..ctor(PEBinaryReader& reader)
   at System.Reflection.PortableExecutable.PEHeaders..ctor(Stream peStream, Int32 size, Boolean isLoadedImage)
   at System.Reflection.PortableExecutable.PEReader.InitializePEHeaders()

call trace shows: PEReaders.PEHeaders calls InitializePEHeaders() which retrieves the stream under lock using GetStream(), and keeps the stream reference in PEBinaryReader https://github.com/dotnet/runtime/blob/574b2d46a0759d6456161dc4fc54fae6fde74e36/src/libraries/System.Reflection.Metadata/src/System/Reflection/PortableExecutable/PEHeaders.cs#L88 This reader doesn't use any lock but for this scenario, it is used under the lock from the stream constraint, and in other scenarios PEHeaders read the the stream only during construction, where concurrency issue should not happen in which explains why PEBinaryReader not using any lock.

Apparently the other exceptions has the same root cause as above, even PEHeaders populated under lock the synchronization have lost in the CreateViewAccessor.

Now i wonder why it was happening only on linux and not happening windows, the same code runs on every platform. Anyway i will test the the fix with 5.0 code base on linux as it is not repoducing in 6.0+

buyaa-n commented 2 years ago

Anyway i will test the the fix with 5.0 code base on linux as it is not repoducing in 6.0+

Tested the fix with 5.0 code on Linux, the failure did not happen at least for 30 mins, so raised a PR, PTAL