Closed PJB3005 closed 2 years ago
PEReader is documented to be thread safe:
Remarks
The implementation is thread-safe. That is, multiple threads can read data from the reader in parallel. Disposal of the reader is not thread-safe (see Dispose()).
Yes, but the point of this issue is that we seem to have run into a Linux-specific threading bug with it.
Tagging @dotnet/area-system-reflection-metadata since the bot didn't.
I was able to repro in 5.0, but could not repro in 6.0, looks more like Fille IO issue than Reflection to me and most importantly seems already fixed. @PJB3005 could you try out with the latest 6.0 release?
@adamsitnik might have a theory for how FileStream changes could be relevant.
As @PJB3005 mentioned exceptions thrown here are not consistent, below exception might be helpful for @adamsitnik
Unhandled exception. System.AggregateException: One or more errors occurred. (Invalid argument)
---> System.IO.IOException: Invalid argument
at System.IO.FileStream.CheckFileCall(Int64 result, Boolean ignoreNotSupported)
at System.IO.FileStream.FlushReadBuffer()
at System.IO.FileStream.FlushInternalBuffer()
at System.IO.FileStream.Flush(Boolean flushToDisk)
at System.IO.FileStream.Flush()
at System.IO.FileStream.get_SafeFileHandle()
at System.IO.MemoryMappedFiles.MemoryMappedView.CreateView(SafeMemoryMappedFileHandle memMappedFileHandle, MemoryMappedFileAccess access, Int64 requestedOffset, Int64 requestedSize)
at System.IO.MemoryMappedFiles.MemoryMappedFile.CreateViewAccessor(Int64 offset, Int64 size, MemoryMappedFileAccess access)
at System.Reflection.Internal.MemoryMapLightUp.CreateViewAccessor(Object memoryMap, Int64 start, Int32 size)
at System.Reflection.Internal.StreamMemoryBlockProvider.TryCreateMemoryMappedFileBlock(Int64 start, Int32 size, MemoryMappedFileBlock& block)
at System.Reflection.Internal.StreamMemoryBlockProvider.GetMemoryBlockImpl(Int32 start, Int32 size)
at System.Reflection.Internal.MemoryBlockProvider.GetMemoryBlock(Int32 start, Int32 size)
at System.Reflection.PortableExecutable.PEReader.GetMetadataBlock()
at System.Reflection.Metadata.PEReaderExtensions.GetMetadataReader(PEReader peReader, MetadataReaderOptions options, MetadataStringDecoder utf8Decoder)
at Internal.TypeSystem.Ecma.EcmaModule.CreateMetadataReader(TypeSystemContext context, PEReader peReader)
at Internal.TypeSystem.Ecma.EcmaModule.Create(TypeSystemContext context, PEReader peReader, IAssemblyDesc containingAssembly, IModuleResolver customModuleResolver)
at ILVerify.ILVerifyTypeSystemContext.GetModule(PEReader peReader, IAssemblyDesc containingAssembly)
at ILVerify.Verifier.SetSystemModuleName(AssemblyName name)
at test.Program.<>c__DisplayClass0_0.<Main>b__0(Int32 _) in /home/buyaa/test/Program.cs:line 22
at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`1.<ForWorker>b__1(RangeWorker& currentWorker, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
--- End of stack trace from previous location ---
at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`1.<ForWorker>b__1(RangeWorker& currentWorker, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
at System.Threading.Tasks.TaskReplicator.Replica`1.ExecuteAction(Boolean& yieldedBeforeCompletion)
at System.Threading.Tasks.TaskReplicator.Replica.Execute()
--- End of inner exception stack trace ---
at System.Threading.Tasks.TaskReplicator.Run[TState](ReplicatableUserAction`1 action, ParallelOptions options, Boolean stopOnFirstFailure)
at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
--- End of stack trace from previous location ---
at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
at System.Threading.Tasks.Parallel.For(Int32 fromInclusive, Int32 toExclusive, Action`1 body)
It does appear that .NET 6 fixed this, yeah.
I ran a ~10 minute stress test of it on .NET 6 and didn't get any exception. Seems good?
I ran a ~10 minute stress test of it on .NET 6 and didn't get any exception. Seems good?
Great, i am gonna close the issue then, thanks!
I'm not comfortably relying on "not reproing" as evidence that a race condition has been fixed. Although there were major changes in FileStream, I'm not aware of any changes that intended to fixed thread safety issues - in general, the IO API are not intended to be thread safe. It's possible that the bug simply got moved around due to refactoring. Anyway not all the stacks above have FileStream on them, which strongly suggests that there is protection missing higher up in PEReader, which is after all what claims to be safe.
Ah indeed, in a quick look, this change https://github.com/dotnet/runtime/commit/c2e8973c625dfb890dab3aa86d7d5a6be636cbde#diff-6b7d6a3d9f12a289e4205e436a7a5191378b664f47e6f04f9fc3932d5a8f9fb1R68 likely hid the FlushInternalBuffer() callstack above. Getting the handle was triggering a flush; it now caches the handle in the MemoryMappedView. I don't think there was a correctness issue: it's not intended to be safe for concurrent access.
The other stacks may also be still latent.
I suggest looking at PEReader to understand how it intends to be thread safe and whether there is a hole.
PEReader accepts a Stream, which it wraps (in this case) in a StreamMemoryBlockProvider which it stores as a field _peImage.
StreamMemoryBlockProvider has a field _streamGuard containing an object it uses to lock access to the stream. https://github.com/dotnet/runtime/blob/03e90a540cb2dfe5cab4086ea54ad5dd1f655749/src/libraries/System.Reflection.Metadata/src/System/Reflection/Internal/MemoryBlocks/StreamMemoryBlockProvider.cs#L25-L28
However StreamMemoryBlockProvider.GetStream() will return the stream - along with a StreamConstraints object that contains the _streamGuard lock.
PEReaders.PEHeaders calls PEReader.InitializePEHeaders() which retrieves the stream through this GetStream(), and it locks to ensure the PEHeaders constructor is not run concurrently, and PEHeaders only uses the stream during construction, and none of its fields hold a reference to the stream. https://github.com/dotnet/runtime/blob/03e90a540cb2dfe5cab4086ea54ad5dd1f655749/src/libraries/System.Reflection.Metadata/src/System/Reflection/PortableExecutable/PEReader.cs#L306-L314
So far so good.
StreamMemoryBlockProvider creates the MemoryMappedFile over the stream within a lock https://github.com/dotnet/runtime/blob/03e90a540cb2dfe5cab4086ea54ad5dd1f655749/src/libraries/System.Reflection.Metadata/src/System/Reflection/Internal/MemoryBlocks/StreamMemoryBlockProvider.cs#L132
however it then calls MemoryMappedFile.CreateViewAccessor outside of the lock https://github.com/dotnet/runtime/blob/03e90a540cb2dfe5cab4086ea54ad5dd1f655749/src/libraries/System.Reflection.Metadata/src/System/Reflection/Internal/MemoryBlocks/StreamMemoryBlockProvider.cs#L163
as seen in the 3rd stack above, that is not guaranteed safe for concurrent calls, nor for concurrent access to the stream, and by refactoring it it just happens to have become less fragile.
I didn't investigate whether subsequent use of MemoryMappedFileBlock is safe, but the above issue is consistent with the above symptoms.
I'm not comfortably relying on "not reproing" as evidence that a race condition has been fixed. Although there were major changes in FileStream, I'm not aware of any changes that intended to fixed thread safety issues - in general, the IO API are not intended to be thread safe. It's possible that the bug simply got moved around due to refactoring. Anyway not all the stacks above have FileStream on them, which strongly suggests that there is protection missing higher up in PEReader, which is after all what claims to be safe.
Good point
Ah indeed, in a quick look, this change c2e8973#diff-6b7d6a3d9f12a289e4205e436a7a5191378b664f47e6f04f9fc3932d5a8f9fb1R68
Thanks for finding this out, indeed that could have fixed the issue.
likely hid the FlushInternalBuffer() callstack above. Getting the handle was triggering a flush; it now caches the handle in the MemoryMappedView. I don't think there was a correctness issue: it's not intended to be safe for concurrent access.
From the doc You can create multiple views of the memory-mapped file, including views of parts of the file. You can map the same part of a file to more than one address to create concurrent memory. For two views to remain concurrent, they have to be created from the same memory-mapped file. Creating two file mappings of the same file with two views does not provide concurrency.
https://docs.microsoft.com/en-us/dotnet/api/system.io.memorymappedfiles.memorymappedfile?view=net-5.0#remarks
So calls to MemoryMappedFile.CreateViewAccessor
are expected to be thread-safe, apparently, it was not thread-safe on Linux, and it is possible that the PR you attached fixed that.
I think the above concludes at least the FileStream related exceptions, thank you for looking into them, I will try to find the root cause of other exceptions and if there is any concurrency bug
@buyaa-n thanks for sharing the stack trace!
This is the interesting part:
Unhandled exception. System.AggregateException: One or more errors occurred. (Invalid argument)
---> System.IO.IOException: Invalid argument
at System.IO.FileStream.CheckFileCall(Int64 result, Boolean ignoreNotSupported)
at System.IO.FileStream.FlushReadBuffer()
at System.IO.FileStream.FlushInternalBuffer()
at System.IO.FileStream.Flush(Boolean flushToDisk)
at System.IO.FileStream.Flush()
at System.IO.FileStream.get_SafeFileHandle()
at System.IO.MemoryMappedFiles.MemoryMappedView.CreateView(SafeMemoryMappedFileHandle memMappedFileHandle, MemoryMappedFileAccess access, Int64 requestedOffset, Int64 requestedSize)
It seems that Flush()
method was throwing when FileStream.SafeFileHandle
was accessed. This is still the case for today when buffering is enabled:
And most likely the PR pointed by @danmoseley (https://github.com/dotnet/runtime/pull/53538) has fixed it by just storing the SafeFileHandle
value in a field.
So most likely the FileStream
rewrite has not affected it in any way. But it does not answer the question whether PEReader
is guaranteed to be thread safe.
From the doc You can create multiple views of the memory-mapped file, including views of parts of the file. You can map the same part of a file to more than one address to create concurrent memory. For two views to remain concurrent, they have to be created from the same memory-mapped file. Creating two file mappings of the same file with two views does not provide concurrency. https://docs.microsoft.com/en-us/dotnet/api/system.io.memorymappedfiles.memorymappedfile?view=net-5.0#remarks So calls to MemoryMappedFile.CreateViewAccessor are expected to be thread-safe, apparently, it was not thread-safe on Linux, and it is possible that the PR you attached fixed that.
I believe the docs are saying that you can use safely two view accessors concurrently. They do not say that you can create a view accessor simultaneously to using the underlying handle. The recent change simply made that less fragile but it's still not supported.
I believe that the call here to CreateViewAccessor needs to be protected by the lock object protecting the stream.
I believe the docs are saying that you can use safely two view accessors concurrently. They do not say that you can create a view accessor simultaneously to using the underlying handle. The recent change simply made that less fragile but it's still not supported.
I believe that the call here to CreateViewAccessor needs to be protected by the lock object protecting the stream.
Right, i misread, thanks!
The other exceptions log shows that the PEHeaders
position were modified
---> System.BadImageFormatException: Image is either too small or contains an invalid byte offset or count.
at System.Reflection.Throw.ImageTooSmallOrContainsInvalidOffsetOrCount()
at System.Reflection.PortableExecutable.PEHeaders..ctor(Stream peStream, Int32 size, Boolean isLoadedImage)
at System.Reflection.PortableExecutable.PEReader.InitializePEHeaders()
---> System.BadImageFormatException: Unknown PE Magic value.
at System.Reflection.PortableExecutable.PEHeader..ctor(PEBinaryReader& reader)
at System.Reflection.PortableExecutable.PEHeaders..ctor(Stream peStream, Int32 size, Boolean isLoadedImage)
at System.Reflection.PortableExecutable.PEReader.InitializePEHeaders()
call trace shows: PEReaders.PEHeaders
calls InitializePEHeaders()
which retrieves the stream under lock using GetStream(), and keeps the stream reference in PEBinaryReader
https://github.com/dotnet/runtime/blob/574b2d46a0759d6456161dc4fc54fae6fde74e36/src/libraries/System.Reflection.Metadata/src/System/Reflection/PortableExecutable/PEHeaders.cs#L88
This reader doesn't use any lock but for this scenario, it is used under the lock from the stream constraint, and in other scenarios PEHeaders
read the the stream only during construction, where concurrency issue should not happen in which explains why PEBinaryReader
not using any lock.
Apparently the other exceptions has the same root cause as above, even PEHeaders
populated under lock the synchronization have lost in the CreateViewAccessor
.
Now i wonder why it was happening only on linux and not happening windows, the same code runs on every platform. Anyway i will test the the fix with 5.0 code base on linux as it is not repoducing in 6.0+
Description
PEReader
appears to not be properly thread safe if passed aFileStream
(it will use memory mapping) and can blow up on Linux if used from multiple threads. We noticed this becauseILVerify
kept blowing up on weird race condition errors only on Linux.Reproduction Steps
Run this code block: it will blow up on Linux after a few seconds but work fine on Windows. Needs
Microsoft.ILVerification
.Expected behavior
PEReader
to be thread safe on LinuxActual behavior
PEReader
is not thread safe on Linux.Some examples of exceptions that occur (it's not consistent):
Regression?
No response
Known Workarounds
Loading from a non-FileStream (e.g. copying to a
MemoryStream
first) works around the problem because it will use a slower path that doesn't use memory mapping.Configuration
Other information
No response