Closed TYoung86 closed 4 years ago
They will sometimes pass when zeroed memory is allocated by chance. Spec expects them to be zeroed.
Actually, need to handle grow correctly if this.Start is zero (result of zero initial alloc) too, fixes memory_grow up to line 101...
if (this.Start == default)
{
this.Start = Marshal.AllocHGlobal(new IntPtr(newSize));
Unsafe.InitBlock((void*) (this.Start), 0, newSize);
}
else
{
this.Start = Marshal.ReAllocHGlobal(this.Start, new IntPtr(newSize));
Unsafe.InitBlock((void*) (this.Start + checked((int) this.Size)), 0, newSize - this.Size);
}
Thanks for looking into this... I had noticed random memory test failures but had not yet taken the time to research it yet.
Assuming this analysis is correct, if you can make a pull request, your name will be preserved for all time on the GitHub contributors list at https://github.com/RyanLamansky/dotnet-webassembly/graphs/contributors . Otherwise I'll take care of it this weekend.
Yeah I can make a PR.
Is it acceptable to use System.Runtime.CompilerServices.Unsafe?
Alternatively I could use Fody.InlineIL to avoid the dependency, or I can allow unsafe code and use a fixed expression to accomplish the same thing w/ one pointless fixed local GC handle overhead.
Less optimal would be to create a ZeroMemory method using Marshal.WriteByte/Int32/Int64, but more portable/safe, or change all allocations to use pinned allocations on the managed heap.
I was considering forking the project to solely use the managed heap and use tracking references instead of pointers to see what the .NET GC does with it, but not sure on the time/effort.
I'm trying to avoid adding any new dependencies, so if it's in base .NET Standard 2.0, that's fine. Note that UnmanagedMemory
constructor uses Marshal.Copy
from a zeroed array to very quickly zero-init the memory. Not sure why I forgot to do that in the grow method. I'd apply the same approach to both places. My assumption was that Marshal.Copy
would be faster than any other approach, since it's implemented by the run time, but I could be wrong. Haven't benchmarked.
There's cpyblk and initblk CIL instructions for memcpy and memset intrinsics. They're usually only generated for copying and zeroing structures on the stack, but they are the fastest. ( Microsoft's source: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Runtime.CompilerServices.Unsafe/src/System.Runtime.CompilerServices.Unsafe.il#L173 )
Using Fody will add a post-compile assembly processor step to the build to translate Fody.InlineIL instructions into CIL directly and let you use the functionality of System.Runtime.CompilerServices.Unsafe without any additional runtime dependencies.
Fody.InlineIL is definitely a clever tool. For now, though, I want to keep things very simple. Post-1.0, if initblk
is fast enough for the extra steps to access it, it's worth taking another look.
Pick your poison: https://dotnetfiddle.net/N0gH47
Running 12 spin-ups...
Running Test...
Zero by Marshal.Copy (1 -> 2): 7426.797ns
Zero by Marshal.Copy (2 -> 1): 5537.196ns
Zero by CopyBlk (1 -> 2): 6431.044ns
Zero by CopyBlk (2 -> 1): 5927.916ns
Zero by InitBlock (1): 5132.674ns
Zero by InitBlock (2): 4961.165ns
Zero by Unaligned InitBlock (1): 5045.87ns
Zero by Unaligned InitBlock (2): 4954.665ns
NaieveMarshalZeroMemory (1): 145035.854ns
NaieveMarshalZeroMemory (2): 151635.008ns
MarshalZeroMemory (1): 22730.115ns
MarshalZeroMemory (2): 22701.314ns
Zero by HomebrewInitBlock (1): 4946.665ns
Zero by HomebrewInitBlock (2): 4940.464ns
Zero by HomebrewInitBlockUnaligned (1): 4975.166ns
Zero by HomebrewInitBlockUnaligned (2): 4953.465ns
Test complete.
Wow, thanks for putting that together!
Some of the complexity on some of those options isn't really needed (alignment will always be optimal and the page size is always exactly 64 KiB), but overall I think the numbers are accurate.
My favorite among them is HomebrewInitBlock
. It can be simplified a little bit--no need for the conv_u
and initialization value parameter. So, a tweaked version of that sounds perfect 👍
I made some corrections and added versions without conv_u for reference. It's true they don't appear to be necessary.
Running 12 spin-ups...
Running Test...
Zero by Marshal.Copy (1 -> 2): 7075.167ns
Zero by Marshal.Copy (2 -> 1): 5773.1ns
Zero by CopyBlk (1 -> 2): 6386.132ns
Zero by CopyBlk (2 -> 1): 6203.522ns
Zero by InitBlock (1): 4943.056ns
Zero by InitBlock (2): 4918.655ns
Zero by Unaligned InitBlock (1): 4938.056ns
Zero by Unaligned InitBlock (2): 4917.455ns
NaieveMarshalZeroMemory (1): 156381.713ns
NaieveMarshalZeroMemory (2): 156666.128ns
MarshalZeroMemory (1): 27559.43ns
MarshalZeroMemory (2): 27912.248ns
Zero by HomebrewInitBlock (1): 4944.656ns
Zero by HomebrewInitBlock (2): 4925.955ns
Zero by HomebrewInitBlockUnaligned (1): 4926.555ns
Zero by HomebrewInitBlockUnaligned (2): 4939.157ns
Zero by HomebrewInitBlock2 (1): 4905.054ns
Zero by HomebrewInitBlock2 (2): 4851.351ns
Zero by HomebrewInitBlockUnaligned2 (1): 4981.859ns
Zero by HomebrewInitBlockUnaligned2 (2): 4903.354ns
Test complete.
HomebrewInitBlock2 here is without the conv_u, I'll make a PR today.
I'm planning to do another release this weekend, so not long until this fix is available via regular channels.
Released as v0.8.0-preview. By the way, this also fixed all of the memory_size
tests.
https://github.com/RyanLamansky/dotnet-webassembly/blob/61e96b03ba1939d8f8d8c082b7f93e9b07d6f3f7/WebAssembly/Runtime/UnmanagedMemory.cs#L99
You need to zero the new memory, e.g. with Unsafe or something.