Closed FabianZeulnerGEA closed 2 years ago
Tagging subscribers to this area: @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.
Author: | FabianZeulnerGEA |
---|---|
Assignees: | - |
Labels: | `area-System.Memory`, `untriaged` |
Milestone: | - |
Hi @FabianZeulnerGEA
I can't repro the issue using following app:
using System;
namespace CopyToRepro
{
internal class Program
{
private byte[] _localDmaData = new byte[1024];
private Span<byte> LocalDmaBuffer => _localDmaData;
static void Main() => new Program().DataReceivedDataCallback(new byte[1024]);
private void DataReceivedDataCallback(ReadOnlySpan<byte> data) => data.CopyTo(LocalDmaBuffer);
}
}
Is there anything specific about the ReadOnlySpan<byte>
passed to DataReceivedDataCallback
? (Is it stack-allocated span or just a managed array?)
Hi so the ReadOnlySpan<> handed to the Callback is a specific memory area defined in the device tree for the DMA to store data. so its in a non cachable area of the ddr4. the data source for the DMA is coming from an FPGA. we are using and Xilinx Ultrascale+ chip running Yocto (if that additional info may help)
i will take your code and put it onto the hardware later to reproduce it.
the same code did work on .netcore 3.1
so the exception is only generated when it trys to copy a span from DMA. so when the Span is in the nonCached area and the location and dma-device is defined through the device tree.
here is the source for the DataCallback:
_dma.DataReceived += DataReceivedDataCallback;
private void Callback(IntPtr bufferPtr, int size, IntPtr data)
{
try
{
unsafe
{
DataReceived?.Invoke(new Span<byte>((byte*)data, size));
}
}
catch (Exception)
{
// we should not let exceptions escape
}
finally
{
ApiAckReceiverBuffer(_dmaPtr, bufferPtr);
}
the exception is generated here: DataReceived?.Invoke(bd.AsSpan());
System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception. at System.Buffer.Memmove(Byte& , Byte& , UIntPtr )
private void DmaComplete(Drivers.IAxiDmaChannel channel)
{
while (_dma.ChannelS2MM.CompletedReceiveBuffer(out Drivers.DmaBufferDescriptor bd))
{
try
{
DataReceived?.Invoke(bd.AsSpan());
}
catch (Exception e)
{
Console.WriteLine($"error: {e}");
}
finally
{
_dma.ChannelS2MM.AcknowledgeReceiveBuffer(bd);
}
}
The device memory on Arm does not always support unaligned memory access. Look for statements like "If a memory location is not capable of supporting unaligned memory accesses, then an unaligned access to that memory location generates an Alignment fault ..." in the "E2.8.2 Device memory" section of Arm Architecture Reference manual.
The implementation of .NET runtime libraries methods like Span
assume normal memory that supports unaligned memory accesses. This assumption is important for best performance and to allow vectorization.
I would recommend that you create your own implementation of Span.Copy
that does not perform unaligned memory access (e.g. copies one byte at a time) and use that when accessing the device memory.
thank you, i will have a look into that. but what did change in the background, that the identical code is happy when build with core 3.1 and crashes when build with .net6 ?
My guess is that the JIT started to generate better code using SIMD instruction that do not support unaligned device memory access.
Are you able to find the exact processor instruction under debugger that is triggering the fault? I expect that you would not find the same instruction in .NET Core 3.1 code for the Memmove method and instead the .NET Core 3.1 code would have a less efficient equivalent.
This issue has been marked needs-author-action
since it may be missing important information. Please refer to our contribution guidelines for tips on how to report issues effectively.
I did not get the chance to run the debugger on the system. but i hope to get it done this week, and we also move the DMA memory location to a place that SIMD is happy. will keep you posted.
but in general it would be nice if the framework can take care of, and if it detects something like that, switches back to the core3.1 version of it.
it would be nice if the framework can take care of, and if it detects something like that, switches back to the core3.1 version of it.
It is not possible to detect this efficiently. Check like this would significantly slow down memory copying for everybody out there.
This issue has been marked needs-author-action
since it may be missing important information. Please refer to our contribution guidelines for tips on how to report issues effectively.
It is not possible to detect this efficiently. Check like this would significantly slow down memory copying for everybody out there.
as i am not an expert at the internals of .net. is there a mechanism that takes care that you never run into such a SIMD problem with Span<> when everything is managed by .net. And I only see this problem cause i hard code the Span start address in the device tree. or could it be that you might always run into such an issue using Span within .net6 on arm?
The .NET runtime assumes that all memory that it operates on is a Normal memory. Normal memory is defined by Arm architecture reference manual. The memory does not have to be managed by .NET runtime. For example, memory allocated by C malloc
API is a Normal memory and it will work fine with Span APIs.
You are seeing this since you are operating on a Device memory. Device memory is not a Normal memory and it comes with number of limitations.
Closing as expected behavior per @jkotas's comment above.
@jkotas, got pointed to this from discord and I'm not quite sure this is "expected".
.NET certainly assumes that addresses are aligned and you must use, for example, Unsafe.*Unaligned
APIs to work with "unaligned memory".
Given then that the natural alignment of byte
is 1
, that Span<T>
can be over T*
address, that there are platforms, like Arm32, which may not support "general unaligned access", it is a bug for our Span<T>.CopyTo
implementation to have code such as Unsafe.As<byte, long>(ref dest) = Unsafe.As<byte, long>(ref src);
or Unsafe.As<byte, int>(ref dest) = Unsafe.As<byte, int>(ref src);
This is a bug because it forces the JIT to assume that the ref byte
are 4
or 8
byte aligned, which they may not be. Likewise, code such as Unsafe.As<byte, Block16>(ref dest) = Unsafe.As<byte, Block16>(ref src);
is problematic if it implies to the JIT that the Block16
is anything more than 1-byte aligned.
That is, Span<byte>.CopyTo()
must effectively operate like memcpy
(well memmove
in this case) does in C and be usable for any address. It can still utilize SIMD and other tricks, but only where it meets the platform expectations around memory alignment, otherwise there will be likely bugs when running on platforms that don't support general unaligned memory access.
Given that we have Vector128.IsHardwareAccelerated
, Vector128.LoadUnsafe
and Vector128.StoreUnsafe
which do safely operate with ref byte
to do unaligned SIMD load/stores; this should be something we can reasonably fix, keep performance, and make work with the normal JIT expectations that T
is must be "naturally aligned" (e.g. 1-byte
aligned for byte
).
It is a bug for our Span
.CopyTo implementation to have code such as Unsafe.As<byte, long>(ref dest) = Unsafe.As<byte, long>(ref src); or Unsafe.As<byte, int>(ref dest) = Unsafe.As<byte, int>(ref src);
This code is not 100% portable, but it works just fine on all platforms that we support. It is not a bug since we own the runtime and the list of platforms that we support. If we add a new platform to the list where this is a problem or make other changes in the runtime that make it problematic, we will fix this implementation.
It would be a bug if this code was in 3rd party library that aims to be 100% portable.
Vector128.IsHardwareAccelerated, Vector128.LoadUnsafe and Vector128.StoreUnsafe
Agree that it would be nice to switch this code to use Vector128 now that we have this capability.
Description
HI everybody, i am using .net6. within the application using copyto to copy a span into a different span. it works fine if build and run on windows or ubuntu. but it crashed when deployed on linux-arm64.
generated private span: private byte[] _localDmaData; private Span LocalDmaBuffer => _localDmaData;
copy data into it: line 70: data.CopyTo(LocalDmaBuffer);
exception it get: External component has thrown an exception. at System.Buffer.Memmove(Byte& , Byte& , UIntPtr ) at ....DataReceivedDataCallback(ReadOnlySpan`1 data) in ../code.cs:line 70
heavily appreciate any help or idea for the root-cause and fix.
Fabian
Reproduction Steps
private byte[] _localDmaData = new byte[1024]; private Span LocalDmaBuffer => _localDmaData;
private void DataReceivedDataCallback(ReadOnlySpan data)
{
data.CopyTo(LocalDmaBuffer);
}
Expected behavior
copys data into new span without any complains
Actual behavior
External component has thrown an exception. at System.Buffer.Memmove(Byte& , Byte& , UIntPtr ) at ....DataReceivedDataCallback(ReadOnlySpan`1 data) in ../code.cs:line 70
Regression?
No response
Known Workarounds
No response
Configuration
No response
Other information
No response