dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.21k stars 4.72k forks source link

span.copyto(span) crashes at linux-arm64 platform in .net6 #66082

Closed FabianZeulnerGEA closed 2 years ago

FabianZeulnerGEA commented 2 years ago

Description

HI everybody, i am using .net6. within the application using copyto to copy a span into a different span. it works fine if build and run on windows or ubuntu. but it crashed when deployed on linux-arm64.

generated private span: private byte[] _localDmaData; private Span LocalDmaBuffer => _localDmaData;

copy data into it: line 70: data.CopyTo(LocalDmaBuffer);

exception it get: External component has thrown an exception. at System.Buffer.Memmove(Byte& , Byte& , UIntPtr ) at ....DataReceivedDataCallback(ReadOnlySpan`1 data) in ../code.cs:line 70

heavily appreciate any help or idea for the root-cause and fix.

Fabian

Reproduction Steps

private byte[] _localDmaData = new byte[1024]; private Span LocalDmaBuffer => _localDmaData;

private void DataReceivedDataCallback(ReadOnlySpan data) { data.CopyTo(LocalDmaBuffer); }

Expected behavior

copys data into new span without any complains

Actual behavior

External component has thrown an exception. at System.Buffer.Memmove(Byte& , Byte& , UIntPtr ) at ....DataReceivedDataCallback(ReadOnlySpan`1 data) in ../code.cs:line 70

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.

Issue Details
### Description HI everybody, i am using .net6. within the application using copyto to copy a span into a different span. it works fine if build and run on windows or ubuntu. but it crashed when deployed on linux-arm64. generated private span: private byte[] _localDmaData; private Span LocalDmaBuffer => _localDmaData; copy data into it: line 70: data.CopyTo(LocalDmaBuffer); exception it get: External component has thrown an exception. at System.Buffer.Memmove(Byte& , Byte& , UIntPtr ) at ....DataReceivedDataCallback(ReadOnlySpan`1 data) in ../code.cs:line 70 heavily appreciate any help or idea for the root-cause and fix. Fabian ### Reproduction Steps private byte[] _localDmaData = new byte[1024]; private Span LocalDmaBuffer => _localDmaData; private void DataReceivedDataCallback(ReadOnlySpan data) { data.CopyTo(LocalDmaBuffer); } ### Expected behavior copys data into new span without any complains ### Actual behavior External component has thrown an exception. at System.Buffer.Memmove(Byte& , Byte& , UIntPtr ) at ....DataReceivedDataCallback(ReadOnlySpan`1 data) in ../code.cs:line 70 ### Regression? _No response_ ### Known Workarounds _No response_ ### Configuration _No response_ ### Other information _No response_
Author: FabianZeulnerGEA
Assignees: -
Labels: `area-System.Memory`, `untriaged`
Milestone: -
adamsitnik commented 2 years ago

Hi @FabianZeulnerGEA

I can't repro the issue using following app:

using System;

namespace CopyToRepro
{
    internal class Program
    {
        private byte[] _localDmaData = new byte[1024];
        private Span<byte> LocalDmaBuffer => _localDmaData;

        static void Main() => new Program().DataReceivedDataCallback(new byte[1024]);

        private void DataReceivedDataCallback(ReadOnlySpan<byte> data) => data.CopyTo(LocalDmaBuffer);
    }
}

Is there anything specific about the ReadOnlySpan<byte> passed to DataReceivedDataCallback? (Is it stack-allocated span or just a managed array?)

FabianZeulnerGEA commented 2 years ago

Hi so the ReadOnlySpan<> handed to the Callback is a specific memory area defined in the device tree for the DMA to store data. so its in a non cachable area of the ddr4. the data source for the DMA is coming from an FPGA. we are using and Xilinx Ultrascale+ chip running Yocto (if that additional info may help)

i will take your code and put it onto the hardware later to reproduce it.

the same code did work on .netcore 3.1

FabianZeulnerGEA commented 2 years ago

so the exception is only generated when it trys to copy a span from DMA. so when the Span is in the nonCached area and the location and dma-device is defined through the device tree.

here is the source for the DataCallback:

_dma.DataReceived += DataReceivedDataCallback;

private void Callback(IntPtr bufferPtr, int size, IntPtr data)
        {
            try
            {
                unsafe
                {
                    DataReceived?.Invoke(new Span<byte>((byte*)data, size));
                }
            }
            catch (Exception)
            {
                // we should not let exceptions escape
            }
            finally
            {
                ApiAckReceiverBuffer(_dmaPtr, bufferPtr);
            }

the exception is generated here: DataReceived?.Invoke(bd.AsSpan());

System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception. at System.Buffer.Memmove(Byte& , Byte& , UIntPtr )


private void DmaComplete(Drivers.IAxiDmaChannel channel)
        {
            while (_dma.ChannelS2MM.CompletedReceiveBuffer(out Drivers.DmaBufferDescriptor bd))
            {
                try
                {
                    DataReceived?.Invoke(bd.AsSpan());
                }
                catch (Exception e)
                {
                    Console.WriteLine($"error: {e}");
                }
                finally
                {
                    _dma.ChannelS2MM.AcknowledgeReceiveBuffer(bd);
                }
            }
jkotas commented 2 years ago

The device memory on Arm does not always support unaligned memory access. Look for statements like "If a memory location is not capable of supporting unaligned memory accesses, then an unaligned access to that memory location generates an Alignment fault ..." in the "E2.8.2 Device memory" section of Arm Architecture Reference manual.

The implementation of .NET runtime libraries methods like Span assume normal memory that supports unaligned memory accesses. This assumption is important for best performance and to allow vectorization.

I would recommend that you create your own implementation of Span.Copy that does not perform unaligned memory access (e.g. copies one byte at a time) and use that when accessing the device memory.

FabianZeulnerGEA commented 2 years ago

thank you, i will have a look into that. but what did change in the background, that the identical code is happy when build with core 3.1 and crashes when build with .net6 ?

jkotas commented 2 years ago

My guess is that the JIT started to generate better code using SIMD instruction that do not support unaligned device memory access.

Are you able to find the exact processor instruction under debugger that is triggering the fault? I expect that you would not find the same instruction in .NET Core 3.1 code for the Memmove method and instead the .NET Core 3.1 code would have a less efficient equivalent.

ghost commented 2 years ago

This issue has been marked needs-author-action since it may be missing important information. Please refer to our contribution guidelines for tips on how to report issues effectively.

FabianZeulnerGEA commented 2 years ago

I did not get the chance to run the debugger on the system. but i hope to get it done this week, and we also move the DMA memory location to a place that SIMD is happy. will keep you posted.

but in general it would be nice if the framework can take care of, and if it detects something like that, switches back to the core3.1 version of it.

jkotas commented 2 years ago

it would be nice if the framework can take care of, and if it detects something like that, switches back to the core3.1 version of it.

It is not possible to detect this efficiently. Check like this would significantly slow down memory copying for everybody out there.

ghost commented 2 years ago

This issue has been marked needs-author-action since it may be missing important information. Please refer to our contribution guidelines for tips on how to report issues effectively.

FabianZeulnerGEA commented 2 years ago

It is not possible to detect this efficiently. Check like this would significantly slow down memory copying for everybody out there.

as i am not an expert at the internals of .net. is there a mechanism that takes care that you never run into such a SIMD problem with Span<> when everything is managed by .net. And I only see this problem cause i hard code the Span start address in the device tree. or could it be that you might always run into such an issue using Span within .net6 on arm?

jkotas commented 2 years ago

The .NET runtime assumes that all memory that it operates on is a Normal memory. Normal memory is defined by Arm architecture reference manual. The memory does not have to be managed by .NET runtime. For example, memory allocated by C malloc API is a Normal memory and it will work fine with Span APIs.

You are seeing this since you are operating on a Device memory. Device memory is not a Normal memory and it comes with number of limitations.

jeffhandley commented 2 years ago

Closing as expected behavior per @jkotas's comment above.

tannergooding commented 2 years ago

@jkotas, got pointed to this from discord and I'm not quite sure this is "expected".

.NET certainly assumes that addresses are aligned and you must use, for example, Unsafe.*Unaligned APIs to work with "unaligned memory".

Given then that the natural alignment of byte is 1, that Span<T> can be over T* address, that there are platforms, like Arm32, which may not support "general unaligned access", it is a bug for our Span<T>.CopyTo implementation to have code such as Unsafe.As<byte, long>(ref dest) = Unsafe.As<byte, long>(ref src); or Unsafe.As<byte, int>(ref dest) = Unsafe.As<byte, int>(ref src);

This is a bug because it forces the JIT to assume that the ref byte are 4 or 8 byte aligned, which they may not be. Likewise, code such as Unsafe.As<byte, Block16>(ref dest) = Unsafe.As<byte, Block16>(ref src); is problematic if it implies to the JIT that the Block16 is anything more than 1-byte aligned.

That is, Span<byte>.CopyTo() must effectively operate like memcpy (well memmove in this case) does in C and be usable for any address. It can still utilize SIMD and other tricks, but only where it meets the platform expectations around memory alignment, otherwise there will be likely bugs when running on platforms that don't support general unaligned memory access.

tannergooding commented 2 years ago

Given that we have Vector128.IsHardwareAccelerated, Vector128.LoadUnsafe and Vector128.StoreUnsafe which do safely operate with ref byte to do unaligned SIMD load/stores; this should be something we can reasonably fix, keep performance, and make work with the normal JIT expectations that T is must be "naturally aligned" (e.g. 1-byte aligned for byte).

jkotas commented 2 years ago

It is a bug for our Span.CopyTo implementation to have code such as Unsafe.As<byte, long>(ref dest) = Unsafe.As<byte, long>(ref src); or Unsafe.As<byte, int>(ref dest) = Unsafe.As<byte, int>(ref src);

This code is not 100% portable, but it works just fine on all platforms that we support. It is not a bug since we own the runtime and the list of platforms that we support. If we add a new platform to the list where this is a problem or make other changes in the runtime that make it problematic, we will fix this implementation.

It would be a bug if this code was in 3rd party library that aims to be 100% portable.

jkotas commented 2 years ago

Vector128.IsHardwareAccelerated, Vector128.LoadUnsafe and Vector128.StoreUnsafe

Agree that it would be nice to switch this code to use Vector128 now that we have this capability.