dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Arm64: Implement SVE APIs #99957

Open kunalspathak opened 3 months ago

kunalspathak commented 3 months ago

Now that the encoding of all SVE instructions is complete (https://github.com/dotnet/runtime/issues/94549), it is time to expose these instructions through .NET APIs. Below is the categorized list of APIs, with links to the issues where they were approved.

.NET 9 Goal: We aim to complete SVE APIs in .NET 9. SVE2 APIs will be pushed out to .NET 10.

SVE APIs

## High Priority SVE APIs ### [Sve mask](https://github.com/dotnet/runtime/issues/93964) - [x] AbsoluteCompareGreaterThan https://github.com/dotnet/runtime/pull/104464 - [x] AbsoluteCompareGreaterThanOrEqual https://github.com/dotnet/runtime/pull/104464 - [x] AbsoluteCompareLessThan https://github.com/dotnet/runtime/pull/104464 - [x] AbsoluteCompareLessThanOrEqual https://github.com/dotnet/runtime/pull/104464 - [x] Compact https://github.com/dotnet/runtime/pull/102992 - [x] CompareEqual https://github.com/dotnet/runtime/pull/104464 - [x] CompareGreaterThan https://github.com/dotnet/runtime/pull/104464 - [x] CompareGreaterThanOrEqual https://github.com/dotnet/runtime/pull/104464 - [x] CompareLessThan https://github.com/dotnet/runtime/pull/104464 - [x] CompareLessThanOrEqual https://github.com/dotnet/runtime/pull/104464 - [x] CompareNotEqualTo https://github.com/dotnet/runtime/pull/104464 - [x] CompareUnordered https://github.com/dotnet/runtime/pull/104464 - [x] ConditionalExtractAfterLastActiveElement https://github.com/dotnet/runtime/pull/104150 - [x] ConditionalExtractAfterLastActiveElementAndReplicate https://github.com/dotnet/runtime/pull/104150 - [x] ConditionalExtractLastActiveElement https://github.com/dotnet/runtime/pull/104150 - [x] ConditionalExtractLastActiveElementAndReplicate https://github.com/dotnet/runtime/pull/104150 - [x] ConditionalSelect https://github.com/dotnet/runtime/pull/100743 - [x] CreateBreakAfterMask https://github.com/dotnet/runtime/pull/104184 **(Future work)** [Add optimization for CndSel](https://github.com/dotnet/runtime/issues/104486) - [x] CreateBreakAfterPropagateMask https://github.com/dotnet/runtime/pull/104184 **(Future work)** [Add optimization for CndSel](https://github.com/dotnet/runtime/issues/104486) - [x] CreateBreakBeforeMask https://github.com/dotnet/runtime/pull/104184 **(Future work)** Add optimization for CndSel - [x] CreateBreakBeforePropagateMask https://github.com/dotnet/runtime/pull/104184 **(Future work)** [Add 
optimization for CndSel](https://github.com/dotnet/runtime/issues/104486) - [ ] CreateBreakPropagateMask (Will) - [x] CreateFalseMaskByte https://github.com/dotnet/runtime/pull/102076 - [x] CreateFalseMaskDouble https://github.com/dotnet/runtime/pull/102076 - [x] CreateFalseMaskInt16 https://github.com/dotnet/runtime/pull/102076 - [x] CreateFalseMaskInt32 https://github.com/dotnet/runtime/pull/102076 - [x] CreateFalseMaskInt64 https://github.com/dotnet/runtime/pull/102076 - [x] CreateFalseMaskSByte https://github.com/dotnet/runtime/pull/102076 - [x] CreateFalseMaskSingle https://github.com/dotnet/runtime/pull/102076 - [x] CreateFalseMaskUInt16 https://github.com/dotnet/runtime/pull/102076 - [x] CreateFalseMaskUInt32 https://github.com/dotnet/runtime/pull/102076 - [x] CreateFalseMaskUInt64 https://github.com/dotnet/runtime/pull/102076 - [x] CreateMaskForFirstActiveElement https://github.com/dotnet/runtime/pull/104002 - [x] CreateMaskForNextActiveElement https://github.com/dotnet/runtime/pull/104002 - [x] CreateTrueMaskByte https://github.com/dotnet/runtime/pull/98218 - [x] CreateTrueMaskDouble https://github.com/dotnet/runtime/pull/98218 - [x] CreateTrueMaskInt16 https://github.com/dotnet/runtime/pull/98218 - [x] CreateTrueMaskInt32 https://github.com/dotnet/runtime/pull/98218 - [x] CreateTrueMaskInt64 https://github.com/dotnet/runtime/pull/98218 - [x] CreateTrueMaskSByte https://github.com/dotnet/runtime/pull/98218 - [x] CreateTrueMaskSingle https://github.com/dotnet/runtime/pull/98218 - [x] CreateTrueMaskUInt16 https://github.com/dotnet/runtime/pull/98218 - [x] CreateTrueMaskUInt32 https://github.com/dotnet/runtime/pull/98218 - [x] CreateTrueMaskUInt64 https://github.com/dotnet/runtime/pull/98218 - [x] CreateWhileLessThanMask16Bit https://github.com/dotnet/runtime/pull/100949 - [x] CreateWhileLessThanMask32Bit https://github.com/dotnet/runtime/pull/100949 - [x] CreateWhileLessThanMask64Bit https://github.com/dotnet/runtime/pull/100949 - [x] 
CreateWhileLessThanMask8Bit https://github.com/dotnet/runtime/pull/100949 - [x] CreateWhileLessThanOrEqualMask16Bit https://github.com/dotnet/runtime/pull/100949 - [x] CreateWhileLessThanOrEqualMask32Bit https://github.com/dotnet/runtime/pull/100949 - [x] CreateWhileLessThanOrEqualMask64Bit https://github.com/dotnet/runtime/pull/100949 - [x] CreateWhileLessThanOrEqualMask8Bit https://github.com/dotnet/runtime/pull/100949 - [ ] **(Future item)** ExtractAfterLastScalar https://github.com/dotnet/runtime/pull/103847 - [ ] **(Future item)** ExtractAfterLastVector https://github.com/dotnet/runtime/pull/103847 - [ ] **(Future item)** ExtractLastScalar https://github.com/dotnet/runtime/pull/103847 - [ ] **(Future item)** ExtractLastVector https://github.com/dotnet/runtime/pull/103847 - [x] ExtractVector https://github.com/dotnet/runtime/pull/103739 - [x] TestAnyTrue https://github.com/dotnet/runtime/pull/103739 - [x] TestFirstTrue https://github.com/dotnet/runtime/pull/103739 - [x] TestLastTrue https://github.com/dotnet/runtime/pull/103739 ### [Sve bitwise](https://github.com/dotnet/runtime/issues/93887) - [x] And https://github.com/dotnet/runtime/pull/101762 - [x] AndAcross https://github.com/dotnet/runtime/pull/101762 - [ ] **(Future work)** AndNot Need to fix https://github.com/dotnet/runtime/issues/101933 - [x] BitwiseClear https://github.com/dotnet/runtime/pull/101853 - [x] BooleanNot https://github.com/dotnet/runtime/pull/101853 - [x] InsertIntoShiftedVector https://github.com/dotnet/runtime/pull/103725 - [x] Not https://github.com/dotnet/runtime/pull/103725 - [x] Or https://github.com/dotnet/runtime/pull/101762 - [x] OrAcross https://github.com/dotnet/runtime/pull/101762 - [ ] **(Future work)** OrNot Need to fix https://github.com/dotnet/runtime/issues/101933 - [x] ShiftLeftLogical https://github.com/dotnet/runtime/pull/104119 - [x] ShiftRightArithmetic https://github.com/dotnet/runtime/pull/104119 - [x] ShiftRightArithmeticForDivide 
https://github.com/dotnet/runtime/pull/104279 - [x] ShiftRightLogical https://github.com/dotnet/runtime/pull/104119 - [x] Xor https://github.com/dotnet/runtime/pull/101762 - [x] XorAcross https://github.com/dotnet/runtime/pull/101762 ### [Sve bitmanipulate](https://github.com/dotnet/runtime/issues/94008) (Complete)
Full list
- [x] DuplicateSelectedScalarToVector https://github.com/dotnet/runtime/pull/103228
- [x] ReverseBits https://github.com/dotnet/runtime/pull/103806
- [x] ReverseElement https://github.com/dotnet/runtime/pull/102991
- [x] ReverseElement16 https://github.com/dotnet/runtime/pull/102991
- [x] ReverseElement32 https://github.com/dotnet/runtime/pull/102991
- [x] ReverseElement8 https://github.com/dotnet/runtime/pull/102991
- [x] Splice https://github.com/dotnet/runtime/pull/103567
- [x] TransposeEven https://github.com/dotnet/runtime/pull/103068
- [x] TransposeOdd https://github.com/dotnet/runtime/pull/103068
- [x] UnzipEven https://github.com/dotnet/runtime/pull/101294
- [x] UnzipOdd https://github.com/dotnet/runtime/pull/101294
- [x] VectorTableLookup https://github.com/dotnet/runtime/pull/103989
- [x] ZipHigh #101294
- [x] ZipLow #101294
### [Sve loads](https://github.com/dotnet/runtime/issues/94006) (Complete)
Full list - [x] Compute16BitAddresses https://github.com/dotnet/runtime/pull/103040 - [x] Compute32BitAddresses https://github.com/dotnet/runtime/pull/103040 - [x] Compute64BitAddresses https://github.com/dotnet/runtime/pull/103040 - [x] Compute8BitAddresses https://github.com/dotnet/runtime/pull/103040 - [x] LoadVector https://github.com/dotnet/runtime/pull/98218 - [x] LoadVector128AndReplicateToVector https://github.com/dotnet/runtime/pull/103392 - [x] LoadVectorByteNonFaultingZeroExtendToInt16 https://github.com/dotnet/runtime/pull/102860 - [x] LoadVectorByteNonFaultingZeroExtendToInt32 https://github.com/dotnet/runtime/pull/102860 - [x] LoadVectorByteNonFaultingZeroExtendToInt64 https://github.com/dotnet/runtime/pull/102860 - [x] LoadVectorByteNonFaultingZeroExtendToUInt16 https://github.com/dotnet/runtime/pull/102860 - [x] LoadVectorByteNonFaultingZeroExtendToUInt32 https://github.com/dotnet/runtime/pull/102860 - [x] LoadVectorByteNonFaultingZeroExtendToUInt64 https://github.com/dotnet/runtime/pull/102860 - [x] LoadVectorByteZeroExtendToInt16 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorByteZeroExtendToInt32 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorByteZeroExtendToInt64 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorByteZeroExtendToUInt16 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorByteZeroExtendToUInt32 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorByteZeroExtendToUInt64 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorInt16NonFaultingSignExtendToInt32 https://github.com/dotnet/runtime/pull/102903 - [x] LoadVectorInt16NonFaultingSignExtendToInt64 https://github.com/dotnet/runtime/pull/102903 - [x] LoadVectorInt16NonFaultingSignExtendToUInt32 https://github.com/dotnet/runtime/pull/102903 - [x] LoadVectorInt16NonFaultingSignExtendToUInt64 https://github.com/dotnet/runtime/pull/102903 - [x] LoadVectorInt16SignExtendToInt32 
https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorInt16SignExtendToInt64 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorInt16SignExtendToUInt32 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorInt16SignExtendToUInt64 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorInt32NonFaultingSignExtendToInt64 https://github.com/dotnet/runtime/pull/102903 - [x] LoadVectorInt32NonFaultingSignExtendToUInt64 https://github.com/dotnet/runtime/pull/102903 - [x] LoadVectorInt32SignExtendToInt64 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorInt32SignExtendToUInt64 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorNonFaulting https://github.com/dotnet/runtime/pull/103392 - [x] LoadVectorNonTemporal https://github.com/dotnet/runtime/pull/103392 - [x] LoadVectorSByteNonFaultingSignExtendToInt16 https://github.com/dotnet/runtime/pull/102903 - [x] LoadVectorSByteNonFaultingSignExtendToInt32 https://github.com/dotnet/runtime/pull/102903 - [x] LoadVectorSByteNonFaultingSignExtendToInt64 https://github.com/dotnet/runtime/pull/102903 - [x] LoadVectorSByteNonFaultingSignExtendToUInt16 https://github.com/dotnet/runtime/pull/102903 - [x] LoadVectorSByteNonFaultingSignExtendToUInt32 https://github.com/dotnet/runtime/pull/102903 - [x] LoadVectorSByteNonFaultingSignExtendToUInt64 https://github.com/dotnet/runtime/pull/102903 - [x] LoadVectorSByteSignExtendToInt16 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorSByteSignExtendToInt32 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorSByteSignExtendToInt64 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorSByteSignExtendToUInt16 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorSByteSignExtendToUInt32 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorSByteSignExtendToUInt64 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorUInt16NonFaultingZeroExtendToInt32 
https://github.com/dotnet/runtime/pull/102860 - [x] LoadVectorUInt16NonFaultingZeroExtendToInt64 https://github.com/dotnet/runtime/pull/102860 - [x] LoadVectorUInt16NonFaultingZeroExtendToUInt32 https://github.com/dotnet/runtime/pull/102860 - [x] LoadVectorUInt16NonFaultingZeroExtendToUInt64 https://github.com/dotnet/runtime/pull/102860 - [x] LoadVectorUInt16ZeroExtendToInt32 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorUInt16ZeroExtendToInt64 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorUInt16ZeroExtendToUInt32 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorUInt16ZeroExtendToUInt64 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorUInt32NonFaultingZeroExtendToInt64 https://github.com/dotnet/runtime/pull/102860 - [x] LoadVectorUInt32NonFaultingZeroExtendToUInt64 https://github.com/dotnet/runtime/pull/102860 - [x] LoadVectorUInt32ZeroExtendToInt64 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorUInt32ZeroExtendToUInt64 https://github.com/dotnet/runtime/pull/101291 - [x] LoadVectorx2 https://github.com/dotnet/runtime/pull/102180 - [x] LoadVectorx3 https://github.com/dotnet/runtime/pull/102180 - [x] LoadVectorx4 https://github.com/dotnet/runtime/pull/102180 - [x] PrefetchBytes https://github.com/dotnet/runtime/pull/103094 - [x] PrefetchInt16 https://github.com/dotnet/runtime/pull/103094 - [x] PrefetchInt32 https://github.com/dotnet/runtime/pull/103094 - [x] PrefetchInt64 https://github.com/dotnet/runtime/pull/103094
### [Sve stores](https://github.com/dotnet/runtime/issues/94011) (Complete)
Full list
- [x] Store https://github.com/dotnet/runtime/pull/102262
- [x] StoreNarrowing https://github.com/dotnet/runtime/pull/102605
- [x] StoreNonTemporal https://github.com/dotnet/runtime/pull/102769
### [Sve maths](https://github.com/dotnet/runtime/issues/94009) (Complete)
Full list - [x] Abs https://github.com/dotnet/runtime/pull/100743 - [x] AbsoluteDifference https://github.com/dotnet/runtime/pull/102170 - [x] Add https://github.com/dotnet/runtime/pull/100743 - [x] AddAcross https://github.com/dotnet/runtime/pull/101674 - [x] AddSaturate https://github.com/dotnet/runtime/pull/102170 - [x] Divide https://github.com/dotnet/runtime/pull/101578 - [x] DotProduct https://github.com/dotnet/runtime/pull/102218 - [x] DotProductBySelectedScalar https://github.com/dotnet/runtime/pull/102218 - [x] FusedMultiplyAdd https://github.com/dotnet/runtime/pull/102007 - [x] FusedMultiplyAddBySelectedScalar https://github.com/dotnet/runtime/pull/102007 - [x] FusedMultiplyAddNegated https://github.com/dotnet/runtime/pull/102007 - [x] FusedMultiplySubtract https://github.com/dotnet/runtime/pull/102007 - [x] FusedMultiplySubtractBySelectedScalar https://github.com/dotnet/runtime/pull/102007 - [x] FusedMultiplySubtractNegated https://github.com/dotnet/runtime/pull/102007 - [x] Max https://github.com/dotnet/runtime/pull/101859 - [x] MaxAcross https://github.com/dotnet/runtime/pull/101859 - [x] MaxNumber https://github.com/dotnet/runtime/pull/101859 - [x] MaxNumberAcross https://github.com/dotnet/runtime/pull/101859 - [x] Min https://github.com/dotnet/runtime/pull/101859 - [x] MinAcross https://github.com/dotnet/runtime/pull/101859 - [x] MinNumber https://github.com/dotnet/runtime/pull/101859 - [x] MinNumberAcross https://github.com/dotnet/runtime/pull/101859 - [x] Multiply https://github.com/dotnet/runtime/pull/101578 - [x] MultiplyAdd https://github.com/dotnet/runtime/pull/102007 - [x] MultiplyBySelectedScalar https://github.com/dotnet/runtime/pull/102007 - [x] MultiplyExtended https://github.com/dotnet/runtime/pull/102170 - [x] MultiplySubtract https://github.com/dotnet/runtime/pull/102007 - [x] Negate https://github.com/dotnet/runtime/pull/102170 - [x] SignExtend16 https://github.com/dotnet/runtime/pull/101702 - [x] SignExtend32 
https://github.com/dotnet/runtime/pull/101702 - [x] SignExtend8 https://github.com/dotnet/runtime/pull/101702 - [x] SignExtendWideningLower https://github.com/dotnet/runtime/pull/101743 - [x] SignExtendWideningUpper https://github.com/dotnet/runtime/pull/101743 - [x] Subtract https://github.com/dotnet/runtime/pull/101578 - [x] SubtractSaturate https://github.com/dotnet/runtime/pull/102170 - [x] ZeroExtend16 https://github.com/dotnet/runtime/pull/101702 - [x] ZeroExtend32 https://github.com/dotnet/runtime/pull/101702 - [x] ZeroExtend8 https://github.com/dotnet/runtime/pull/101702 - [x] ZeroExtendWideningLower https://github.com/dotnet/runtime/pull/101743 - [x] ZeroExtendWideningUpper https://github.com/dotnet/runtime/pull/101743
### [Sve counting](https://github.com/dotnet/runtime/issues/94003) (Complete)
Full list
- [x] Count16BitElements https://github.com/dotnet/runtime/pull/101188
- [x] Count32BitElements https://github.com/dotnet/runtime/pull/101188
- [x] Count64BitElements https://github.com/dotnet/runtime/pull/101188
- [x] Count8BitElements https://github.com/dotnet/runtime/pull/101188
- [x] GetActiveElementCount https://github.com/dotnet/runtime/pull/102813
- [x] LeadingSignCount https://github.com/dotnet/runtime/pull/102548
- [x] LeadingZeroCount https://github.com/dotnet/runtime/pull/102548
- [x] PopCount https://github.com/dotnet/runtime/pull/102548
- [x] SaturatingDecrementBy16BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingDecrementBy32BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingDecrementBy64BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingDecrementBy8BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingDecrementByActiveElementCount https://github.com/dotnet/runtime/pull/102994
- [x] SaturatingIncrementBy16BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingIncrementBy32BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingIncrementBy64BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingIncrementBy8BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingIncrementByActiveElementCount https://github.com/dotnet/runtime/pull/102994
## Low Priority SVE APIs
### [Sve scatterstores](https://github.com/dotnet/runtime/issues/94014)
- [x] Scatter https://github.com/dotnet/runtime/pull/104555
- [ ] Scatter16BitNarrowing
- [ ] Scatter16BitWithByteOffsetsNarrowing
- [ ] Scatter32BitNarrowing
- [ ] Scatter32BitWithByteOffsetsNarrowing
- [ ] Scatter8BitNarrowing
- [ ] Scatter8BitWithByteOffsetsNarrowing
### [Sve gatherloads](https://github.com/dotnet/runtime/issues/94007) (Complete)
Full list
- [x] GatherPrefetch16Bit https://github.com/dotnet/runtime/pull/103826
- [x] GatherPrefetch32Bit https://github.com/dotnet/runtime/pull/103826
- [x] GatherPrefetch64Bit https://github.com/dotnet/runtime/pull/103826
- [x] GatherPrefetch8Bit https://github.com/dotnet/runtime/pull/103826
- [x] GatherVector https://github.com/dotnet/runtime/pull/103159
- [x] GatherVectorByteZeroExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorInt16SignExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorInt16WithByteOffsetsSignExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorInt32SignExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorInt32WithByteOffsetsSignExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorSByteSignExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorUInt16WithByteOffsetsZeroExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorUInt16ZeroExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorUInt32WithByteOffsetsZeroExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorUInt32ZeroExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorWithByteOffsets https://github.com/dotnet/runtime/pull/103564
### [Sve fp](https://github.com/dotnet/runtime/issues/94005) - [ ] AddRotateComplex https://github.com/dotnet/runtime/pull/104258 - [ ] AddSequentialAcross https://github.com/dotnet/runtime/pull/104640 - [ ] ConvertToDouble https://github.com/dotnet/runtime/pull/104478 - [x] ConvertToInt32 https://github.com/dotnet/runtime/pull/103098 - [x] ConvertToInt64 https://github.com/dotnet/runtime/pull/104069 - [ ] ConvertToSingle https://github.com/dotnet/runtime/pull/104478 - [x] ConvertToUInt32 https://github.com/dotnet/runtime/pull/103098 - [x] ConvertToUInt64 https://github.com/dotnet/runtime/pull/104069 - [ ] FloatingPointExponentialAccelerator https://github.com/dotnet/runtime/pull/104649 - [ ] MultiplyAddRotateComplex - [ ] MultiplyAddRotateComplexBySelectedScalar - [x] ReciprocalEstimate https://github.com/dotnet/runtime/pull/103673 - [x] ReciprocalExponent https://github.com/dotnet/runtime/pull/103673 - [x] ReciprocalSqrtEstimate https://github.com/dotnet/runtime/pull/103673 - [x] ReciprocalSqrtStep https://github.com/dotnet/runtime/pull/103673 - [x] ReciprocalStep https://github.com/dotnet/runtime/pull/103673 - [x] RoundAwayFromZero https://github.com/dotnet/runtime/pull/103588 - [x] RoundToNearest https://github.com/dotnet/runtime/pull/103588 - [x] RoundToNegativeInfinity https://github.com/dotnet/runtime/pull/103588 - [x] RoundToPositiveInfinity https://github.com/dotnet/runtime/pull/103588 - [x] RoundToZero https://github.com/dotnet/runtime/pull/103588 - [x] Scale https://github.com/dotnet/runtime/pull/103663 - [x] Sqrt https://github.com/dotnet/runtime/pull/103663 - [ ] TrigonometricMultiplyAddCoefficient https://github.com/dotnet/runtime/pull/104697 - [ ] TrigonometricSelectCoefficient https://github.com/dotnet/runtime/pull/104681 - [ ] TrigonometricStartingValue https://github.com/dotnet/runtime/pull/104681 ### [Sve firstfaulting](https://github.com/dotnet/runtime/issues/94004) - [ ] GatherVectorByteZeroExtendFirstFaulting - [ ] GatherVectorFirstFaulting 
https://github.com/dotnet/runtime/pull/104502 - [ ] GatherVectorInt16SignExtendFirstFaulting - [ ] GatherVectorInt16WithByteOffsetsSignExtendFirstFaulting - [ ] GatherVectorInt32SignExtendFirstFaulting - [ ] GatherVectorInt32WithByteOffsetsSignExtendFirstFaulting - [ ] GatherVectorSByteSignExtendFirstFaulting - [ ] GatherVectorUInt16WithByteOffsetsZeroExtendFirstFaulting - [ ] GatherVectorUInt16ZeroExtendFirstFaulting - [ ] GatherVectorUInt32WithByteOffsetsZeroExtendFirstFaulting - [ ] GatherVectorUInt32ZeroExtendFirstFaulting - [ ] GatherVectorWithByteOffsetFirstFaulting - [ ] GetFfr https://github.com/dotnet/runtime/pull/104502 - [ ] LoadVectorByteZeroExtendFirstFaulting - [ ] LoadVectorFirstFaulting https://github.com/dotnet/runtime/pull/104502 - [ ] LoadVectorInt16SignExtendFirstFaulting - [ ] LoadVectorInt32SignExtendFirstFaulting - [ ] LoadVectorSByteSignExtendFirstFaulting - [ ] LoadVectorUInt16ZeroExtendFirstFaulting - [ ] LoadVectorUInt32ZeroExtendFirstFaulting - [ ] SetFfr https://github.com/dotnet/runtime/pull/104502

SVE2 APIs

Sve2 scatterstores

Sve2 maths

Sve2 mask

Sve2 gatherloads

Sve2 fp

Sve2 counting

Sve2 bitwise

Sve2 bitmanipulate

SveBf16

SveF32mm

SveF64mm

SveFp16

SveI8mm

Sha3

Sm4

SveAes

SveBitperm

SveSha3

SveSm4

Credit to @a74nh for populating the list and for providing files in https://github.com/a74nh/runtime/tree/api_github/sve_api that will help with implementing these APIs.

Contributes to https://github.com/dotnet/runtime/issues/93095

a74nh commented 3 months ago

Recommendations for how to implement these APIs. Examples can be found in #100134.

API

Copy/paste contents from files in https://github.com/a74nh/runtime/tree/api_github/sve_api/out_cs_api/ . There should be no need to edit these changes. Keep alphabetical ordering.

The same files, with additional annotations, can be found in https://github.com/a74nh/runtime/tree/api_github/sve_api/out_helper_api . These are for development use only and are not for committing.

HW Intrinsics

Copy/paste from https://github.com/a74nh/runtime/blob/api_github/sve_api/out_hwintrinsiclistarm64sve.h . For entries with multiple instructions for a single type, this will need fixing via a special code path. The flags and category columns will probably need fixing manually. Flags that are not automatically detected:

For any special case where there is no flag, you have options:

  1. Add a new flag. Add code in hwintrinsics to use the flag. There is limited space for new flags, so only do this where there are many instructions that would require it.
  2. If changes need making in codegen, then mark as HW_Flag_SpecialCodeGen and add a new case to CodeGen::genHWIntrinsic().
  3. If changes need making at the import stage, then mark as HW_Category_Special and add a new case to Compiler::impSpecialIntrinsic().
  4. Mark as both HW_Flag_SpecialCodeGen and HW_Category_Special.

Testing

Copy/paste from https://raw.githubusercontent.com/a74nh/runtime/api_github/sve_api/out_GenerateHWIntrinsicTests_Arm.cs . Rename the template (first column) to a more generic template. We want as few new templates as possible. Existing AdvSimd templates can be copied and then edited to include extra Sve parts. The ValidateIterResult and NextValueOpN entries will need editing to fit the template. Use existing entries as a guide.

Linux

Tests can be built using:

rm -fr ./artifacts/tests/coreclr/obj/linux.arm64.Checked/Managed/JIT/HardwareIntrinsics/Arm/Sve/
./src/tests/build.sh checked -test:JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro.csproj

Tests can then be run:

./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.sh

Generated C# files are in artifacts/tests/coreclr/obj/linux.arm64.Checked/Managed/JIT/HardwareIntrinsics/Arm/Sve/Sve_ro/Sve_ro/gen/

A lot of tests will be run. To make life easier, run the .dll directly and pass it the name of the test (a substring will do). E.g.:

$CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Add_uint
$CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve
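
The two invocations above differ only in the filter argument, so during development they can be wrapped in a small helper. The following is a minimal sketch, not something that exists in the repository: the function name `run_sve_test` and the `DRY_RUN` variable are hypothetical, and it assumes `CORE_ROOT` is set and that the command is run from the repository root.

```shell
#!/bin/sh
# Hypothetical helper (not part of dotnet/runtime): run one test from the
# HardwareIntrinsics_Arm_ro test assembly by passing a name filter to corerun.
TEST_DLL=./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll

run_sve_test() {
    # $1 is the filter; a substring of the test name is enough, e.g. Sve_Add_uint.
    if [ -z "${1:-}" ]; then
        echo "usage: run_sve_test <test-name-substring>" >&2
        return 2
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        # DRY_RUN is an illustrative switch: print the command instead of running it.
        echo "$CORE_ROOT/corerun $TEST_DLL $1"
    else
        "$CORE_ROOT/corerun" "$TEST_DLL" "$1"
    fi
}
```

With this in place, `run_sve_test Sve_Add_uint` runs a single test and `run_sve_test Sve` runs everything whose name contains `Sve`.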

Windows

Tests can be built using:

del /F /S /Q repo\artifacts\tests\coreclr\obj\windows.arm64.Release\Managed\JIT\HardwareIntrinsics\Arm\Sve\
pushd repo\src\tests\
build.cmd Release -test JIT\HardwareIntrinsics\HardwareIntrinsics_Arm_r.csproj /p:TargetArchitecture=arm64
build.cmd Release -test JIT\HardwareIntrinsics\HardwareIntrinsics_Arm_ro.csproj /p:TargetArchitecture=arm64

Tests can then be run:

pushd repo\artifacts\tests\coreclr\windows.arm64.Release\JIT\HardwareIntrinsics\HardwareIntrinsics_Arm_r
HardwareIntrinsics_Arm_r.cmd

Generated C# files are in artifacts\tests\coreclr\obj\windows.arm64.Release\Managed\JIT\HardwareIntrinsics\Arm\Sve\Sve_ro\Sve_ro\gen\

A lot of tests will be run. To make life easier, run the .dll directly and pass it the name of the test (a substring will do). E.g.:

%CORE_ROOT%\corerun .\artifacts\tests\coreclr\windows.arm64.Release\JIT\HardwareIntrinsics\HardwareIntrinsics_Arm_ro\HardwareIntrinsics_Arm_ro.dll Sve_Add_uint
%CORE_ROOT%\corerun .\artifacts\tests\coreclr\windows.arm64.Release\JIT\HardwareIntrinsics\HardwareIntrinsics_Arm_ro\HardwareIntrinsics_Arm_ro.dll Sve

Altjit

All the testing works as usual with the AltJit* environment variables. The only thing to remember is to also set the environment variable DOTNET_MaxVectorTBitWidth=128; otherwise the following assert is hit: assert(size == info.compCompHnd->getClassSize(typeHnd));
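
As a rough illustration of that setup, the variables can be scoped to a single command rather than exported into the whole shell. This is only a sketch: `DOTNET_MaxVectorTBitWidth=128` comes from the note above, while the function name `run_with_altjit`, the `ALTJIT_NAME` override, and the specific `DOTNET_AltJit` / `DOTNET_AltJitName` values are assumptions that must be adjusted to match the cross-targeting JIT actually built.

```shell
#!/bin/sh
# Sketch: run a command with AltJit enabled and Vector<T> pinned to 128 bits.
# The AltJit values are assumptions; ALTJIT_NAME is a hypothetical override for
# the JIT library name, which depends on the host OS and architecture.
run_with_altjit() {
    DOTNET_AltJit='*' \
    DOTNET_AltJitName="${ALTJIT_NAME:-libclrjit_universal_arm64_x64.so}" \
    DOTNET_MaxVectorTBitWidth=128 \
    "$@"
}
```

The full test command line is passed as arguments, for example `run_with_altjit $CORE_ROOT/corerun <test dll> Sve_Add_uint`.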

Stress testing

All the tests should be run under all the various stress modes. https://github.com/a74nh/runtime/blob/api_github/sve_api/stress_tester.py can be used to run your test in each mode. Pass it the full command line for running your test. E.g.:

stress_tester.py $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Add_uint
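
For readers without the script to hand, the general idea of running one command under several stress settings can be sketched as below. This is not stress_tester.py and does not claim to reproduce its mode list: the function name `run_under_stress` is hypothetical, and the settings in `STRESS_MODES` are an assumed, illustrative subset of JIT stress knobs.

```shell
#!/bin/sh
# Sketch of a stress loop; this is NOT stress_tester.py. The mode list is an
# assumption for illustration, and the real script may use a different set.
STRESS_MODES="DOTNET_JitStress=1 DOTNET_JitStress=2 DOTNET_JitStressRegs=0x1 DOTNET_JitStressRegs=0x80"

run_under_stress() {
    failed=0
    for mode in $STRESS_MODES; do
        echo "=== $mode ==="
        # env applies the single NAME=VALUE setting to this run only.
        if ! env "$mode" "$@"; then
            echo "FAILED under $mode" >&2
            failed=1
        fi
    done
    # Non-zero if the command failed under any mode.
    return "$failed"
}
```

As with the script, the full test command line is passed as arguments; every mode is run even if an earlier one fails, so a single invocation shows which modes break.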

Writing Tests

a74nh commented 3 months ago

For choosing APIs.

a74nh commented 3 months ago

As the testing grows, it will become increasingly difficult to test just a single API. This is OK during CI, but painful during development and bug fixing.

I recommend someone writes a patch so that a test name can be passed in as an argument so that only that test will run. E.g.: HardwareIntrinsics_Arm_ro.sh Sve.Add.uint

kunalspathak commented 3 months ago

> I recommend someone writes a patch so that a test name can be passed in as an argument so that only that test will run. E.g.: HardwareIntrinsics_Arm_ro.sh Sve.Add.uint

I agree. I have asked @TIHan to come up with a design for this. @TIHan - any update on this?

TIHan commented 3 months ago

I have not looked at this yet, but can this week.

tannergooding commented 3 months ago

Just noting that such support should already exist if you invoke the underlying dll directly; this may just be something missing from the .sh file.

The exact argument that matches a filter may be a bit different because it now uses the underlying xunit filtering mechanism, but it should largely just work.


https://github.com/dotnet/runtime/blob/main/docs/workflow/testing/coreclr/testing.md#running-individual-tests

You can then see some of the logic that gets set up via https://github.com/dotnet/runtime/blob/main/src/tests/Common/XUnitWrapperGenerator/XUnitWrapperGenerator.cs and the corresponding logic of how the test filtering works here: https://github.com/dotnet/runtime/blob/main/src/tests/Common/XUnitWrapperLibrary/TestFilter.cs

The actual filter is constructed like:

System.Collections.Generic.Dictionary<string, string> testExclusionTable = XUnitWrapperLibrary.TestFilter.LoadTestExclusionTable();
XUnitWrapperLibrary.TestFilter filter = new (args, testExclusionTable);

A given TestExecutor then uses it like:

void TestExecutor1(System.IO.StreamWriter tempLogSw, System.IO.StreamWriter statsCsvSw)
{
    if (filter is null || filter.ShouldRunTest(@"JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble", "_AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble()"))
    {
        System.TimeSpan testStart = stopwatch.Elapsed;
        try
        {
            summary.ReportStartingTest("_AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble()", System.Console.Out);
            outputRecorder.ResetTestOutput();
            _AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble();
            summary.ReportPassedTest("_AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble()", "JIT.HardwareIntrinsics.Arm._AdvSimd.Program", @"AddDouble", stopwatch.Elapsed - testStart, outputRecorder.GetTestOutput(), System.Console.Out, tempLogSw, statsCsvSw);
        }
        catch (System.Exception ex)
        {
            summary.ReportFailedTest("_AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble()", "JIT.HardwareIntrinsics.Arm._AdvSimd.Program", @"AddDouble", stopwatch.Elapsed - testStart, ex, outputRecorder.GetTestOutput(), System.Console.Out, tempLogSw, statsCsvSw);
        }
    }
    else
    {
        string reason = filter.GetTestExclusionReason("_AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble()");
        summary.ReportSkippedTest("_AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble()", "JIT.HardwareIntrinsics.Arm._AdvSimd.Program", @"AddDouble", System.TimeSpan.Zero, reason, tempLogSw, statsCsvSw);
    }
}

where ShouldRunTest basically just does a stringToSearch.Contains(filter) check at the most basic level.
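
To make the consequence of a plain substring check concrete, here is a minimal sketch of that behaviour. It only models the basic `Contains` case described above; the real logic lives in TestFilter.cs and may handle more than this. The function name `should_run_test` is hypothetical, and which of the two name strings the real filter searches is an assumption (the first one from the generated code is used here).

```shell
#!/bin/sh
# Sketch of the basic substring check only; should_run_test is a hypothetical
# name and this does not reproduce the full TestFilter.cs behaviour.
should_run_test() {
    # $1: test name to search, $2: filter passed on the command line.
    case "$1" in
        *"$2"*) return 0 ;;   # filter occurs somewhere in the name (or is empty)
        *)      return 1 ;;
    esac
}

name='JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble'
should_run_test "$name" AddDouble  && echo "AddDouble: run"
should_run_test "$name" Add.Double || echo "Add.Double: skip"
```

Under this model the filter has to be a literal piece of the generated test name, so a dotted spelling that does not appear in the name would select nothing.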

kunalspathak commented 3 months ago

> Just noting that such support should already exist if you invoke the underlying dll directly; this may just be something missing from the .sh file.

If that's the case, can you or @TIHan come up with the exact command line that is needed to run a particular case? I don't want engineers to have to hack around a test to make it work for every API.

a74nh commented 3 months ago

It appears to work:

❯ $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Add_uint
16:34:55.071 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Add_uint()
Supported ISAs:
  AdvSimd:   True
  Aes:       True
  ArmBase:   True
  Crc32:     True
  Dp:        True
  Rdm:       True
  Sha1:      True
  Sha256:    True
  Sve:       True

Beginning scenario: RunBasicScenario_UnsafeRead
Beginning scenario: RunBasicScenario_Load
Beginning scenario: RunReflectionScenario_UnsafeRead
Beginning scenario: RunLclVarScenario_UnsafeRead
Beginning scenario: RunClassFldScenario
Beginning scenario: RunStructLclFldScenario
Beginning scenario: RunStructFldScenario
16:34:55.177 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Add_uint()

I'm happy with this as a solution then!

a74nh commented 2 months ago

Updated the implementation instructions with stress testing and how to write the tests.

kunalspathak commented 2 months ago

> Updated the implementation instructions with stress testing and how to write the tests.

Updated for Windows.

kunalspathak commented 2 months ago

Updated https://github.com/dotnet/runtime/issues/99957#issuecomment-2007408474 with the meanings of the various HWIntrinsicFlag values used in the table.