kunalspathak commented 8 months ago

Now that the encoding of all SVE instructions is complete (https://github.com/dotnet/runtime/issues/94549), it is time to expose these instructions through .NET APIs. Below is the categorized list of APIs, with links to the issues where they were approved.

.NET 9 Goal: We aim to complete the SVE APIs in .NET 9. The SVE2 APIs are deferred to .NET 10.

SVE APIs

## High Priority SVE APIs

### [Sve mask](https://github.com/dotnet/runtime/issues/93964) (Complete)

Full list

- [x] AbsoluteCompareGreaterThan https://github.com/dotnet/runtime/pull/104464
- [x] AbsoluteCompareGreaterThanOrEqual https://github.com/dotnet/runtime/pull/104464
- [x] AbsoluteCompareLessThan https://github.com/dotnet/runtime/pull/104464
- [x] AbsoluteCompareLessThanOrEqual https://github.com/dotnet/runtime/pull/104464
- [x] Compact https://github.com/dotnet/runtime/pull/102992
- [x] CompareEqual https://github.com/dotnet/runtime/pull/104464
- [x] CompareGreaterThan https://github.com/dotnet/runtime/pull/104464
- [x] CompareGreaterThanOrEqual https://github.com/dotnet/runtime/pull/104464
- [x] CompareLessThan https://github.com/dotnet/runtime/pull/104464
- [x] CompareLessThanOrEqual https://github.com/dotnet/runtime/pull/104464
- [x] CompareNotEqualTo https://github.com/dotnet/runtime/pull/104464
- [x] CompareUnordered https://github.com/dotnet/runtime/pull/104464
- [x] ConditionalExtractAfterLastActiveElement https://github.com/dotnet/runtime/pull/104150
- [x] ConditionalExtractAfterLastActiveElementAndReplicate https://github.com/dotnet/runtime/pull/104150
- [x] ConditionalExtractLastActiveElement https://github.com/dotnet/runtime/pull/104150
- [x] ConditionalExtractLastActiveElementAndReplicate https://github.com/dotnet/runtime/pull/104150
- [x] ConditionalSelect https://github.com/dotnet/runtime/pull/100743
- [x] CreateBreakAfterMask https://github.com/dotnet/runtime/pull/104184 **(Future work)** [Add optimization for CndSel](https://github.com/dotnet/runtime/issues/104486)
- [x] CreateBreakAfterPropagateMask https://github.com/dotnet/runtime/pull/104184 **(Future work)** [Add optimization for CndSel](https://github.com/dotnet/runtime/issues/104486)
- [x] CreateBreakBeforeMask https://github.com/dotnet/runtime/pull/104184 **(Future work)** Add optimization for CndSel
- [x] CreateBreakBeforePropagateMask https://github.com/dotnet/runtime/pull/104184 **(Future work)** [Add optimization for CndSel](https://github.com/dotnet/runtime/issues/104486)
- [x] CreateBreakPropagateMask https://github.com/dotnet/runtime/pull/104704
- [x] CreateFalseMaskByte https://github.com/dotnet/runtime/pull/102076
- [x] CreateFalseMaskDouble https://github.com/dotnet/runtime/pull/102076
- [x] CreateFalseMaskInt16 https://github.com/dotnet/runtime/pull/102076
- [x] CreateFalseMaskInt32 https://github.com/dotnet/runtime/pull/102076
- [x] CreateFalseMaskInt64 https://github.com/dotnet/runtime/pull/102076
- [x] CreateFalseMaskSByte https://github.com/dotnet/runtime/pull/102076
- [x] CreateFalseMaskSingle https://github.com/dotnet/runtime/pull/102076
- [x] CreateFalseMaskUInt16 https://github.com/dotnet/runtime/pull/102076
- [x] CreateFalseMaskUInt32 https://github.com/dotnet/runtime/pull/102076
- [x] CreateFalseMaskUInt64 https://github.com/dotnet/runtime/pull/102076
- [x] CreateMaskForFirstActiveElement https://github.com/dotnet/runtime/pull/104002
- [x] CreateMaskForNextActiveElement https://github.com/dotnet/runtime/pull/104002
- [x] CreateTrueMaskByte https://github.com/dotnet/runtime/pull/98218
- [x] CreateTrueMaskDouble https://github.com/dotnet/runtime/pull/98218
- [x] CreateTrueMaskInt16 https://github.com/dotnet/runtime/pull/98218
- [x] CreateTrueMaskInt32 https://github.com/dotnet/runtime/pull/98218
- [x] CreateTrueMaskInt64 https://github.com/dotnet/runtime/pull/98218
- [x] CreateTrueMaskSByte https://github.com/dotnet/runtime/pull/98218
- [x] CreateTrueMaskSingle https://github.com/dotnet/runtime/pull/98218
- [x] CreateTrueMaskUInt16 https://github.com/dotnet/runtime/pull/98218
- [x] CreateTrueMaskUInt32 https://github.com/dotnet/runtime/pull/98218
- [x] CreateTrueMaskUInt64 https://github.com/dotnet/runtime/pull/98218
- [x] CreateWhileLessThanMask16Bit https://github.com/dotnet/runtime/pull/100949
- [x] CreateWhileLessThanMask32Bit https://github.com/dotnet/runtime/pull/100949
- [x] CreateWhileLessThanMask64Bit https://github.com/dotnet/runtime/pull/100949
- [x] CreateWhileLessThanMask8Bit https://github.com/dotnet/runtime/pull/100949
- [x] CreateWhileLessThanOrEqualMask16Bit https://github.com/dotnet/runtime/pull/100949
- [x] CreateWhileLessThanOrEqualMask32Bit https://github.com/dotnet/runtime/pull/100949
- [x] CreateWhileLessThanOrEqualMask64Bit https://github.com/dotnet/runtime/pull/100949
- [x] CreateWhileLessThanOrEqualMask8Bit https://github.com/dotnet/runtime/pull/100949
- [ ] **(Future item)** ExtractAfterLastScalar https://github.com/dotnet/runtime/pull/103847
- [ ] **(Future item)** ExtractAfterLastVector https://github.com/dotnet/runtime/pull/103847
- [ ] **(Future item)** ExtractLastScalar https://github.com/dotnet/runtime/pull/103847
- [ ] **(Future item)** ExtractLastVector https://github.com/dotnet/runtime/pull/103847
- [x] ExtractVector https://github.com/dotnet/runtime/pull/103739
- [x] TestAnyTrue https://github.com/dotnet/runtime/pull/103739
- [x] TestFirstTrue https://github.com/dotnet/runtime/pull/103739
- [x] TestLastTrue https://github.com/dotnet/runtime/pull/103739

### [Sve bitwise](https://github.com/dotnet/runtime/issues/93887) (Complete)

Full list

- [x] And https://github.com/dotnet/runtime/pull/101762
- [x] AndAcross https://github.com/dotnet/runtime/pull/101762
- [ ] **(Future work)** AndNot Need to fix https://github.com/dotnet/runtime/issues/101933
- [x] BitwiseClear https://github.com/dotnet/runtime/pull/101853
- [x] BooleanNot https://github.com/dotnet/runtime/pull/101853
- [x] InsertIntoShiftedVector https://github.com/dotnet/runtime/pull/103725
- [x] Not https://github.com/dotnet/runtime/pull/103725
- [x] Or https://github.com/dotnet/runtime/pull/101762
- [x] OrAcross https://github.com/dotnet/runtime/pull/101762
- [ ] **(Future work)** OrNot Need to fix https://github.com/dotnet/runtime/issues/101933
- [x] ShiftLeftLogical https://github.com/dotnet/runtime/pull/104119
- [x] ShiftRightArithmetic https://github.com/dotnet/runtime/pull/104119
- [x] ShiftRightArithmeticForDivide https://github.com/dotnet/runtime/pull/104279
- [x] ShiftRightLogical https://github.com/dotnet/runtime/pull/104119
- [x] Xor https://github.com/dotnet/runtime/pull/101762
- [x] XorAcross https://github.com/dotnet/runtime/pull/101762

### [Sve bitmanipulate](https://github.com/dotnet/runtime/issues/94008) (Complete)

Full list

- [x] DuplicateSelectedScalarToVector https://github.com/dotnet/runtime/pull/103228
- [x] ReverseBits https://github.com/dotnet/runtime/pull/103806
- [x] ReverseElement https://github.com/dotnet/runtime/pull/102991
- [x] ReverseElement16 https://github.com/dotnet/runtime/pull/102991
- [x] ReverseElement32 https://github.com/dotnet/runtime/pull/102991
- [x] ReverseElement8 https://github.com/dotnet/runtime/pull/102991
- [x] Splice https://github.com/dotnet/runtime/pull/103567
- [x] TransposeEven https://github.com/dotnet/runtime/pull/103068
- [x] TransposeOdd https://github.com/dotnet/runtime/pull/103068
- [x] UnzipEven https://github.com/dotnet/runtime/pull/101294
- [x] UnzipOdd https://github.com/dotnet/runtime/pull/101294
- [x] VectorTableLookup https://github.com/dotnet/runtime/pull/103989
- [x] ZipHigh #101294
- [x] ZipLow #101294

### [Sve loads](https://github.com/dotnet/runtime/issues/94006) (Complete)

Full list

- [x] Compute16BitAddresses https://github.com/dotnet/runtime/pull/103040
- [x] Compute32BitAddresses https://github.com/dotnet/runtime/pull/103040
- [x] Compute64BitAddresses https://github.com/dotnet/runtime/pull/103040
- [x] Compute8BitAddresses https://github.com/dotnet/runtime/pull/103040
- [x] LoadVector https://github.com/dotnet/runtime/pull/98218
- [x] LoadVector128AndReplicateToVector https://github.com/dotnet/runtime/pull/103392
- [x] LoadVectorByteNonFaultingZeroExtendToInt16 https://github.com/dotnet/runtime/pull/102860
- [x] LoadVectorByteNonFaultingZeroExtendToInt32 https://github.com/dotnet/runtime/pull/102860
- [x] LoadVectorByteNonFaultingZeroExtendToInt64 https://github.com/dotnet/runtime/pull/102860
- [x] LoadVectorByteNonFaultingZeroExtendToUInt16 https://github.com/dotnet/runtime/pull/102860
- [x] LoadVectorByteNonFaultingZeroExtendToUInt32 https://github.com/dotnet/runtime/pull/102860
- [x] LoadVectorByteNonFaultingZeroExtendToUInt64 https://github.com/dotnet/runtime/pull/102860
- [x] LoadVectorByteZeroExtendToInt16 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorByteZeroExtendToInt32 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorByteZeroExtendToInt64 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorByteZeroExtendToUInt16 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorByteZeroExtendToUInt32 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorByteZeroExtendToUInt64 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorInt16NonFaultingSignExtendToInt32 https://github.com/dotnet/runtime/pull/102903
- [x] LoadVectorInt16NonFaultingSignExtendToInt64 https://github.com/dotnet/runtime/pull/102903
- [x] LoadVectorInt16NonFaultingSignExtendToUInt32 https://github.com/dotnet/runtime/pull/102903
- [x] LoadVectorInt16NonFaultingSignExtendToUInt64 https://github.com/dotnet/runtime/pull/102903
- [x] LoadVectorInt16SignExtendToInt32 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorInt16SignExtendToInt64 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorInt16SignExtendToUInt32 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorInt16SignExtendToUInt64 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorInt32NonFaultingSignExtendToInt64 https://github.com/dotnet/runtime/pull/102903
- [x] LoadVectorInt32NonFaultingSignExtendToUInt64 https://github.com/dotnet/runtime/pull/102903
- [x] LoadVectorInt32SignExtendToInt64 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorInt32SignExtendToUInt64 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorNonFaulting https://github.com/dotnet/runtime/pull/103392
- [x] LoadVectorNonTemporal https://github.com/dotnet/runtime/pull/103392
- [x] LoadVectorSByteNonFaultingSignExtendToInt16 https://github.com/dotnet/runtime/pull/102903
- [x] LoadVectorSByteNonFaultingSignExtendToInt32 https://github.com/dotnet/runtime/pull/102903
- [x] LoadVectorSByteNonFaultingSignExtendToInt64 https://github.com/dotnet/runtime/pull/102903
- [x] LoadVectorSByteNonFaultingSignExtendToUInt16 https://github.com/dotnet/runtime/pull/102903
- [x] LoadVectorSByteNonFaultingSignExtendToUInt32 https://github.com/dotnet/runtime/pull/102903
- [x] LoadVectorSByteNonFaultingSignExtendToUInt64 https://github.com/dotnet/runtime/pull/102903
- [x] LoadVectorSByteSignExtendToInt16 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorSByteSignExtendToInt32 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorSByteSignExtendToInt64 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorSByteSignExtendToUInt16 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorSByteSignExtendToUInt32 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorSByteSignExtendToUInt64 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorUInt16NonFaultingZeroExtendToInt32 https://github.com/dotnet/runtime/pull/102860
- [x] LoadVectorUInt16NonFaultingZeroExtendToInt64 https://github.com/dotnet/runtime/pull/102860
- [x] LoadVectorUInt16NonFaultingZeroExtendToUInt32 https://github.com/dotnet/runtime/pull/102860
- [x] LoadVectorUInt16NonFaultingZeroExtendToUInt64 https://github.com/dotnet/runtime/pull/102860
- [x] LoadVectorUInt16ZeroExtendToInt32 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorUInt16ZeroExtendToInt64 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorUInt16ZeroExtendToUInt32 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorUInt16ZeroExtendToUInt64 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorUInt32NonFaultingZeroExtendToInt64 https://github.com/dotnet/runtime/pull/102860
- [x] LoadVectorUInt32NonFaultingZeroExtendToUInt64 https://github.com/dotnet/runtime/pull/102860
- [x] LoadVectorUInt32ZeroExtendToInt64 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorUInt32ZeroExtendToUInt64 https://github.com/dotnet/runtime/pull/101291
- [x] LoadVectorx2 https://github.com/dotnet/runtime/pull/102180
- [x] LoadVectorx3 https://github.com/dotnet/runtime/pull/102180
- [x] LoadVectorx4 https://github.com/dotnet/runtime/pull/102180
- [x] PrefetchBytes https://github.com/dotnet/runtime/pull/103094
- [x] PrefetchInt16 https://github.com/dotnet/runtime/pull/103094
- [x] PrefetchInt32 https://github.com/dotnet/runtime/pull/103094
- [x] PrefetchInt64 https://github.com/dotnet/runtime/pull/103094

### [Sve stores](https://github.com/dotnet/runtime/issues/94011) (Complete)

Full list

- [x] Store https://github.com/dotnet/runtime/pull/102262
- [x] StoreNarrowing https://github.com/dotnet/runtime/pull/102605
- [x] StoreNonTemporal https://github.com/dotnet/runtime/pull/102769

### [Sve maths](https://github.com/dotnet/runtime/issues/94009) (Complete)

Full list

- [x] Abs https://github.com/dotnet/runtime/pull/100743
- [x] AbsoluteDifference https://github.com/dotnet/runtime/pull/102170
- [x] Add https://github.com/dotnet/runtime/pull/100743
- [x] AddAcross https://github.com/dotnet/runtime/pull/101674
- [x] AddSaturate https://github.com/dotnet/runtime/pull/102170
- [x] Divide https://github.com/dotnet/runtime/pull/101578
- [x] DotProduct https://github.com/dotnet/runtime/pull/102218
- [x] DotProductBySelectedScalar https://github.com/dotnet/runtime/pull/102218
- [x] FusedMultiplyAdd https://github.com/dotnet/runtime/pull/102007
- [x] FusedMultiplyAddBySelectedScalar https://github.com/dotnet/runtime/pull/102007
- [x] FusedMultiplyAddNegated https://github.com/dotnet/runtime/pull/102007
- [x] FusedMultiplySubtract https://github.com/dotnet/runtime/pull/102007
- [x] FusedMultiplySubtractBySelectedScalar https://github.com/dotnet/runtime/pull/102007
- [x] FusedMultiplySubtractNegated https://github.com/dotnet/runtime/pull/102007
- [x] Max https://github.com/dotnet/runtime/pull/101859
- [x] MaxAcross https://github.com/dotnet/runtime/pull/101859
- [x] MaxNumber https://github.com/dotnet/runtime/pull/101859
- [x] MaxNumberAcross https://github.com/dotnet/runtime/pull/101859
- [x] Min https://github.com/dotnet/runtime/pull/101859
- [x] MinAcross https://github.com/dotnet/runtime/pull/101859
- [x] MinNumber https://github.com/dotnet/runtime/pull/101859
- [x] MinNumberAcross https://github.com/dotnet/runtime/pull/101859
- [x] Multiply https://github.com/dotnet/runtime/pull/101578
- [x] MultiplyAdd https://github.com/dotnet/runtime/pull/102007
- [x] MultiplyBySelectedScalar https://github.com/dotnet/runtime/pull/102007
- [x] MultiplyExtended https://github.com/dotnet/runtime/pull/102170
- [x] MultiplySubtract https://github.com/dotnet/runtime/pull/102007
- [x] Negate https://github.com/dotnet/runtime/pull/102170
- [x] SignExtend16 https://github.com/dotnet/runtime/pull/101702
- [x] SignExtend32 https://github.com/dotnet/runtime/pull/101702
- [x] SignExtend8 https://github.com/dotnet/runtime/pull/101702
- [x] SignExtendWideningLower https://github.com/dotnet/runtime/pull/101743
- [x] SignExtendWideningUpper https://github.com/dotnet/runtime/pull/101743
- [x] Subtract https://github.com/dotnet/runtime/pull/101578
- [x] SubtractSaturate https://github.com/dotnet/runtime/pull/102170
- [x] ZeroExtend16 https://github.com/dotnet/runtime/pull/101702
- [x] ZeroExtend32 https://github.com/dotnet/runtime/pull/101702
- [x] ZeroExtend8 https://github.com/dotnet/runtime/pull/101702
- [x] ZeroExtendWideningLower https://github.com/dotnet/runtime/pull/101743
- [x] ZeroExtendWideningUpper https://github.com/dotnet/runtime/pull/101743

### [Sve counting](https://github.com/dotnet/runtime/issues/94003) (Complete)

Full list

- [x] Count16BitElements https://github.com/dotnet/runtime/pull/101188
- [x] Count32BitElements https://github.com/dotnet/runtime/pull/101188
- [x] Count64BitElements https://github.com/dotnet/runtime/pull/101188
- [x] Count8BitElements https://github.com/dotnet/runtime/pull/101188
- [x] GetActiveElementCount https://github.com/dotnet/runtime/pull/102813
- [x] LeadingSignCount https://github.com/dotnet/runtime/pull/102548
- [x] LeadingZeroCount https://github.com/dotnet/runtime/pull/102548
- [x] PopCount https://github.com/dotnet/runtime/pull/102548
- [x] SaturatingDecrementBy16BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingDecrementBy32BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingDecrementBy64BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingDecrementBy8BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingDecrementByActiveElementCount https://github.com/dotnet/runtime/pull/102994
- [x] SaturatingIncrementBy16BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingIncrementBy32BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingIncrementBy64BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingIncrementBy8BitElementCount https://github.com/dotnet/runtime/pull/102315
- [x] SaturatingIncrementByActiveElementCount https://github.com/dotnet/runtime/pull/102994

## Low Priority SVE APIs

### [Sve scatterstores](https://github.com/dotnet/runtime/issues/94014) (Complete)

Full list

- [x] Scatter https://github.com/dotnet/runtime/pull/104555
- [x] Scatter16BitNarrowing https://github.com/dotnet/runtime/pull/104720
- [x] Scatter16BitWithByteOffsetsNarrowing https://github.com/dotnet/runtime/pull/104720
- [x] Scatter32BitNarrowing https://github.com/dotnet/runtime/pull/104720
- [x] Scatter32BitWithByteOffsetsNarrowing https://github.com/dotnet/runtime/pull/104720
- [x] Scatter8BitNarrowing https://github.com/dotnet/runtime/pull/104720
- [x] Scatter8BitWithByteOffsetsNarrowing https://github.com/dotnet/runtime/pull/104720

### [Sve gatherloads](https://github.com/dotnet/runtime/issues/94007) (Complete)

Full list

- [x] GatherPrefetch16Bit https://github.com/dotnet/runtime/pull/103826
- [x] GatherPrefetch32Bit https://github.com/dotnet/runtime/pull/103826
- [x] GatherPrefetch64Bit https://github.com/dotnet/runtime/pull/103826
- [x] GatherPrefetch8Bit https://github.com/dotnet/runtime/pull/103826
- [x] GatherVector https://github.com/dotnet/runtime/pull/103159
- [x] GatherVectorByteZeroExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorInt16SignExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorInt16WithByteOffsetsSignExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorInt32SignExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorInt32WithByteOffsetsSignExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorSByteSignExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorUInt16WithByteOffsetsZeroExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorUInt16ZeroExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorUInt32WithByteOffsetsZeroExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorUInt32ZeroExtend https://github.com/dotnet/runtime/pull/103370
- [x] GatherVectorWithByteOffsets https://github.com/dotnet/runtime/pull/103564

### [Sve fp](https://github.com/dotnet/runtime/issues/94005) (Complete)

Full list

- [x] AddRotateComplex https://github.com/dotnet/runtime/pull/104926
- [x] AddSequentialAcross https://github.com/dotnet/runtime/pull/104640
- [x] ConvertToDouble https://github.com/dotnet/runtime/pull/104478
- [x] ConvertToInt32 https://github.com/dotnet/runtime/pull/103098
- [x] ConvertToInt64 https://github.com/dotnet/runtime/pull/104069
- [x] ConvertToSingle https://github.com/dotnet/runtime/pull/104478
- [x] ConvertToUInt32 https://github.com/dotnet/runtime/pull/103098
- [x] ConvertToUInt64 https://github.com/dotnet/runtime/pull/104069
- [x] FloatingPointExponentialAccelerator https://github.com/dotnet/runtime/pull/104649
- [x] MultiplyAddRotateComplex https://github.com/dotnet/runtime/pull/104926
- [x] MultiplyAddRotateComplexBySelectedScalar https://github.com/dotnet/runtime/pull/105002
- [x] ReciprocalEstimate https://github.com/dotnet/runtime/pull/103673
- [x] ReciprocalExponent https://github.com/dotnet/runtime/pull/103673
- [x] ReciprocalSqrtEstimate https://github.com/dotnet/runtime/pull/103673
- [x] ReciprocalSqrtStep https://github.com/dotnet/runtime/pull/103673
- [x] ReciprocalStep https://github.com/dotnet/runtime/pull/103673
- [x] RoundAwayFromZero https://github.com/dotnet/runtime/pull/103588
- [x] RoundToNearest https://github.com/dotnet/runtime/pull/103588
- [x] RoundToNegativeInfinity https://github.com/dotnet/runtime/pull/103588
- [x] RoundToPositiveInfinity https://github.com/dotnet/runtime/pull/103588
- [x] RoundToZero https://github.com/dotnet/runtime/pull/103588
- [x] Scale https://github.com/dotnet/runtime/pull/103663
- [x] Sqrt https://github.com/dotnet/runtime/pull/103663
- [x] TrigonometricMultiplyAddCoefficient https://github.com/dotnet/runtime/pull/104697
- [x] TrigonometricSelectCoefficient https://github.com/dotnet/runtime/pull/104681
- [x] TrigonometricStartingValue https://github.com/dotnet/runtime/pull/104681

### [Sve firstfaulting](https://github.com/dotnet/runtime/issues/94004) (Complete)

Full list

- [x] GatherVectorByteZeroExtendFirstFaulting (Swapnil) https://github.com/dotnet/runtime/pull/105030
- [x] GatherVectorFirstFaulting https://github.com/dotnet/runtime/pull/104502
- [x] GatherVectorInt16SignExtendFirstFaulting (Swapnil) https://github.com/dotnet/runtime/pull/105030
- [x] GatherVectorInt16WithByteOffsetsSignExtendFirstFaulting (Swapnil) https://github.com/dotnet/runtime/pull/105030
- [x] GatherVectorInt32SignExtendFirstFaulting (Swapnil) https://github.com/dotnet/runtime/pull/105030
- [x] GatherVectorInt32WithByteOffsetsSignExtendFirstFaulting (Swapnil) https://github.com/dotnet/runtime/pull/105030
- [x] GatherVectorSByteSignExtendFirstFaulting (Swapnil) https://github.com/dotnet/runtime/pull/105030
- [x] GatherVectorUInt16WithByteOffsetsZeroExtendFirstFaulting (Swapnil) https://github.com/dotnet/runtime/pull/105030
- [x] GatherVectorUInt16ZeroExtendFirstFaulting (Swapnil) https://github.com/dotnet/runtime/pull/105030
- [x] GatherVectorUInt32WithByteOffsetsZeroExtendFirstFaulting (Swapnil) https://github.com/dotnet/runtime/pull/105030
- [x] GatherVectorUInt32ZeroExtendFirstFaulting (Swapnil) https://github.com/dotnet/runtime/pull/105030
- [x] GatherVectorWithByteOffsetFirstFaulting (Aman) https://github.com/dotnet/runtime/pull/106199
- [x] GetFfr https://github.com/dotnet/runtime/pull/104502
- [x] LoadVectorByteZeroExtendFirstFaulting https://github.com/dotnet/runtime/pull/104964
- [x] LoadVectorFirstFaulting https://github.com/dotnet/runtime/pull/104502
- [x] LoadVectorInt16SignExtendFirstFaulting https://github.com/dotnet/runtime/pull/104964
- [x] LoadVectorInt32SignExtendFirstFaulting https://github.com/dotnet/runtime/pull/104964
- [x] LoadVectorSByteSignExtendFirstFaulting https://github.com/dotnet/runtime/pull/104964
- [x] LoadVectorUInt16ZeroExtendFirstFaulting https://github.com/dotnet/runtime/pull/104964
- [x] LoadVectorUInt32ZeroExtendFirstFaulting https://github.com/dotnet/runtime/pull/104964
- [x] SetFfr https://github.com/dotnet/runtime/pull/104502

SVE2 APIs

Full list

### [Sve2 scatterstores](https://github.com/dotnet/runtime/issues/94023)

- [ ] Scatter16BitNarrowing
- [ ] Scatter16BitWithByteOffsetsNarrowing
- [ ] Scatter32BitNarrowing
- [ ] Scatter32BitWithByteOffsetsNarrowing
- [ ] Scatter8BitNarrowing
- [ ] Scatter8BitWithByteOffsetsNarrowing
- [ ] ScatterNonTemporal

### [Sve2 maths](https://github.com/dotnet/runtime/issues/94022)

- [ ] AbsoluteDifferenceAdd
- [ ] AbsoluteDifferenceAddWideningLower
- [ ] AbsoluteDifferenceAddWideningUpper
- [ ] AbsoluteDifferenceWideningLower
- [ ] AbsoluteDifferenceWideningUpper
- [ ] AddCarryWideningLower
- [ ] AddCarryWideningUpper
- [ ] AddHighNarowingLower
- [ ] AddHighNarowingUpper
- [ ] AddPairwise
- [ ] AddPairwiseWidening
- [ ] AddSaturate
- [ ] AddSaturateWithSignedAddend
- [ ] AddSaturateWithUnsignedAddend
- [ ] AddWideLower
- [ ] AddWideUpper
- [ ] AddWideningLower
- [ ] AddWideningLowerUpper
- [ ] AddWideningUpper
- [ ] DotProductComplex
- [ ] HalvingAdd
- [ ] HalvingSubtract
- [ ] HalvingSubtractReversed
- [ ] MaxNumberPairwise
- [ ] MaxPairwise
- [ ] MinNumberPairwise
- [ ] MinPairwise
- [ ] MultiplyAddBySelectedScalar
- [ ] MultiplyAddWideningLower
- [ ] MultiplyAddWideningUpper
- [ ] MultiplyBySelectedScalar
- [ ] MultiplySubtractBySelectedScalar
- [ ] MultiplySubtractWideningLower
- [ ] MultiplySubtractWideningUpper
- [ ] MultiplyWideningLower
- [ ] MultiplyWideningUpper
- [ ] PolynomialMultiply
- [ ] PolynomialMultiplyWideningLower
- [ ] PolynomialMultiplyWideningUpper
- [ ] RoundingAddHighNarowingLower
- [ ] RoundingAddHighNarowingUpper
- [ ] RoundingHalvingAdd
- [ ] RoundingSubtractHighNarowingLower
- [ ] RoundingSubtractHighNarowingUpper
- [ ] SaturatingAbs
- [ ] SaturatingDoublingMultiplyAddWideningLower
- [ ] SaturatingDoublingMultiplyAddWideningLowerUpper
- [ ] SaturatingDoublingMultiplyAddWideningUpper
- [ ] SaturatingDoublingMultiplyHigh
- [ ] SaturatingDoublingMultiplySubtractWideningLower
- [ ] SaturatingDoublingMultiplySubtractWideningLowerUpper
- [ ] SaturatingDoublingMultiplySubtractWideningUpper
- [ ] SaturatingDoublingMultiplyWideningLower
- [ ] SaturatingDoublingMultiplyWideningUpper
- [ ] SaturatingNegate
- [ ] SaturatingRoundingDoublingMultiplyAddHigh
- [ ] SaturatingRoundingDoublingMultiplyHigh
- [ ] SaturatingRoundingDoublingMultiplySubtractHigh
- [ ] SubtractHighNarowingLower
- [ ] SubtractHighNarowingUpper
- [ ] SubtractSaturate
- [ ] SubtractSaturateReversed
- [ ] SubtractWideLower
- [ ] SubtractWideUpper
- [ ] SubtractWideningLower
- [ ] SubtractWideningLowerUpper
- [ ] SubtractWideningUpper
- [ ] SubtractWideningUpperLower
- [ ] SubtractWithBorrowWideningLower
- [ ] SubtractWithBorrowWideningUpper

### [Sve2 mask](https://github.com/dotnet/runtime/issues/94021)

- [ ] CreateWhileGreaterThanMask
- [ ] CreateWhileGreaterThanOrEqualMask
- [ ] CreateWhileReadAfterWriteMask
- [ ] CreateWhileWriteAfterReadMask
- [ ] Match
- [ ] NoMatch
- [ ] SaturatingExtractNarrowingLower
- [ ] SaturatingExtractNarrowingUpper
- [ ] SaturatingExtractUnsignedNarrowingLower
- [ ] SaturatingExtractUnsignedNarrowingUpper

### [Sve2 gatherloads](https://github.com/dotnet/runtime/issues/94019)

- [ ] GatherVectorByteZeroExtendNonTemporal
- [ ] GatherVectorInt16SignExtendNonTemporal
- [ ] GatherVectorInt16WithByteOffsetsSignExtendNonTemporal
- [ ] GatherVectorInt32SignExtendNonTemporal
- [ ] GatherVectorInt32WithByteOffsetsSignExtendNonTemporal
- [ ] GatherVectorNonTemporal
- [ ] GatherVectorSByteSignExtendNonTemporal
- [ ] GatherVectorUInt16WithByteOffsetsZeroExtendNonTemporal
- [ ] GatherVectorUInt16ZeroExtendNonTemporal
- [ ] GatherVectorUInt32WithByteOffsetsZeroExtendNonTemporal
- [ ] GatherVectorUInt32ZeroExtendNonTemporal

### [Sve2 fp](https://github.com/dotnet/runtime/issues/94018)

- [ ] AddRotateComplex
- [ ] DownConvertNarrowingUpper
- [ ] DownConvertRoundingOdd
- [ ] DownConvertRoundingOddUpper
- [ ] Log2
- [ ] MultiplyAddRotateComplex
- [ ] MultiplyAddRotateComplexBySelectedScalar
- [ ] ReciprocalEstimate
- [ ] ReciprocalSqrtEstimate
- [ ] SaturatingComplexAddRotate
- [ ] SaturatingRoundingDoublingComplexMultiplyAddHighRotate
- [ ] UpConvertWideningUpper

### [Sve2 counting](https://github.com/dotnet/runtime/issues/94017)

- [ ] CountMatchingElements
- [ ] CountMatchingElementsIn128BitSegments

### [Sve2 bitwise](https://github.com/dotnet/runtime/issues/94015)

- [ ] BitwiseClearXor
- [ ] BitwiseSelect
- [ ] BitwiseSelectLeftInverted
- [ ] BitwiseSelectRightInverted
- [ ] ShiftArithmeticRounded
- [ ] ShiftArithmeticRoundedSaturate
- [ ] ShiftArithmeticSaturate
- [ ] ShiftLeftAndInsert
- [ ] ShiftLeftLogicalSaturate
- [ ] ShiftLeftLogicalSaturateUnsigned
- [ ] ShiftLeftLogicalWideningEven
- [ ] ShiftLeftLogicalWideningOdd
- [ ] ShiftLogicalRounded
- [ ] ShiftLogicalRoundedSaturate
- [ ] ShiftRightAndInsert
- [ ] ShiftRightArithmeticAdd
- [ ] ShiftRightArithmeticNarrowingSaturateEven
- [ ] ShiftRightArithmeticNarrowingSaturateOdd
- [ ] ShiftRightArithmeticNarrowingSaturateUnsignedEven
- [ ] ShiftRightArithmeticNarrowingSaturateUnsignedOdd
- [ ] ShiftRightArithmeticRounded
- [ ] ShiftRightArithmeticRoundedAdd
- [ ] ShiftRightArithmeticRoundedNarrowingSaturateEven
- [ ] ShiftRightArithmeticRoundedNarrowingSaturateOdd
- [ ] ShiftRightArithmeticRoundedNarrowingSaturateUnsignedEven
- [ ] ShiftRightArithmeticRoundedNarrowingSaturateUnsignedOdd
- [ ] ShiftRightLogicalAdd
- [ ] ShiftRightLogicalNarrowingEven
- [ ] ShiftRightLogicalNarrowingOdd
- [ ] ShiftRightLogicalRounded
- [ ] ShiftRightLogicalRoundedAdd
- [ ] ShiftRightLogicalRoundedNarrowingEven
- [ ] ShiftRightLogicalRoundedNarrowingOdd
- [ ] ShiftRightLogicalRoundedNarrowingSaturateEven
- [ ] ShiftRightLogicalRoundedNarrowingSaturateOdd
- [ ] Xor
- [ ] XorRotateRight

### [Sve2 bitmanipulate](https://github.com/dotnet/runtime/issues/94020)

- [ ] InterleavingXorLowerUpper
- [ ] InterleavingXorUpperLower
- [ ] MoveWideningLower
- [ ] MoveWideningUpper
- [ ] VectorTableLookup
- [ ] VectorTableLookupExtension

### [SveBf16](https://github.com/dotnet/runtime/issues/94028)

- [ ] Bfloat16DotProduct
- [ ] Bfloat16MatrixMultiplyAccumulate
- [ ] Bfloat16MultiplyAddWideningToSinglePrecisionLower
- [ ] Bfloat16MultiplyAddWideningToSinglePrecisionUpper
- [ ] ConcatenateEvenInt128FromTwoInputs
- [ ] ConcatenateOddInt128FromTwoInputs
- [ ] ConditionalExtractAfterLastActiveElement
- [ ] ConditionalExtractAfterLastActiveElementAndReplicate
- [ ] ConditionalExtractLastActiveElement
- [ ] ConditionalExtractLastActiveElementAndReplicate
- [ ] ConditionalSelect
- [ ] ConvertToBFloat16
- [ ] CreateFalseMaskBFloat16
- [ ] CreateTrueMaskBFloat16
- [ ] CreateWhileReadAfterWriteMask
- [ ] CreateWhileWriteAfterReadMask
- [ ] DotProductBySelectedScalar
- [ ] DownConvertNarrowingUpper
- [ ] DuplicateSelectedScalarToVector
- [ ] ExtractAfterLastScalar
- [ ] ExtractAfterLastVector
- [ ] ExtractLastScalar
- [ ] ExtractLastVector
- [ ] ExtractVector
- [ ] GetActiveElementCount
- [ ] InsertIntoShiftedVector
- [ ] InterleaveEvenInt128FromTwoInputs
- [ ] InterleaveInt128FromHighHalvesOfTwoInputs
- [ ] InterleaveInt128FromLowHalvesOfTwoInputs
- [ ] InterleaveOddInt128FromTwoInputs
- [ ] LoadVector
- [ ] LoadVector128AndReplicateToVector
- [ ] LoadVector256AndReplicateToVector
- [ ] LoadVectorFirstFaulting
- [ ] LoadVectorNonFaulting
- [ ] LoadVectorNonTemporal
- [ ] Load2xVector
- [ ] Load3xVector
- [ ] Load4xVector
- [ ] PopCount
- [ ] ReverseElement
- [ ] Splice
- [ ] Store
- [ ] StoreNonTemporal
- [ ] TransposeEven
- [ ] TransposeOdd
- [ ] UnzipEven
- [ ] UnzipOdd
- [ ] VectorTableLookup
- [ ] VectorTableLookupExtension
- [ ] ZipHigh
- [ ] ZipLow

### [SveF32mm](https://github.com/dotnet/runtime/issues/94024)

- [ ] MatrixMultiplyAccumulate

### [SveF64mm](https://github.com/dotnet/runtime/issues/94025)

- [ ] ConcatenateEvenInt128FromTwoInputs
- [ ] ConcatenateOddInt128FromTwoInputs
- [ ] InterleaveEvenInt128FromTwoInputs
- [ ] InterleaveInt128FromHighHalvesOfTwoInputs
- [ ] InterleaveInt128FromLowHalvesOfTwoInputs
- [ ] InterleaveOddInt128FromTwoInputs
- [ ] LoadVector256AndReplicateToVector
- [ ] MatrixMultiplyAccumulate

### [SveFp16](https://github.com/dotnet/runtime/issues/94026)

- [ ] Abs
- [ ] AbsoluteCompareGreaterThan
- [ ] AbsoluteCompareGreaterThanOrEqual
- [ ] AbsoluteCompareLessThan
- [ ] AbsoluteCompareLessThanOrEqual
- [ ] AbsoluteDifference
- [ ] Add
- [ ] AddAcross
- [ ] AddPairwise
- [ ] AddRotateComplex
- [ ] AddSequentialAcross
- [ ] CompareEqual
- [ ] CompareGreaterThan
- [ ] CompareGreaterThanOrEqual
- [ ] CompareLessThan
- [ ] CompareLessThanOrEqual
- [ ] CompareNotEqualTo
- [ ] CompareUnordered
- [ ] ConcatenateEvenInt128FromTwoInputs
- [ ] ConcatenateOddInt128FromTwoInputs
- [ ] ConditionalExtractAfterLastActiveElement
- [ ] ConditionalExtractAfterLastActiveElementAndReplicate
- [ ] ConditionalExtractLastActiveElement
- [ ] ConditionalExtractLastActiveElementAndReplicate
- [ ] ConditionalSelect
- [ ] ConvertToDouble
- [ ] ConvertToHalf
- [ ] ConvertToInt16
- [ ] ConvertToInt32
- [ ] ConvertToInt64
- [ ] ConvertToSingle
- [ ] ConvertToUInt16
- [ ] ConvertToUInt32
- [ ] ConvertToUInt64
- [ ] CreateFalseMaskHalf
- [ ] CreateTrueMaskHalf
- [ ] CreateWhileReadAfterWriteMask
- [ ] CreateWhileWriteAfterReadMask
- [ ] Divide
- [ ] DownConvertNarrowingUpper
- [ ] DuplicateSelectedScalarToVector
- [ ] ExtractAfterLastScalar
- [ ] ExtractAfterLastVector
- [ ] ExtractLastScalar
- [ ] ExtractLastVector
- [ ] ExtractVector
- [ ] FloatingPointExponentialAccelerator
- [ ] FusedMultiplyAdd
- [ ] FusedMultiplyAddBySelectedScalar
- [ ] FusedMultiplyAddNegated
- [ ] FusedMultiplySubtract
- [ ] FusedMultiplySubtractBySelectedScalar
- [ ] FusedMultiplySubtractNegated
- [ ] GetActiveElementCount
- [ ] InsertIntoShiftedVector
- [ ] InterleaveEvenInt128FromTwoInputs
- [ ] InterleaveInt128FromHighHalvesOfTwoInputs
- [ ] InterleaveInt128FromLowHalvesOfTwoInputs
- [ ] InterleaveOddInt128FromTwoInputs
- [ ] LoadVector
- [ ] LoadVector128AndReplicateToVector
- [ ] LoadVector256AndReplicateToVector
- [ ] LoadVectorFirstFaulting
- [ ] LoadVectorNonFaulting
- [ ] LoadVectorNonTemporal
- [ ] LoadVectorx2
- [ ] LoadVectorx3
- [ ] LoadVectorx4
- [ ] Log2
- [ ] Max
- [ ] MaxAcross
- [ ] MaxNumber
- [ ] MaxNumberAcross
- [ ] MaxNumberPairwise
- [ ] MaxPairwise
- [ ] Min
- [ ] MinAcross
- [ ] MinNumber
- [ ] MinNumberAcross
- [ ] MinNumberPairwise
- [ ] MinPairwise
- [ ] Multiply
- [ ] MultiplyAddRotateComplex
- [ ] MultiplyAddRotateComplexBySelectedScalar
- [ ] MultiplyAddWideningLower
- [ ] MultiplyAddWideningUpper
- [ ] MultiplyBySelectedScalar
- [ ] MultiplyExtended
- [ ] MultiplySubtractWideningLower
- [ ] MultiplySubtractWideningUpper
- [ ] Negate
- [ ] PopCount
- [ ] ReciprocalEstimate
- [ ] ReciprocalExponent
- [ ] ReciprocalSqrtEstimate
- [ ] ReciprocalSqrtStep
- [ ] ReciprocalStep
- [ ] ReverseElement
- [ ] RoundAwayFromZero
- [ ] RoundToNearest
- [ ] RoundToNegativeInfinity
- [ ] RoundToPositiveInfinity
- [ ] RoundToZero
- [ ] Scale
- [ ] Splice
- [ ] Sqrt
- [ ] Store
- [ ] StoreNonTemporal
- [ ] Subtract
- [ ] TransposeEven
- [ ] TransposeOdd
- [ ] TrigonometricMultiplyAddCoefficient
- [ ] TrigonometricSelectCoefficient
- [ ] TrigonometricStartingValue
- [ ] UnzipEven
- [ ] UnzipOdd
- [ ] UpConvertWideningUpper
- [ ] VectorTableLookup
- [ ] VectorTableLookupExtension
- [ ] ZipHigh
- [ ] ZipLow

### [SveI8mm](https://github.com/dotnet/runtime/issues/94027)

- [ ] DotProductSignedUnsigned
- [ ] DotProductUnsignedSigned
- [ ] MatrixMultiplyAccumulate
- [ ] MatrixMultiplyAccumulateUnsignedSigned

### [Sha3](https://github.com/dotnet/runtime/issues/98692)

- [ ] BitwiseClearXor
- [ ] BitwiseRotateLeftBy1AndXor
- [ ] Xor
- [ ] XorRotateRight

### [Sm4](https://github.com/dotnet/runtime/issues/98696)

- [ ] Sm4EncryptionAndDecryption
- [ ] Sm4KeyUpdates

### [SveAes](https://github.com/dotnet/runtime/issues/94423)

- [ ] AesInverseMixColumns
- [ ] AesMixColumns
- [ ] AesSingleRoundDecryption
- [ ] 
AesSingleRoundEncryption - [ ] PolynomialMultiplyWideningLower - [ ] PolynomialMultiplyWideningUpper ### [SveBitperm](https://github.com/dotnet/runtime/issues/94424) - [ ] GatherLowerBitsFromPositionsSelectedByBitmask - [ ] GroupBitsToRightOrLeftAsSelectedByBitmask - [ ] ScatterLowerBitsIntoPositionsSelectedByBitmask ### [SveSha3](https://github.com/dotnet/runtime/issues/94425) - [ ] BitwiseRotateLeftBy1AndXor ### [SveSm4](https://github.com/dotnet/runtime/issues/94426) - [ ] Sm4EncryptionAndDecryption - [ ] Sm4KeyUpdates

Credit to @a74nh for populating the list and for providing files in https://github.com/a74nh/runtime/tree/api_github/sve_api that will help to implement the APIs.

Contributes to https://github.com/dotnet/runtime/issues/93095

a74nh commented 8 months ago

Recommendation for how to implement. Examples of this can be found in 100134

API

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/Sve.PlatformNotSupported.cs
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/Sve.cs
src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs

Copy/paste contents from files in https://github.com/a74nh/runtime/tree/api_github/sve_api/out_cs_api/ . There should be no need to edit these changes. Keep alphabetical ordering.

The same files, with additional annotations, can be found in https://github.com/a74nh/runtime/tree/api_github/sve_api/out_helper_api . These are for development use only and are not for committing.

HW Intrinsics

src/coreclr/jit/hwintrinsiclistarm64sve.h

Copy/paste from https://github.com/a74nh/runtime/blob/api_github/sve_api/out_hwintrinsiclistarm64sve.h . Entries with multiple instructions for a single type will need fixing via a special code path. The flags and category columns will probably need fixing manually. Flags that are not automatically detected:

HW_Flag_LowMaskedOperation : The predicate in arg1 is 0-7
HW_Flag_HasRMWSemantics : src1 and dest use the same register.
HW_Flag_EmbeddedMaskedOperation : APIs that have only a "predicated" version. These APIs are converted into ConditionalSelect(AllTrue, CALL_API(operands...), Zero) to get the effect of "predicate" registers. E.g. Abs, Divide.
HW_Flag_OptionalEmbeddedMaskedOperation : APIs that have both a "predicated" and an "unpredicated" version. When one of these APIs is used standalone, the "unpredicated" version of the instruction is generated. When it is wrapped in ConditionalSelect in user code, the "predicated" version of the instruction is emitted. E.g. Add, Multiply, etc.
HW_Flag_ExplicitMaskedOperation : These APIs take "mask" explicitly as the first argument. E.g. ConditionalSelect
HW_Flag_Scalable : All APIs have this flag to identify that they operate on scalable vector length.
Any other restrictions on the register number.
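
To make the difference between the embedded and optional-embedded flags concrete, here is a minimal Python model of the per-lane semantics. It is a sketch, not JIT code: `conditional_select`, `embedded_masked`, `sve_abs`, `sve_add`, and the list-based vectors are hypothetical stand-ins, and it assumes that ConditionalSelect takes the left operand in lanes where the mask is non-zero and the right operand elsewhere.

```python
from typing import Callable, List

Vector = List[int]  # hypothetical stand-in for a scalable vector


def conditional_select(mask: Vector, left: Vector, right: Vector) -> Vector:
    """Per lane: take `left` where the mask lane is non-zero, else `right`."""
    return [l if m != 0 else r for m, l, r in zip(mask, left, right)]


def embedded_masked(op: Callable[..., Vector], *operands: Vector) -> Vector:
    """Model of the wrapping ConditionalSelect(AllTrue, op(operands...), Zero)."""
    lanes = len(operands[0])
    all_true = [1] * lanes
    zero = [0] * lanes
    return conditional_select(all_true, op(*operands), zero)


def sve_abs(v: Vector) -> Vector:
    """Hypothetical model of an API with only a predicated form."""
    return [abs(x) for x in v]


def sve_add(a: Vector, b: Vector) -> Vector:
    """Hypothetical model of an API with predicated and unpredicated forms."""
    return [x + y for x, y in zip(a, b)]


# Embedded: the all-true wrapper is always present, so every lane is computed.
print(embedded_masked(sve_abs, [-1, 2, -3, 4]))  # [1, 2, 3, 4]

# Optional: a bare call models the unpredicated form ...
a, b = [1, 2, 3, 4], [10, 20, 30, 40]
print(sve_add(a, b))  # [11, 22, 33, 44]

# ... and a user-written ConditionalSelect models the predicated form.
print(conditional_select([1, 0, 1, 0], sve_add(a, b), a))  # [11, 2, 33, 4]
```

With an all-true mask the wrapper does not change the result, which is why it can stay invisible at the API level.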

For any special case where there is no flag, you have options:

Add a new flag. Add code in hwintrinsics to use the flag. There is limited space for new flags, so only do this where there are many instructions that would require it.
If changes need making in codegen, then mark as HW_Flag_SpecialCodeGen and add a new case to CodeGen::genHWIntrinsic().
If changes need making at the import stage then mark as HW_Category_Special and add a new case to Compiler::impSpecialIntrinsic()
Mark as both HW_Flag_SpecialCodeGen and HW_Category_Special

Testing

src/tests/Common/GenerateHWIntrinsicTests/GenerateHWIntrinsicTests_Arm.cs

Copy/paste from https://raw.githubusercontent.com/a74nh/runtime/api_github/sve_api/out_GenerateHWIntrinsicTests_Arm.cs . Rename the template (first column) to a more generic template. We want as few new templates as possible. Existing AdvSimd templates can be copied and then edited to include extra Sve parts. The ValidateIterResult and NextValueOpN entries will need editing to fit the template. Use existing entries as a guide.

Linux

Tests can be built using:

rm -fr ./artifacts/tests/coreclr/obj/linux.arm64.Checked/Managed/JIT/HardwareIntrinsics/Arm/Sve/
./src/tests/build.sh checked -test:JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro.csproj

Tests can then be run:

./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.sh

Generated C# files are in artifacts/tests/coreclr/obj/linux.arm64.Checked/Managed/JIT/HardwareIntrinsics/Arm/Sve/Sve_ro/Sve_ro/gen/

There are a lot of tests that will be run. To make life easier run the .dll directly and pass it the name of the test (a substring will do). Eg:

$CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Add_uint
$CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve

Windows

Tests can be built using:

del /F /S /Q repo\artifacts\tests\coreclr\obj\windows.arm64.Release\Managed\JIT\HardwareIntrinsics\Arm\Sve\
pushd repo\src\tests\
build.cmd Release -test JIT\HardwareIntrinsics\HardwareIntrinsics_Arm_r.csproj /p:TargetArchitecture=arm64
build.cmd Release -test JIT\HardwareIntrinsics\HardwareIntrinsics_Arm_ro.csproj /p:TargetArchitecture=arm64

Tests can then be run:

pushd repo\artifacts\tests\coreclr\windows.arm64.Release\JIT\HardwareIntrinsics\HardwareIntrinsics_Arm_r
HardwareIntrinsics_Arm_r.cmd

Generated C# files are in artifacts\tests\coreclr\obj\windows.arm64.Release\Managed\JIT\HardwareIntrinsics\Arm\Sve\Sve_ro\Sve_ro\gen\

There are a lot of tests that will be run. To make life easier run the .dll directly and pass it the name of the test (a substring will do). Eg:

$CORE_ROOT\corerun .\artifacts\tests\coreclr\windows.arm64.Release\JIT\HardwareIntrinsics\HardwareIntrinsics_Arm_ro\HardwareIntrinsics_Arm_ro.dll Sve_Add_uint
$CORE_ROOT\corerun .\artifacts\tests\coreclr\windows.arm64.Release\JIT\HardwareIntrinsics\HardwareIntrinsics_Arm_ro\HardwareIntrinsics_Arm_ro.dll Sve

Altjit

All the testing works as usual using the AltJit* environment variables. The only thing to remember is to also set the environment variable DOTNET_MaxVectorTBitWidth=128, which avoids hitting the assert assert(size == info.compCompHnd->getClassSize(typeHnd));

Stress testing

All the tests should be run using all the various stress modes. https://github.com/a74nh/runtime/blob/api_github/sve_api/stress_tester.py is used to run your test in the various modes. Pass it the full command line for running your test. Eg:

stress_tester.py $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Add_uint
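
The general shape of such a runner can be sketched as below. This is not the contents of stress_tester.py: the `STRESS_MODES` table, the mode names, and the `run_in_modes` helper are assumptions for illustration, and the real script may use a different set of environment variables.

```python
import os
import subprocess
import sys
from typing import Dict, List

# Hypothetical mode table: the entries are illustrative assumptions,
# not the list of modes that stress_tester.py actually runs.
STRESS_MODES: Dict[str, Dict[str, str]] = {
    "baseline": {},
    "jitstress1": {"DOTNET_JitStress": "1"},
    "jitstress2": {"DOTNET_JitStress": "2"},
    "no_tiering": {"DOTNET_TieredCompilation": "0"},
}


def run_in_modes(command: List[str],
                 modes: Dict[str, Dict[str, str]] = STRESS_MODES) -> Dict[str, int]:
    """Run `command` once per mode and return each mode's exit code."""
    results: Dict[str, int] = {}
    for name, extra_env in modes.items():
        env = dict(os.environ)   # start from the caller's environment
        env.update(extra_env)    # then layer the mode's variables on top
        completed = subprocess.run(command, env=env)
        results[name] = completed.returncode
    return results


# Example call (commented out because it needs a built runtime):
# codes = run_in_modes([os.path.expandvars("$CORE_ROOT/corerun"),
#                       "HardwareIntrinsics_Arm_ro.dll", "Sve_Add_uint"])
# failed = [name for name, code in codes.items() if code != 0]
```

Returning the exit code per mode makes it easy to report which modes failed rather than stopping at the first failure.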

Writing Tests

Once the .cs files have been created, you can edit them manually and rebuild. Copy the changes back to the template once the test works. This can save time fiddling with template params.
Where possible, we want to avoid calling other API calls within a test. This stops dependencies building up in the tests. For bonus points, write your test functions once without additional API calls and once with.
When an API call uses a mask (either as an input or as the return value), the type of that mask is Vector<T>. This means you can treat it like a normal vector: iterate through it, set values, etc. A mask should only contain the values 0 or 1. Within the JIT it will be converted to/from a vector of boolean values so that it can be placed in the SVE predicate registers (p0 to p15).
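
As a rough illustration of the mask convention, the sketch below models a mask as a list of 0/1 lanes and converts it to and from one boolean per lane. The helper names `vector_to_predicate` and `predicate_to_vector` are hypothetical, and rejecting other lane values is a choice made for this sketch. It does not describe what the JIT does with them.

```python
from typing import List


def vector_to_predicate(mask: List[int]) -> List[bool]:
    """Convert a 0/1 mask vector into one boolean per lane."""
    for lane, value in enumerate(mask):
        if value not in (0, 1):
            raise ValueError(f"lane {lane} holds {value}; a mask lane must be 0 or 1")
    return [value == 1 for value in mask]


def predicate_to_vector(pred: List[bool]) -> List[int]:
    """Convert per-lane booleans back into a 0/1 mask vector."""
    return [1 if active else 0 for active in pred]


mask = [1, 0, 0, 1]
pred = vector_to_predicate(mask)
print(pred)                               # [True, False, False, True]
print(predicate_to_vector(pred) == mask)  # True
```

A test that builds its expected mask by hand can use the same round trip to check that it only ever writes 0 or 1 into a lane.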

a74nh commented 8 months ago

For choosing APIs.

For now, only pick APIs that do not have an embedded mask. (An API has an embedded mask when the Arm instruction takes a predicate register as arg2 but the mask is not exposed at the API level; most of the Sve maths methods are examples.)
- Support for embedded masks is ongoing.
- the helper API files indicate which methods have embedded masks with the label "Embedded arg1 mask predicate". Alternatively, see the flag HW_Flag_EmbeddedMaskedOperation in out_hwintrinsiclistarm64sve.h
Sve is highest priority, then Sve2, then all of the smaller extensions.
- I recommend starting with the loads and stores

a74nh commented 8 months ago

As the testing grows it will become increasingly difficult to test just a single API. This is OK during CI, but painful during development and bug fixing.

I recommend someone writes a patch so that a testname can be passed in as an argument so that only that test will run. Eg: HardwareIntrinsics_Arm_ro.sh Sve.Add.uint

kunalspathak commented 8 months ago

> I recommend someone writes a patch so that a testname can be passed in as an argument so that only that test will run. Eg: HardwareIntrinsics_Arm_ro.sh Sve.Add.uint

I agree. I have asked @TIHan to come up with a design for this. @TIHan - any update on this?

TIHan commented 8 months ago

I have not looked at this yet, but can this week.

tannergooding commented 8 months ago

Just noting such support should already exist if you invoke the underlying dll directly, this may just be something missing from the .sh file.

The exact argument that matches a filter may be a bit different due to it now using the underlying xunit filtering mechanic, but it should largely just work.

https://github.com/dotnet/runtime/blob/main/docs/workflow/testing/coreclr/testing.md#running-individual-tests

You can then see some of the logic that gets set up in https://github.com/dotnet/runtime/blob/main/src/tests/Common/XUnitWrapperGenerator/XUnitWrapperGenerator.cs and the corresponding logic for how the test filtering works in https://github.com/dotnet/runtime/blob/main/src/tests/Common/XUnitWrapperLibrary/TestFilter.cs

The actual filter is constructed like:

System.Collections.Generic.Dictionary<string, string> testExclusionTable = XUnitWrapperLibrary.TestFilter.LoadTestExclusionTable();
XUnitWrapperLibrary.TestFilter filter = new (args, testExclusionTable);

A given TestExecutor then uses it like:

void TestExecutor1(System.IO.StreamWriter tempLogSw, System.IO.StreamWriter statsCsvSw)
{
    if (filter is null || filter.ShouldRunTest(@"JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble", "_AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble()"))
    {
        System.TimeSpan testStart = stopwatch.Elapsed;
        try
        {
            summary.ReportStartingTest("_AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble()", System.Console.Out);
            outputRecorder.ResetTestOutput();
            _AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble();
            summary.ReportPassedTest("_AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble()", "JIT.HardwareIntrinsics.Arm._AdvSimd.Program", @"AddDouble", stopwatch.Elapsed - testStart, outputRecorder.GetTestOutput(), System.Console.Out, tempLogSw, statsCsvSw);
        }
        catch (System.Exception ex)
        {
            summary.ReportFailedTest("_AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble()", "JIT.HardwareIntrinsics.Arm._AdvSimd.Program", @"AddDouble", stopwatch.Elapsed - testStart, ex, outputRecorder.GetTestOutput(), System.Console.Out, tempLogSw, statsCsvSw);
        }
    }
    else
    {
        string reason = filter.GetTestExclusionReason("_AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble()");
        summary.ReportSkippedTest("_AdvSimd_r::JIT.HardwareIntrinsics.Arm._AdvSimd.Program.AddDouble()", "JIT.HardwareIntrinsics.Arm._AdvSimd.Program", @"AddDouble", System.TimeSpan.Zero, reason, tempLogSw, statsCsvSw);
    }
}

where ShouldRunTest, at the most basic level, just does a stringToSearch.Contains(filter) check.
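
A minimal Python model of that basic substring check is sketched below. It is not the TestFilter implementation: `should_run_test` is a hypothetical helper, and matching against both names, case-sensitively, is an assumption of this sketch.

```python
from typing import Optional


def should_run_test(filter_text: Optional[str],
                    fully_qualified_name: str,
                    display_name: str) -> bool:
    """Run the test if there is no filter or the filter is a substring of a name."""
    if not filter_text:
        return True
    return filter_text in fully_qualified_name or filter_text in display_name


fq_name = "JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Add_uint"
display = "_Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Add_uint()"

print(should_run_test("Sve_Add_uint", fq_name, display))  # True
print(should_run_test("Sve", fq_name, display))           # True
print(should_run_test("Sve.Add.uint", fq_name, display))  # False
```

Under a pure substring model, a dotted name such as Sve.Add.uint does not match, because the generated test method name uses underscores.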

kunalspathak commented 8 months ago

> Just noting such support should already exist if you invoke the underlying dll directly, this may just be something missing from the .sh file.

If that's the case, can you or @TIHan come up with the exact command line that is needed to run a particular case? I don't want engineers to have to hack around a test to make it work for every API.

a74nh commented 8 months ago

It appears to work:

❯ $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Add_uint
16:34:55.071 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Add_uint()
Supported ISAs:
  AdvSimd:   True
  Aes:       True
  ArmBase:   True
  Crc32:     True
  Dp:        True
  Rdm:       True
  Sha1:      True
  Sha256:    True
  Sve:       True

Beginning scenario: RunBasicScenario_UnsafeRead
Beginning scenario: RunBasicScenario_Load
Beginning scenario: RunReflectionScenario_UnsafeRead
Beginning scenario: RunLclVarScenario_UnsafeRead
Beginning scenario: RunClassFldScenario
Beginning scenario: RunStructLclFldScenario
Beginning scenario: RunStructFldScenario
16:34:55.177 Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Add_uint()

I'm happy with this as a solution then!

a74nh commented 7 months ago

Updated the implementation instructions with stress testing and how to write the tests.

kunalspathak commented 7 months ago

> Updated the implementation instructions with stress testing and how to write the tests.

Updated for Windows.

kunalspathak commented 6 months ago

Updated https://github.com/dotnet/runtime/issues/99957#issuecomment-2007408474 with the meanings of the various HWIntrinsicFlag values used in the table.

JulieLeeMSFT commented 3 months ago

Closing this as completed. Will open a new issue for items that will be included in .NET 10.

dotnet / runtime

Arm64: Implement SVE APIs #99957
