[BUG] neon_intrinsics: some fp16 convert intrinsics have type mismatches

The intrinsics for scalar converts between fp16 and integer/fixed-point values are currently specified to always use 16bit -> 16bit converts, regardless of the size of the integer/fixed-point value. For example:

https://github.com/ARM-software/acle/blob/6eb85169a112395c8bce978fba19154efcd725ea/tools/intrinsic_db/advsimd.csv#L3795

Using the instruction listed causes certain input values to be treated incorrectly. For the above example, an input int32 65504 produces an fp16 value of -32.0, instead of the expected 65504.0.

Instead, the above intrinsic should use the SCVTF Hd,Wn instruction, which better matches the input type.

This applies to all scalar converts between fp16 and 32-bit/64-bit integer/fixed-point converts:

float16_t vcvth_f16_s32
float16_t vcvth_f16_s64
float16_t vcvth_f16_u32
float16_t vcvth_f16_u64
int32_t vcvth_s32_f16
int64_t vcvth_s64_f16
uint32_t vcvth_u32_f16
uint64_t vcvth_u64_f16
float16_t vcvth_n_f16_s32
float16_t vcvth_n_f16_s64
float16_t vcvth_n_f16_u32
float16_t vcvth_n_f16_u64
int32_t vcvth_n_s32_f16
int64_t vcvth_n_s64_f16
uint32_t vcvth_n_u32_f16
uint64_t vcvth_n_u64_f16

Testing with two mainstream compilers (gcc and clang/llvm) shows that these intrinsics are often already generating the proposed instructions, rather than the instructions listed in the ACLE. In particular: GCC (tested with 9.2.0) generates the proposed instructions for all of the intrinsics. clang/llvm (tested with 14.0.0) generates the proposed instructions for the integer converts, but generates the ACLE instructions for the fixed-point converts.

ARM-software / acle

[BUG] neon_intrinsics: some fp16 convert intrinsics have type mismatches #212