[X86][F16C] Replace __builtin_ia32_vcvtph2ps* with generic code - Githubissues

Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[X86][F16C] Replace __builtin_ia32_vcvtph2ps* with generic code #36527

Closed Quuxplusone closed 4 years ago

Quuxplusone commented 6 years ago


Bugzilla Link	PR37554
Status	RESOLVED FIXED
Importance	P enhancement
Reported by	Simon Pilgrim (llvm-dev@redking.me.uk)
Reported on	2018-05-22 11:40:13 -0700
Last modified on	2020-04-27 19:08:30 -0700
Version	trunk
Hardware	PC Windows NT
CC	craig.topper@gmail.com, llvm-bugs@lists.llvm.org, spatel+llvm@rotateright.com, sroland@vmware.com
Fixed by commit(s)	rG12cc105f806f, rG7e9747b50bcb
Attachments
Blocks	PR30624
Blocked by
See also	PR43065

Inspired by https://reviews.llvm.org/D47174, it should be possible to replace
the __builtin_ia32_vcvtph2ps* intrinsics with generic half2float conversions.

This would require a little work in the backend and legalizers as we currently
don't do a good job of lowering such code (see llvm\test\CodeGen\X86\vector-
half-conversions.ll)

InstCombiner::visitCallInst already has some basic constant folding and
demanded elts support.

Quuxplusone commented 4 years ago

https://reviews.llvm.org/D74886 improves the backend codegen

Quuxplusone commented 4 years ago

The backend improvements landed at rG12cc105f806f

The remaining steps are to auto upgrade to fpextend and add handling in CGBuiltin.cpp.

Quuxplusone commented 4 years ago

https://reviews.llvm.org/D75162 removes/upgrades the non-SAE cvtph2ps intrinsics

Quuxplusone commented 4 years ago

(In reply to Simon Pilgrim from comment #3)
> https://reviews.llvm.org/D75162 removes/upgrades the non-SAE cvtph2ps
> intrinsics

rG7e9747b50bcb

Quuxplusone commented 4 years ago

Is it possible to somehow get rounding control for the generic conversion?
We use this intrinsic in mesa llvmpipe, and (at least for d3d10 conformance)
there's a requirement the rounding mode is round to zero, so without rounding
control we have to go all the way back to manual implementation which is rather
lame. (I complained in the past about intrinsic removal without any good
replacement, I guess it's possible to live with it but it will just be
unnecessarily slow for no good reason.)

Quuxplusone commented 4 years ago

This half to float which doesn't need any rounding right?

The intrinsics that were removed only had a single argument.

def int_x86_vcvtph2ps_128 : GCCBuiltin<"__builtin_ia32_vcvtph2ps">,
     Intrinsic<[llvm_v4f32_ty], [llvm_v8i16_ty], [IntrNoMem]>;

Quuxplusone commented 4 years ago

Oh sorry! I thought that both directions were removed. Half to float indeed of course doesn't need rounding control. Sorry for the noise...