Closed rwgardner closed 11 years ago
LLVM 3.1 added support for half floats, so this should be doable. Marking as 'up for grabs'.
Since this is strictly a storage type, very few operations are needed – mostly conversion to and from larger float types.
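As a rough illustration of the "storage type" idea, here is a minimal sketch that assumes a current Julia release where `Float16` already exists (the thread itself predates that). Values are stored in half precision and widened to `Float32` only when arithmetic is needed. The variable names are made up for the example.

```julia
# Assumption: a current Julia version with a built-in Float16 type.
# Values are stored in 16 bits each and widened only for arithmetic.
samples = Float16[0.5, 1.25, 100.25, -3.0]   # every value here is exactly representable

bytes_half   = sizeof(samples)               # 2 bytes per element
bytes_single = sizeof(Float32.(samples))     # 4 bytes per element

# Convert each element to Float32 before accumulating.
total = sum(Float32, samples)
```

The only operations this relies on are construction from a wider float and conversion back to one, which matches the small set of operations described above.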
@rwgardner, my guess is that this will happen sooner if you submit a pull request. ("Up for grabs" is a good choice here, and it basically means "waiting for someone to do it." Since you want the feature...) It's good that you first submitted it as an issue, however, in case there were strong objections; since that doesn't seem to be the case, it looks like the way is clear for you to add this feature.
Some time in the not-too-distant past, support for Int128 was added. Perhaps a good start might be browsing the commit history (with git log and git show) to find out exactly how that was done; it might be a great model for this case.
Float16 should be substantially easier than Int128. Up for grabs is more like "waiting for someone to do it and pretty nicely isolated and doable by a determined newcomer."
The cool thing about Int128 was that it was done fully in Julia. I believe that to get a fast Float16 implementation, one may need to leverage LLVM's Float16 capabilities in intrinsics.cpp and codegen.cpp.
I believe a first cut implementation can be done by leveraging bitshifts and such the way @rwgardner has already done, and it would be nice to receive that as a pull request as a starting point.
Sounds good. I'm not "grabbing" this yet, but I will if I really want it done. (Unfortunately, I don't get paid to work on Julia for the most part, which means I need to do this in my free time. That's something I'd love to do, but in short, a new first baby due any day has been and will be dominating that for a while.)
Is it possible for you to isolate the code that you have already written for Float16 and submit that?
Outline of what needs to be done:
@JeffBezanson, any thoughts on whether it's better to add new specific intrinsics (fptrunc16 and fpext32) or generalize the existing ones? I was leaning towards generalizing the existing ones and renaming fptrunc32 => fptrunc and fpext64 => fpext.
If rwgardner is alright with it, I can try implementing Float16. I've wanted to find a way to get my hands dirty in Julia.
@mattgallivan Please jump in. The more the merrier. @StefanKarpinski's outline is basically what needs to be done, and one can follow the Float32 implementation in src and base.
Just to expand on what I mean by "generalizing the existing ones", this means turning the fptrunc and fpext intrinsics into versions that aren't specific to bit sizes but use type info to figure out the appropriate sizes and call the corresponding LLVM instructions. We've gradually been moving from specific versions with bit sizes in their names to more generic ones.
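To make the "generic rather than size-specific" idea concrete, here is a Julia-level analogy written in current Julia syntax. It is only a sketch: `fpext_sketch` and `fptrunc_sketch` are hypothetical names, they are not the intrinsics discussed here, and they use `convert` instead of emitting LLVM instructions. The point is that the target type, not a bit size in the function name, selects the widening or narrowing.

```julia
# Hypothetical helpers for illustration only; these are NOT Julia intrinsics.
const HardwareFloat = Union{Float16, Float32, Float64}

# Widen: the target type must be strictly larger than the source type.
function fpext_sketch(::Type{T}, x::S) where {T<:HardwareFloat, S<:HardwareFloat}
    sizeof(T) > sizeof(S) || throw(ArgumentError("cannot extend $S to $T"))
    return convert(T, x)
end

# Narrow: the target type must be strictly smaller than the source type.
function fptrunc_sketch(::Type{T}, x::S) where {T<:HardwareFloat, S<:HardwareFloat}
    sizeof(T) < sizeof(S) || throw(ArgumentError("cannot truncate $S to $T"))
    return convert(T, x)
end

widened  = fpext_sketch(Float32, Float16(1.5))
narrowed = fptrunc_sketch(Float16, 1.5)
```

With this shape, adding a new float width means adding a type rather than a new pair of size-suffixed functions.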
The Int stuff already does that and it would be nice to do so with FloatingPoint too. I wonder if we should take this opportunity to also add Float128 at the same time, assuming LLVM supports it.
Since there is no hardware support for quad-precision arithmetic, adding Float128 is quite a bit more complicated.
Yeah, that's a whole different can of worms. You actually want to compute with Float128 or it's completely useless. For Float16, it's fine to just be able to store them.
@mattgallivan all sounds good. I would love to contribute and would have a lot of fun doing it, but my life is about as insane as it's ever been right now. Hopefully I can contribute in other ways in the future.
You may not want this (I'm sure it could be written more efficiently, etc., and you may want to do it in fortran or C), but here's what I have. It also hasn't been heavily validated yet, but you might use it for validation by comparing it to your code. I haven't done any conversion back to Float16.
bitstype 16 MyFloat16

function convert(::Type{Float32}, val::MyFloat16)
    val = uint32(reinterpret(Uint16, val))
    sign = (val & 0x8000) >> 15
    exp  = (val & 0x7c00) >> 10
    sig  = (val & 0x3ff) >> 0
    ret::Uint32

    if exp == 0
        if sig == 0
            sign = sign << 31
            ret = sign | exp | sig
        else
            n_bit = 1
            bit = 0x0200
            while (bit & sig) == 0
                n_bit = n_bit + 1
                bit = bit >> 1
            end
            sign = sign << 31
            exp = (-14 - n_bit + 127) << 23
            sig = ((sig & (~bit)) << n_bit) << (23 - 10)
            ret = sign | exp | sig
        end
    elseif exp == 0x1f
        if sig == 0
            if sign == 0
                ret = 0x7f800000
            else
                ret = 0xff800000
            end
        else
            ret = 0xffffffff
        end
    else
        sign = sign << 31
        exp = (exp - 15 + 127) << 23
        sig = sig << (23 - 10)
        ret = sign | exp | sig
    end
    return reinterpret(Float32, ret)
end

function convert(::Type{Float64}, val::MyFloat16)
    val = uint64(reinterpret(Uint16, val))
    sign = (val & 0x8000) >> 15
    exp  = (val & 0x7c00) >> 10
    sig  = (val & 0x3ff) >> 0
    ret::Uint64

    if exp == 0
        if sig == 0
            sign = sign << 63
            ret = sign | exp | sig
        else
            n_bit = 1
            bit = 0x0200
            while (bit & sig) == 0
                n_bit = n_bit + 1
                bit = bit >> 1
            end
            sign = sign << 63
            exp = (-14 - n_bit + 1023) << 52
            sig = ((sig & (~bit)) << n_bit) << (52 - 10)
            ret = sign | exp | sig
        end
    elseif exp == 0x1f
        if sig == 0
            if sign == 0
                ret = 0x7ff0000000000000
            else
                ret = 0xfff0000000000000
            end
        else
            ret = 0xffffffffffffffff
        end
    else
        sign = sign << 63
        exp = (exp - 15 + 1023) << 52
        sig = sig << (52 - 10)
        ret = sign | exp | sig
    end
    return reinterpret(Float64, ret)
end
We could convert to only Float32 or Float64 and then use existing code to convert between those. It seems more efficient to convert to/from both directly in most cases, but it may not be on some architectures, partly depending on whether there is hardware support for converting between Float32 and Float64. (I don't know if that's something floating point units typically support or not.)
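The validation idea mentioned with the code above (comparing a bit-manipulation conversion against a reference) can be sketched as follows. This is not the code from this thread: it is a rewrite in current Julia syntax, `half_bits_to_float32` and `count_mismatches` are hypothetical names, and it assumes a Julia version with a built-in `Float16` to serve as the reference. Since there are only 65536 half-precision bit patterns, the check can be exhaustive.

```julia
# Hypothetical helper: decode a half-precision bit pattern using integer operations only.
function half_bits_to_float32(h::UInt16)
    sign = UInt32(h & 0x8000) << 16          # sign bit moved to bit 31
    exp  = UInt32(h & 0x7c00) >> 10          # 5-bit biased exponent
    sig  = UInt32(h & 0x03ff)                # 10-bit significand field

    if exp == 0
        if sig == 0
            bits = sign                      # signed zero
        else
            # Subnormal: shift left until the leading 1 reaches the implicit-bit position.
            n = 0
            while (sig & 0x0400) == 0
                sig <<= 1
                n += 1
            end
            sig &= 0x03ff                    # drop the now-implicit leading 1
            bits = sign | (UInt32(127 - 14 - n) << 23) | (sig << 13)
        end
    elseif exp == 0x1f
        bits = sign | 0x7f800000 | (sig << 13)   # Inf (sig == 0) or NaN (sig != 0)
    else
        bits = sign | ((exp + UInt32(127 - 15)) << 23) | (sig << 13)
    end
    return reinterpret(Float32, bits)
end

# Compare against the built-in conversion for every 16-bit pattern.
# NaNs are compared with isnan, since NaN payload handling may differ.
function count_mismatches()
    bad = 0
    for i in 0:0xffff
        h = UInt16(i)
        ref = Float32(reinterpret(Float16, h))
        got = half_bits_to_float32(h)
        same = isnan(ref) ? isnan(got) :
               reinterpret(UInt32, got) == reinterpret(UInt32, ref)
        bad += same ? 0 : 1
    end
    return bad
end

mismatches = count_mismatches()
```

On the question of converting through only one wider type: widening a `Float32` to `Float64` is exact, so going from half precision to `Float64` by way of `Float32` loses no information. Any difference between the two routes is one of speed, not of results.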
@StefanKarpinski Would it be good to start off with this as a pure julia implementation and get it in base to begin with?
Until the LLVM bug is sorted out, it may be worthwhile to put @rwgardner's Julia implementation in Base. That way, at least the storage format can be used, and the conversions could potentially become faster once the LLVM issue is fixed.
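A pure-Julia storage format also needs the conversion in the other direction, which the code posted earlier does not cover. Below is one possible sketch in current Julia syntax, not a proposal for what should go into Base. `float32_to_half_bits` is a hypothetical name, and the rounding mode (round to nearest, ties to even) is an assumption about the desired behavior.

```julia
# Hypothetical helper: encode a Float32 as half-precision bits,
# rounding to nearest with ties to even (an assumed rounding mode).
function float32_to_half_bits(x::Float32)
    b    = reinterpret(UInt32, x)
    sign = UInt16((b >> 16) & 0x8000)
    e    = Int((b >> 23) & 0xff)             # biased Float32 exponent
    m    = b & 0x007fffff                    # 23-bit significand field

    if e == 0xff                             # Inf or NaN
        return sign | 0x7c00 | (m == 0 ? 0x0000 : 0x0200)
    end

    he = e - 127 + 15                        # exponent rebiased for half precision
    if he >= 0x1f
        return sign | 0x7c00                 # too large: overflow to Inf
    elseif he <= 0
        he < -10 && return sign              # too small: signed zero
        m |= 0x00800000                      # make the implicit leading 1 explicit
        shift   = 14 - he                    # between 14 and 24
        half    = m >> shift
        rest    = m & ((UInt32(1) << shift) - 1)
        halfway = UInt32(1) << (shift - 1)
        if rest > halfway || (rest == halfway && (half & 1) == 1)
            half += 1                        # may round up into the smallest normal
        end
        return sign | UInt16(half)
    else
        half = (UInt32(he) << 10) | (m >> 13)
        rest = m & 0x1fff
        if rest > 0x1000 || (rest == 0x1000 && (half & 1) == 1)
            half += 1                        # a carry into the exponent is intended
        end
        return sign | UInt16(half)
    end
end

one_and_a_half = float32_to_half_bits(1.5f0)
```

Letting the rounding increment carry into the exponent field keeps the code short: rounding up from the largest significand lands on the next power of two, or on Inf at the top of the range.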
@loladiro Does LLVM 3.3 fix the Float16 bugs?
Even using @rwgardner's conversions, the following patch unfortunately still causes LLVM failures:
https://gist.github.com/StefanKarpinski/9092d04bc24c44493d08
julia> float16(1.5)
LLVM ERROR: Cannot select: 0x104151b10: ch = store 0x102070910, 0x10421df10, 0x104231d10, 0x10434d410<ST2[%14]> [ORD=77165] [ID=35]
0x10421df10: f16,ch = load 0x10434dc10, 0x102070010, 0x10434d410<LD2[FixedStack0]> [ORD=77156] [ID=27]
0x102070010: i64 = FrameIndex<0> [ORD=77155] [ID=4]
0x10434d410: i64 = undef [ORD=77150] [ID=2]
0x104231d10: i64 = add 0x104233910, 0x1041a7810 [ORD=77163] [ID=33]
0x104233910: i64,ch,glue = CopyFromReg 0x104087a10, 0x104088010, 0x104087a10:1 [ORD=77157] [ID=32]
0x104088010: i64 = Register %RAX [ORD=77157] [ID=10]
0x104087a10: ch,glue = callseq_end 0x10434da10, 0x104264310, 0x104264310, 0x10434da10:1 [ORD=77157] [ID=31]
0x104264310: i64 = TargetConstant<0> [ORD=77155] [ID=5]
0x104264310: i64 = TargetConstant<0> [ORD=77155] [ID=5]
0x10434da10: ch,glue = X86ISD::CALL 0x104279410, 0x104232910, 0x104085410, 0x10417a710, 0x104279410:1 [ORD=77157] [ID=30]
0x104232910: i64 = X86ISD::Wrapper 0x104085310 [ID=16]
You'll still want to leave in the disable in the compiler, otherwise LLVM will generate bad code. LLVM 3.3 does not fix this.
Yes, with this implementation no compiler changes are needed; it's just a 16-bit bitstype.
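For readers on a current Julia version, the "just a 16-bit bitstype" approach looks roughly like the sketch below. `bitstype` was later replaced by the `primitive type` syntax. `StorageHalf` is a hypothetical name, and this sketch borrows the built-in `Float16` for the actual conversions, which was of course not available when this was written.

```julia
# Hypothetical 16-bit storage-only type; no compiler support is involved.
primitive type StorageHalf 16 end

# Assumption: the built-in Float16 does the real conversion work here.
StorageHalf(x::Real) = reinterpret(StorageHalf, Float16(x))
Base.Float32(h::StorageHalf) = Float32(reinterpret(Float16, h))

stored = StorageHalf(1.5)
back   = Float32(stored)
packed = [StorageHalf(v) for v in (0.25, 0.5, 0.75)]   # 2 bytes per element
```

The type has no arithmetic at all. It can only be created, stored, and converted back, which is the storage-only behavior being described.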
Ok, if someone wants to finish this, I'm away for the day.
Bump.
@StefanKarpinski do you just want to apply your patch?
I don't think just applying the patch works. There were a bunch of changes it needed in order to work.
It would be nice to have a nicer show() method for float16. Asking the question here in case this was done by design.
julia> float16(100.25)
Float16(0x5644)
Printing 16-bit floats correctly and minimally is quite non-trivial. Our 32-bit and 64-bit float printing are handled by the double-conversion library which does not support 16-bit floats. It might be possible to figure out a hack that approximates correct minimal Float16 printing using the printing routines for Float32, but it's not obvious how.
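One possible shape for such a hack, written in current Julia syntax, is a brute-force search: round the widened value to an increasing number of significant digits and stop at the first candidate that converts back to the same half-precision value. This is only a sketch and only an approximation. `short_half_string` is a hypothetical name, and because it tries just the nearest decimal at each digit count, it is not guaranteed to find the true shortest representation in every case.

```julia
# Hypothetical helper: a short decimal string that converts back to the same Float16.
# Assumption: a current Julia version with a built-in Float16.
function short_half_string(x::Float16)
    isfinite(x) || return string(x)
    wide = Float32(x)                        # exact: every Float16 is also a Float32
    for digits in 1:5                        # 5 significant digits always suffice for 11 bits
        candidate = round(wide; sigdigits=digits)
        if Float16(candidate) == x
            return string(candidate)         # Float32 printing supplies the text
        end
    end
    return string(wide)                      # fallback: full Float32 digits
end

text = short_half_string(Float16(100.25))
```

Whatever string comes out, parsing it as a `Float32` and converting to half precision gives back the original value, since each candidate is checked before it is returned and the fallback is exact.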
I wonder what is going on here:
julia> a = float16(rand(5,5))
5x5 Float16 Array:
0.445801 0.154785 0.431641 0.384521 0.188354
0.4646 0.281006 0.766602 0.563965 0.0402222
0.685059 0.92627 0.921875 0.933594 0.468994
0.841797 0.582031 0.0185242 0.481934 0.151367
0.348877 0.952637 0.672852 0.864746 0.166138
Float16 printing has several problems right now, e.g.
julia> print_shortest(STDOUT,NaN16)
NaN32
(plus NaN16 does not work properly)
I'm about to commit some fixes.
showcompact has a fallback definition that is printing the Float16s in that array by converting them to Float64. The question is whether we should print the f0 suffix. For now I'll say that is specific to Float32, and leave it off.
This is a request for support for half-precision floating point numbers (Float16s).
(If there has been any discussion about adding support for these, which I would expect there was, I did not find it.)
Although the precision is low, Float16s are still useful when you have a very large quantity of floating point numbers (which is what we have) and want to reduce memory footprint, cache impact, or disk storage. (Currently, we manually convert our half precision floats with bit manipulations and reinterpretation, but the code would be cleaner if Julia supported them natively.)
Thanks.