BurntSushi / rust-cbor

CBOR (binary JSON) for Rust with automatic type based decoding and encoding.
The Unlicense
128 stars 18 forks source link

f16 decoding incorrect. #9

Open Parakleta opened 8 years ago

Parakleta commented 8 years ago

I have a CBOR encoder/decoder for Lua which uses the canonicalization recommendations from the RFC, specifically that floating point values should be stored in the smallest representation that does not result in loss of information. This provides a decent saving on structures that have a lot of floating point values many of which are just small integers. Unfortunately this fails spectacularly for every value except 0.0 with the f16 decoding behaviour in rust-cbor.

An example correct decoding routine is available in Appendix D of the RFC. The C version focuses on breaking down the f16 and building a new f32 float with ldexp. The Python version uses bit manipulation to convert the f16 directly into an f32.

Parakleta commented 8 years ago

I'm not so good at writing Rust yet, but here's another option from Lua which doesn't use ldexp.

local function read_half_float(u16val)
    local neg = u16val & 0x8000
    local val = u16val & 0x7fff

    if val == 0 then return neg == 0 and 0.0 or -0.0 end -- Shortcut common case

    if val < 0x0400 then -- SubNormal
        return (neg == 0 and 0x1p-24 or -0x1p-24) * val
    end

    local single = (neg == 0 and 0x38000000 or 0xb8000000) + (val << 13)
    if val >= 0x7C00 then
        single += 0x38000000 -- Inf and NaN
    end
    return ("f"):unpack(("I4"):pack(single))
end

The unpack and pack step at the end is basically just a transmute. The 0x1p-24 I'm not sure how to translate into Rust, but is equivalent to 2.0^-24.

BurntSushi commented 8 years ago

Here's where the decoding happens: https://github.com/BurntSushi/rust-cbor/blob/master/src/decoder.rs#L151-L157

According to comments, my past self seems to have thought it was wrong too. :-)

Parakleta commented 8 years ago

I had a go at putting something together, I don't know enough about Rust design patterns to know if it's good or not, but here it is...

    let f: f32 = match (n&0x8000 != 0,((n&0x7fff) as u32) << 13) {
        (false,0) => 0.0,
        (true, 0) => -0.0,
        (false,x) if x < 0x00800000 => 7.27595761e-12f32 * (x as f32),
        (true, x) if x < 0x00800000 => -7.27595761e-12f32 * (x as f32),
        (false,x) if x < 0x0f800000 => unsafe { transmute (x+0x38000000u32) },
        (true, x) if x < 0x0f800000 => unsafe { transmute (x+0xb8000000u32) },
        (false,x) => unsafe { transmute (x+0x70000000u32) },
        (true, x) => unsafe { transmute (x+0xf0000000u32) },
    };
Parakleta commented 8 years ago

I wonder whether transmute is the right idea from a Rust safety point of view. The code I provided is fine so long as you are certain that the floating point is IEEE, but if it is not then probably better to use the example C code and swapping the ldexp(x,y) calls with (x as f32) * (2.0f32).powi(y) or similar so that everything is safe and stable. I wonder if the compiler/optimiser is smart enough to turn (x as f32) * (2.0f32).powi(y) into a bit-twiddle + transmute anyway.

ArtemGr commented 2 years ago

How about optional https://crates.io/crates/half ? Here hoping to see it in CBOR one day.