Write Simple Post about Non-Branching Max

dhermes commented 9 years ago

https://gist.github.com/dhermes/c79846c6074b938b2e10

Also:

- show the generated assembly
- discuss CUDA branching issues (BE BRIEF)
- discuss what CUDA does to avoid branching for max/min
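For the post itself, one well-known way to pick a maximum without an `if` is a mask-select on integers. This is only a minimal sketch: it is not necessarily what the gist above does, and it says nothing about what CUDA actually emits. `branchless_max` is a made-up name, and since the Python interpreter branches internally anyway, this only illustrates the arithmetic.

```python
def branchless_max(a, b):
    # Hypothetical helper for illustration; integers only.
    # -(a < b) is -1 (all bits set) when a < b, otherwise 0.
    mask = -(a < b)
    # mask == -1: a ^ (a ^ b) == b;  mask == 0: a ^ 0 == a.
    return a ^ ((a ^ b) & mask)
```

The same select-by-mask idea is what the generated assembly would be compared against.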

Maybe give a tiny refresher on IEEE 754:

import struct

def f(val):
    # Float -> 64-character bit string (big-endian IEEE 754 double).
    return ''.join(format(b, '08b') for b in struct.pack('>d', val))

def g(bits):
    # 64-character bit string -> float (inverse of f).
    return struct.unpack('>d', int(bits, 2).to_bytes(8, 'big'))[0]

def h(s, e, m):
    # Build a double from sign (1 bit), exponent (11 bits), mantissa (52 bits).
    return g(s + e + m)

# All-ones exponent with a nonzero mantissa is a NaN.
h('0', '1' * 11, '1' + '0' * 51)

dhermes commented 9 years ago

objdump -d test

dhermes commented 9 years ago

https://gist.github.com/dhermes/f17fc85999f79ae2f304

dhermes commented 8 years ago

sign_bit_simple_mask.c
- https://godbolt.org/g/ZgYvMu
- https://godbolt.org/g/9PPTbC (optimized)
sign_bit_via_char_bytes.c
- https://godbolt.org/g/EAoktt
- https://godbolt.org/g/C8n7MZ (optimized)
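Going only by the two file names (the C sources are not reproduced in this thread), the two approaches are presumably "mask the top bit of the 64-bit pattern" and "look at the most significant byte". A rough Python sketch of both, with hypothetical function names:

```python
import struct

SIGN_MASK = 1 << 63  # top bit of a 64-bit IEEE 754 double

def sign_bit_mask(val):
    # Reinterpret the double as an unsigned 64-bit integer, then mask.
    (as_int,) = struct.unpack('>Q', struct.pack('>d', val))
    return (as_int & SIGN_MASK) >> 63

def sign_bit_bytes(val):
    # Inspect only the most significant byte (first byte in big-endian order).
    return struct.pack('>d', val)[0] >> 7
```

Both return 1 for negative values (including `-0.0`) and 0 otherwise.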

https://github.com/mattgodbolt/gcc-explorer is awesome

dhermes commented 7 years ago

Also, a simple "optimized" fabs: https://godbolt.org/g/q2lryx
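The idea behind a branch-free `fabs` is to clear the sign bit and leave the exponent and mantissa alone. A minimal Python sketch of that idea follows; `fabs_bits` is a hypothetical name, and this is not a transcription of the linked snippet.

```python
import struct

ABS_MASK = 0x7FFFFFFFFFFFFFFF  # every bit except the sign bit

def fabs_bits(val):
    # Hypothetical helper: reinterpret as a 64-bit integer, clear the
    # sign bit, and reinterpret the result as a double again.
    (as_int,) = struct.unpack('>Q', struct.pack('>d', val))
    return struct.unpack('>d', struct.pack('>Q', as_int & ABS_MASK))[0]
```

There is no comparison against zero anywhere, which is the point.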

dhermes / bossylobster-blog

Write Simple Post about Non-Branching Max #56