The previous implementation naively tested every single bit. This resulted in a giant chain of Ite expressions, which is probably hard on SMT solvers and at the very least does not seem very readable to me.
This reimplements the lifting of the bsr, bsf, lzcnt, tzcnt, and popcnt instructions based on the branchless algorithms described in Hacker's Delight, sections 5-3 and 5-4:
http://index-of.es/Security/Addison%20Wesley%20-%20Hackers%20Delight%202002.pdf
as well as the efficient implementation of Hamming weight:
https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation
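
For reviewers who do not have the references at hand, here is a rough sketch of the kind of branchless bit counting they describe. It uses plain Python integers at a fixed 32-bit width rather than the lifter's expression types, and the names `popcnt32`, `lzcnt32`, and `tzcnt32` are made up for illustration. It is not the code in this change and may differ from it in detail.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit operands, for illustration only


def popcnt32(x):
    """Hamming weight via parallel (SWAR) summation, no per-bit tests."""
    x &= MASK32
    x = x - ((x >> 1) & 0x55555555)                  # 2-bit partial sums
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # 8-bit partial sums
    # Multiplying adds all four bytes into the top byte of the low 32 bits.
    return ((x * 0x01010101) & MASK32) >> 24


def lzcnt32(x):
    """Leading zeros: smear the highest set bit to the right, then count
    the zeros that remain above it."""
    x &= MASK32
    x |= x >> 1
    x |= x >> 2
    x |= x >> 4
    x |= x >> 8
    x |= x >> 16
    return popcnt32(~x & MASK32)


def tzcnt32(x):
    """Trailing zeros: ~x & (x - 1) keeps exactly the bits below the
    lowest set bit, so their population count is the answer."""
    x &= MASK32
    return popcnt32(~x & (x - 1) & MASK32)
```

With helpers like these, each result is a fixed, short sequence of shifts, masks, and additions instead of one conditional per bit. For a nonzero operand, the bsf result matches `tzcnt32(x)` and the bsr result matches `31 - lzcnt32(x)`; for a zero operand, `lzcnt32` and `tzcnt32` both return 32.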