Closed: m-carrasco closed this issue 2 years ago
Hi @m-carrasco, in order to leave out the division operation, that function would have to manually compose an IEEE 754 number using bit twiddling. And that's not portable, since there's no requirement that a C++ compiler implement floating-point types using IEEE 754.
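If you do go down the bit-twiddling route, a common safeguard is to check that assumption at compile time, for example:

#include <limits>

// Refuse to compile the bit-twiddling conversion on platforms whose double
// is not an IEEE 754 (IEC 559) type.
static_assert(std::numeric_limits<double>::is_iec559,
              "this conversion assumes IEEE 754 doubles");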
It seems possible to use ldexp, but that uses an int-to-double cast, and ldexp itself seems to be implemented in GCC using fild and fscale (on Intel), which are FP instructions.
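Roughly, an ldexp-based conversion would look like this (just a sketch, not something fpm provides; note the int-to-double cast it still needs):

#include <cmath>
#include <fpm/fixed.hpp>

// Hypothetical helper: scale the raw 16.16 value by 2^-16 via std::ldexp.
static double to_double_via_ldexp(fpm::fixed_16_16 x) {
    return std::ldexp(static_cast<double>(x.raw_value()), -16);
}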
What exactly are your circumstances where you have to work with doubles, but not emit FP instructions? And by FP instructions I assume you mean division, etc.? Because a simple int-to-double cast already uses instructions that might be classified as FP instructions on e.g. x86-64 and ARM.
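To illustrate that last point, even this trivial conversion is typically lowered to FP instructions:

#include <cstdint>

// Compilers usually lower this cast to an FP conversion instruction,
// e.g. sitofp in LLVM IR, cvtsi2sd on x86-64, scvtf on AArch64.
double int_to_double(std::int32_t i) {
    return static_cast<double>(i);
}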
Hi @MikeLankamp,
Thanks for your answer.
in order to leave out the division operation, that function would have to manually compose an IEEE 754 number using bit twiddling. And that's not portable, since there's no requirement that a C++ compiler implement floating-point types using IEEE 754.
Thanks, now I understand why the library works in this way. It makes sense. Regarding the bit twiddling, I think I found a way to do it. Could a to_IEEE_754_double() function be accepted in fpm? If so, please let me know and I'll do my best to do it appropriately.
It seems possible to use ldexp, but that uses an int-to-double cast, and ldexp itself seems to be implemented in GCC using fild and fscale (on Intel), which are FP instructions.
I didn't know about ldexp. I think that what I've implemented is equivalent to frexp. From what I understand, neither of them is currently implemented in fpm, right? frexp returns the basic components of a double in IEEE 754.
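For reference, a small example of what the standard frexp gives you (note it returns a normalized mantissa in [0.5, 1) and a power-of-two exponent, rather than the raw IEEE 754 bit fields):

#include <cmath>
#include <cstdio>

int main() {
    int exp = 0;
    // 6.5 == 0.8125 * 2^3, so frexp yields mantissa 0.8125 and exponent 3.
    double mantissa = std::frexp(6.5, &exp);
    std::printf("%f * 2^%d\n", mantissa, exp);
    return 0;
}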
What exactly are your circumstances where you have to work with doubles, but not emit FP instructions? And by FP instructions I assume you mean division, etc.? Because a simple int-to-double cast already uses instructions that might be classified as FP instructions on e.g. x86-64 and ARM.
I'm using KLEE. In short, it is a (symbolic) LLVM-IR interpreter. However, it cannot process floating-point computations, even if the LLVM module's target triple supports them. In my use case, I'm linking fpm in a library and replacing some floating-point computations where needed. When I refer to FP instructions, I mean any LLVM-IR arithmetic instruction that involves floating-point values (including division).
Here is my non-portable version (IEEE 754 only; in case endianness is relevant, I assume little endian) of static_cast<double> for fixed_16_16. I have just finished writing it, so I must be missing some edge cases. Also, I haven't checked the LLVM bitcode to see whether any FP instruction remains. Do you think it makes sense? I can write up an explanation of how it works if needed (let me know), but right now it is just too late.
#include <fpm/fixed.hpp>  // For fpm::fixed_16_16
#include <fpm/math.hpp>   // For fpm::abs, fpm::log2, fpm::floor, fpm::pow
#include <fpm/ios.hpp>    // For fpm::operator<<
#include <iostream>       // For std::cout
#include <bitset>         // For std::bitset
#include <cstring>        // For std::memcpy
#include <cassert>        // For assert

// Biased exponent of |x|: floor(log2(|x|)) + 1023, truncated to 11 bits.
static std::bitset<11> get_exponent_bitset(const fpm::fixed_16_16& x) {
    // Is using abs for negative values correct?
    auto p = fpm::log2(fpm::abs(x));
    auto p_floor = fpm::floor(p);
    auto bias = fpm::fixed_16_16{1023};
    auto exp = static_cast<int32_t>(p_floor + bias);
    std::bitset<32> exp_bits{static_cast<unsigned long long>(exp)};
    std::bitset<11> double_exp_bits;
    for (int32_t i = 0; i < 11; i++) {
        double_exp_bits[i] = exp_bits[i];
    }
    return double_exp_bits;
}

// 52-bit fraction: 2^(log2(|x|) - floor(log2(|x|))) is the mantissa in [1, 2);
// its fractional part lives in the low 16 bits of the 16.16 raw value, which
// map onto the top 16 bits of the double's fraction field.
static std::bitset<52> get_fraction_bitset(const fpm::fixed_16_16& x) {
    auto p = fpm::log2(fpm::abs(x));
    auto p_floor = fpm::floor(p);
    auto fraction_exp = p - p_floor;
    auto fraction = fpm::pow(fpm::fixed_16_16{2}, fraction_exp);
    std::bitset<32> raw_value_bits{static_cast<unsigned long long>(fraction.raw_value())};
    std::bitset<52> double_fraction{0};
    for (int i = 0; i < 16; i++) {
        double_fraction[36 + i] = raw_value_bits[i];
    }
    return double_fraction;
}

// Assemble sign (bit 63), exponent (bits 52..62) and fraction (bits 0..51),
// then reinterpret the result as a double.
static double to_double(fpm::fixed_16_16 x) {
    std::bitset<1> sign{x < fpm::fixed_16_16{0}};
    std::bitset<11> exponent{get_exponent_bitset(x)};
    std::bitset<52> fraction{get_fraction_bitset(x)};
    std::bitset<64> double_raw_bits;
    double_raw_bits[63] = sign[0];
    for (int i = 0; i < 11; i++) {
        double_raw_bits[52 + i] = exponent[i];
    }
    for (int i = 0; i < 52; i++) {
        double_raw_bits[i] = fraction[i];
    }
    double d;
    unsigned long long l = double_raw_bits.to_ullong();
    assert(sizeof(double) == sizeof(l));
    std::memcpy(&d, &l, sizeof(d));
    return d;
}

int main() {
    std::cout << fpm::fixed_16_16::pi() << std::endl;
    std::cout << to_double(fpm::fixed_16_16::pi()) << std::endl;
    return 0;
}

// Output:
// 3.14159
// 3.14166
Relevant sources: http://mathcenter.oxford.emory.edu/site/cs170/ieee754/ and https://www.kmjn.org/notes/converting_to_scientific_notation.html
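For comparison, here is a sketch that avoids fpm::log2/fpm::pow entirely (their limited 16.16 precision is presumably where the 3.14166 vs 3.14159 discrepancy comes from) and derives the exponent and fraction directly from raw_value() with integer shifts. It assumes IEEE 754 doubles and the 16.16 layout; zero is special-cased, and no rounding is needed because a 16.16 value always fits exactly in a 52-bit fraction:

#include <cstdint>
#include <cstring>
#include <fpm/fixed.hpp>

// Integer-only conversion sketch for fpm::fixed_16_16 (assumes IEEE 754 binary64).
static double to_double_bits(fpm::fixed_16_16 x) {
    const std::int64_t raw = x.raw_value();
    if (raw == 0) {
        return 0.0;
    }
    const std::uint64_t sign = raw < 0 ? 1u : 0u;
    const std::uint64_t mag = static_cast<std::uint64_t>(raw < 0 ? -raw : raw);

    // Index of the highest set bit: |x| = 1.f * 2^(msb - 16).
    int msb = 0;
    for (std::uint64_t m = mag; m >>= 1;) {
        ++msb;
    }

    const std::uint64_t exponent = static_cast<std::uint64_t>(msb - 16 + 1023);
    // Shift the leading bit up to position 52 and mask it off (it is implicit).
    const std::uint64_t fraction = (mag << (52 - msb)) & ((std::uint64_t{1} << 52) - 1);

    const std::uint64_t bits = (sign << 63) | (exponent << 52) | fraction;
    double d;
    std::memcpy(&d, &bits, sizeof(d));
    return d;
}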
I'm using KLEE. In short, it is a (symbolic) LLVM-IR interpreter. However, it cannot process floating-point computations, even if the LLVM module's target triple supports them. In my use case, I'm linking fpm in a library and replacing some floating-point computations where needed.
Sure, but then: what is the point of converting to double if you can't do anything with that double afterwards? i.e. what does the code using fpm look like that you want to run through KLEE?
Hi @MikeLankamp
Sorry for my delay, and thanks for your answer.
what is the point of converting to double if you can't do anything with that double afterwards?
I see your point. Indeed, it is strange and possibly not the best approach. In my use case, I am analyzing (calling) only a few methods in a library, so I just wanted to replace the double computations in those methods and store the final results as doubles. Ideally, this would have been enough for me.
Having said this, my attempt at not replacing all double uses with fpm did not work. KLEE (or perhaps the SMT solver) is not able to cope with my static double to_double(fpm::fixed_16_16 x) implementation. Rather than simplifying it, I found it faster to replace all doubles with your library. This worked as expected.
I found it faster to replace all doubles with your library. This worked as expected.
I see, great that it worked! Does that mean we can close this issue?
Yes, absolutely. Thanks a lot.
Hi,
First of all, thanks for sharing this amazing project. I just wanted to ask for some help regarding this very particular use case I have.
The static cast above is triggering this function in fpm: operator T(), which computes the final double using double division. Would it be possible to construct the double without relying on double operations (in this case, division)? I was wondering if this function could be rewritten to compute the same result using n.raw_value() somehow (and not using any double operation at all). To contextualise, my requirement is to not emit any floating-point instruction at all, but still be able to cast to double. Even a hacky workaround for this is useful.
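For reference, the kind of conversion I would like to avoid looks roughly like this (a simplified sketch of the idea, not fpm's actual implementation):

#include <fpm/fixed.hpp>

// Simplified sketch: cast the raw 16.16 value to double and divide by 2^16.
// The double division is exactly what I would like to avoid.
static double to_double_by_division(fpm::fixed_16_16 x) {
    return static_cast<double>(x.raw_value()) / 65536.0;
}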
Best regards, Manuel.