Closed cgyurgyik closed 3 years ago
Wow; awesome work digging into the right way to do this!!! Crazy that the fractional part needs a polynomial approximation.
One other random idea occurs to me: depending on the width, it could be efficient approximate the fractional part with a lookup table. Basically, the unit could hard-code a mapping from f
to e^f
, quantized to ranges of values for f
. For example, if there are only 4 fractional bits, then this lookup table would only need to have 2^4=16 entries, right?
Anyway, not sure if that's a good idea—just a thought.
Wow; awesome work digging into the right way to do this!!! Crazy that the fractional part needs a polynomial approximation.
One other random idea occurs to me: depending on the width, it could be efficient approximate the fractional part with a lookup table. Basically, the unit could hard-code a mapping from
f
toe^f
, quantized to ranges of values forf
. For example, if there are only 4 fractional bits, then this lookup table would only need to have 2^4=16 entries, right?Anyway, not sure if that's a good idea—just a thought.
I like this idea because it means I can do most of the calculations in regular floating point, and then only need to convert to fixed point with the final compute. I'll try this, and see if I run into any potential issues. An obvious constraint I foresee is table size, since a 2^16 table is less than ideal, but I imagine we can get away with fewer entries somehow if this is necessary.
Awesome! Yeah, after a certain point, a lookup table becomes unwieldy. But perhaps just using a lower precision (i.e., dropping the fractional bits beyond a given fixed maximum) wouldn't be too bad! 😇
Ok I've written up a component for std_fp_exp
, which uses Q28.4
.
The e_table
to evaluate e^x within [0, 1] has to round each entry to the nearest fixed point format.
Same goes for e
itself. Because of this, there will be a decent amount of error involved.
Results:
Using a calculator: e^3.5 = 33.115
Using std_exp
with Q28.4
: 502 = 31.375
Using larger exponents leads to larger errors, as expected.
Incredibly cool. That seems like a not-bad approximation, all things considered, and it's awesome that this is a reasonably low-complexity way to get the result in hardware.
Of course, aside from this specific advancement, I can't help but think about other implications:
@cgyurgyik are you waiting on a PR/review for this?
@cgyurgyik are you waiting on a PR/review for this?
I wasn't. I got sidetracked with looking at fixed point libraries. Perhaps we could continue with this until we find something better? I imagine the one unnatural dependency here is the requirement that we use Q28.4
, which was just an arbitrary size that I picked. If we use another drop-in implementation, then this may change.
Yeah, that sounds good to me! In an idealized future, we'd want to generate tables for arbitrary precisions, but I think sticking to just this one for now is a good step. 👍
In that case, merge it when ready! (Unless you're waiting on #292)
In that case, merge it when ready! (Unless you're waiting on #292)
Sounds good. I'm currently messing around to see how easily we can actually drop this into another calyx component.
Currently just getting zeroes back, presumably because its using the integer-based div_pipe
s. Hopefully Dahlia 356 will fix this.
Remind me—is this blocked on the new data format stuff? If so, can you mark it with the "blocked" label?
Remind me—is this blocked on the new data format stuff? If so, can you mark it with the "blocked" label?
Not blocked. Going to close with the #463.
Remind me—is this blocked on the new data format stuff? If so, can you mark it with the "blocked" label?
I'm closing this for now. I'll come back to it at a later point, but I think finding or implementing a real SV exp
or softmax
is a better route, if possible. Also, need to get a working fp_{div, mult, mod}_pipe
as well.
From initial research, it seems that a common way to implement
pow
approximations is to first implemente^x
, and use its properties to implement otherpow
s.I am breaking the exponent
x
into its integer value and its fractional value,For the integer value, we can simply multiply
e
n times, where n is the integer value. For the fractional value, I am using Chebyshev polynomials to approximatee^x
bounded by [0, 1].Currently, I'm just converting the decimal value of the number to floating point. Instead, all the math should also be done in fixed point. This might get tricky, given things such as overflow.
Addresses #299.