calyxir / calyx

Intermediate Language (IL) for Hardware Accelerator Generators
https://calyxir.org
MIT License
453 stars 45 forks source link

Fixed point `std_exp` #404

Closed cgyurgyik closed 3 years ago

cgyurgyik commented 3 years ago

From initial research, it seems that a common way to implement pow approximations is to first implement e^x, and use its properties to implement other pows.

I am breaking the exponent x into its integer value and its fractional value,

e^x = e^i * e^f

For the integer value, we can simply multiply e n times, where n is the integer value. For the fractional value, I am using Chebyshev polynomials to approximate e^x bounded by [0, 1].

Currently, I'm just converting the decimal value of the number to floating point. Instead, all the math should also be done in fixed point. This might get tricky, given things such as overflow.

Addresses #299.

sampsyo commented 3 years ago

Wow; awesome work digging into the right way to do this!!! Crazy that the fractional part needs a polynomial approximation.

One other random idea occurs to me: depending on the width, it could be efficient approximate the fractional part with a lookup table. Basically, the unit could hard-code a mapping from f to e^f, quantized to ranges of values for f. For example, if there are only 4 fractional bits, then this lookup table would only need to have 2^4=16 entries, right?

Anyway, not sure if that's a good idea—just a thought.

cgyurgyik commented 3 years ago

Wow; awesome work digging into the right way to do this!!! Crazy that the fractional part needs a polynomial approximation.

One other random idea occurs to me: depending on the width, it could be efficient approximate the fractional part with a lookup table. Basically, the unit could hard-code a mapping from f to e^f, quantized to ranges of values for f. For example, if there are only 4 fractional bits, then this lookup table would only need to have 2^4=16 entries, right?

Anyway, not sure if that's a good idea—just a thought.

I like this idea because it means I can do most of the calculations in regular floating point, and then only need to convert to fixed point with the final compute. I'll try this, and see if I run into any potential issues. An obvious constraint I foresee is table size, since a 2^16 table is less than ideal, but I imagine we can get away with fewer entries somehow if this is necessary.

sampsyo commented 3 years ago

Awesome! Yeah, after a certain point, a lookup table becomes unwieldy. But perhaps just using a lower precision (i.e., dropping the fractional bits beyond a given fixed maximum) wouldn't be too bad! 😇

cgyurgyik commented 3 years ago

Ok I've written up a component for std_fp_exp, which uses Q28.4.

The e_table to evaluate e^x within [0, 1] has to round each entry to the nearest fixed point format. Same goes for e itself. Because of this, there will be a decent amount of error involved.

Results: Using a calculator: e^3.5 = 33.115 Using std_exp with Q28.4: 502 = 31.375

Using larger exponents leads to larger errors, as expected.

sampsyo commented 3 years ago

Incredibly cool. That seems like a not-bad approximation, all things considered, and it's awesome that this is a reasonably low-complexity way to get the result in hardware.

Of course, aside from this specific advancement, I can't help but think about other implications:

rachitnigam commented 3 years ago

@cgyurgyik are you waiting on a PR/review for this?

cgyurgyik commented 3 years ago

@cgyurgyik are you waiting on a PR/review for this?

I wasn't. I got sidetracked with looking at fixed point libraries. Perhaps we could continue with this until we find something better? I imagine the one unnatural dependency here is the requirement that we use Q28.4, which was just an arbitrary size that I picked. If we use another drop-in implementation, then this may change.

sampsyo commented 3 years ago

Yeah, that sounds good to me! In an idealized future, we'd want to generate tables for arbitrary precisions, but I think sticking to just this one for now is a good step. 👍

rachitnigam commented 3 years ago

In that case, merge it when ready! (Unless you're waiting on #292)

cgyurgyik commented 3 years ago

In that case, merge it when ready! (Unless you're waiting on #292)

Sounds good. I'm currently messing around to see how easily we can actually drop this into another calyx component.

cgyurgyik commented 3 years ago

Currently just getting zeroes back, presumably because its using the integer-based div_pipes. Hopefully Dahlia 356 will fix this.

rachitnigam commented 3 years ago

Remind me—is this blocked on the new data format stuff? If so, can you mark it with the "blocked" label?

cgyurgyik commented 3 years ago

Remind me—is this blocked on the new data format stuff? If so, can you mark it with the "blocked" label?

Not blocked. Going to close with the #463.

Remind me—is this blocked on the new data format stuff? If so, can you mark it with the "blocked" label?

I'm closing this for now. I'll come back to it at a later point, but I think finding or implementing a real SV exp or softmax is a better route, if possible. Also, need to get a working fp_{div, mult, mod}_pipe as well.