Open alibillalhammoud opened 6 months ago
Depends on the base we choose for the RoPE rotations. If we end up choosing 10000 as the base, then I might consider something wider than 16-bit fixed point. If we use 1000 as the base, then 16-bit fixed point is probably enough.
But either way, the accuracy of the theta updates will far exceed the accuracy of the rotator block.
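To make the base-vs-bit-width tradeoff concrete, here is a small sketch (my own illustration, not code from the repo) that compares the worst-case relative quantization error of the theta table under a hypothetical 16-bit fixed-point format with 15 fractional bits, for an assumed head dimension of 64:

```python
import math

def theta_table(base, d_head):
    # per-pair rotation angles theta_i = base^(-2i/d), i = 0 .. d/2 - 1
    return [base ** (-2 * i / d_head) for i in range(d_head // 2)]

def max_rel_quant_error(base, d_head, frac_bits=15):
    # worst-case relative error when the thetas are rounded onto a
    # fixed-point grid of resolution 2^-frac_bits (half-LSB rounding);
    # the smallest theta is hit hardest
    lsb = 2.0 ** -frac_bits
    return max((0.5 * lsb) / t for t in theta_table(base, d_head))

# hypothetical parameters, purely for illustration
err_base_10000 = max_rel_quant_error(10000, 64)  # roughly 11% on the smallest theta
err_base_1000 = max_rel_quant_error(1000, 64)    # roughly 1% on the smallest theta
```

With base 10000 the smallest angle (~1.3e-4 rad) sits only a few LSBs above the 16-bit grid, so its relative error is large; with base 1000 the smallest angle is about ten times bigger and 16 bits comfortably suffice, which matches the comment above.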
@gkielian I implemented the different rotations in torch and checked that I can call backward() on the outputs of the rotation block. Can you try running the model with the "perfect" setting (line 211 of positional encoding variations)? That should behave like standard RoPE.
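For reference, the "perfect" (full-precision) rotation that the approximate rotator blocks are measured against is just plain RoPE. A minimal numpy sketch of that math (my own illustration, not the repo's torch implementation; it assumes adjacent channel pairs are rotated and base 10000):

```python
import numpy as np

def rope_rotate(x, base=10000.0):
    # x: (seq_len, d) -- rotate each adjacent (even, odd) channel pair
    # at position p by angle p * theta_i, theta_i = base^(-2i/d).
    # This is the exact float rotation a CORDIC block would approximate.
    seq_len, d = x.shape
    theta = base ** (-2.0 * np.arange(d // 2) / d)        # (d/2,)
    ang = np.arange(seq_len)[:, None] * theta[None, :]    # (seq, d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out
```

Two sanity checks that also hold for any hardware rotator: position 0 is left unchanged, and rotations preserve the norm of each vector.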
Hi Ali, I looked through the CORDIC algorithm a little more yesterday; it seems very generalizable. On the hardware side, I was wondering: are you planning to use a higher-precision fixed-point format (or something along those lines) for the theta table and theta updates?
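For anyone following along, a minimal sketch of CORDIC in rotation mode (my own float-arithmetic illustration of the algorithm, not the hardware design under discussion): each iteration applies a shift-and-add micro-rotation by atan(2^-k), and the accumulated CORDIC gain is divided out at the end. In hardware the multiplies by 2^-k become bit shifts, which is what makes the rotator block cheap.

```python
import math

def cordic_rotate(x, y, angle, n_iters=16):
    # Rotation mode: drive the residual angle z to zero with
    # micro-rotations of +/- atan(2^-k). Converges for |angle| up to
    # about 1.74 rad (~99.9 degrees); larger angles need pre-rotation.
    atan_table = [math.atan(2.0 ** -k) for k in range(n_iters)]
    gain = 1.0
    for k in range(n_iters):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * k))  # accumulated scale factor
    z = angle
    for k in range(n_iters):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k
        z -= d * atan_table[k]
    return x / gain, y / gain  # ~ (r*cos, r*sin) of the rotated input
```

Each extra iteration buys roughly one more bit of angular accuracy, so 16 iterations lands near the 16-bit fixed-point noise floor; the theta table feeding this block can easily be stored at higher precision, consistent with the point above that theta-update accuracy will exceed the rotator's.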