ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

add rough draft ROPE class #115

Open alibillalhammoud opened 6 months ago

gkielian commented 6 months ago

Hi Ali, I looked through the CORDIC algorithm a little more yesterday; it seems very generalizable -- I was wondering, on the hardware side, are you planning on using a higher-precision fixed point (or something along those lines) for the "theta" table and theta updates?

alibillalhammoud commented 6 months ago

It depends on the base we choose for the ROPE rotations. If we end up choosing 10000 as the base, then I might consider something more than 16-bit fixed point. If we use 1000 as the base, then 16-bit fixed point is probably enough.

But either way the accuracy of the theta updates will far exceed the accuracy of the rotator block
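To make the base-vs-precision tradeoff concrete: the per-pair angles are theta_i = base^(-2i/d), so a larger base pushes the smallest theta closer to zero, where a fixed grid of 16 fractional bits loses more relative accuracy. A quick pure-Python sketch (the function name and exact table layout are illustrative, not from the repo):

```python
import math

def theta_table(base, head_dim, frac_bits=16):
    """Per-pair RoPE angles theta_i = base^(-2i/head_dim), returned both
    exact and quantized to fixed point with `frac_bits` fractional bits.
    Illustrative helper, not the actual hardware table format."""
    scale = 1 << frac_bits
    exact = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    # Round-to-nearest fixed-point quantization, then back to float.
    quant = [round(t * scale) / scale for t in exact]
    return exact, quant

for base in (1000, 10000):
    exact, quant = theta_table(base, head_dim=64)
    rel = max(abs(e - q) / e for e, q in zip(exact, quant))
    print(f"base={base}: worst relative theta error = {rel:.2e}")
```

With head_dim=64, the smallest angle for base 10000 is about 1.3e-4 versus about 1.2e-3 for base 1000, so the worst-case relative quantization error at 16 fractional bits is roughly an order of magnitude larger for the bigger base -- which matches the reasoning above for considering more than 16 bits when base=10000.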

alibillalhammoud commented 5 months ago

@gkielian I implemented the different rotations in torch and checked that I can call backward() on outputs from the rotation block. Can you try running the model with the "perfect" setting (line 211 of positional encoding variations)? That should behave like normal ROPE.
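The actual class lives in the positional encoding variations file and isn't shown in this thread; for reference, the "perfect" (full-precision) path amounts to rotating consecutive feature pairs by pos * theta_i. A minimal pure-Python sketch, with hypothetical names and the standard base of 10000 assumed:

```python
import math

def rope_rotate(x, pos, base=10000.0):
    """Apply the 'perfect' RoPE rotation to a single head vector x
    at sequence position pos: pair (x[2i], x[2i+1]) is rotated by
    angle pos * base^(-2i/d). Sketch only; the repo's torch class
    vectorizes this and supports autograd via backward()."""
    d = len(x)
    out = []
    for i in range(d // 2):
        theta = base ** (-2 * i / d)
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[2 * i], x[2 * i + 1]
        # Standard 2x2 rotation applied to the feature pair.
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out
```

A useful sanity check is the property that makes RoPE work: the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n, and the rotation preserves vector norms.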