Jamie-Stirling / RetNet

An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
MIT License
1.14k stars, 99 forks

Initial effort to add chunkwise retention paradigm #3

Closed: Aaryanverma closed this 11 months ago

Jamie-Stirling commented 11 months ago

Thanks for this.

If possible, please could you modify your changes to be in line with the rest of the code?

Most of the code for the other two paradigms (parallel and recurrent) is in retention.py, with retnet.py calling into it. I would suggest structuring the chunkwise paradigm the same way.

I'd also recommend adding a test to make sure it gives the same output as the other two paradigms (see the files prefixed with "test_").
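To make the kind of check concrete, here is a rough sketch of an equivalence test. It is not the code in this repo: the function names are hypothetical, it uses NumPy rather than PyTorch, and it covers only a single head with a fixed decay `gamma`, leaving out the xPos rotation, scaling, and group norm. The idea is simply that all three paradigms should agree on random inputs, including when the last chunk is shorter than the others.

```python
import numpy as np


def decay_mask(n, gamma):
    """Lower-triangular mask D with D[i, j] = gamma**(i - j) for i >= j, else 0."""
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]
    return np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)


def retention_parallel(q, k, v, gamma):
    """Parallel form: (Q K^T * D) V over the whole sequence."""
    return ((q @ k.T) * decay_mask(len(q), gamma)) @ v


def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, o_n = q_n S_n."""
    state = np.zeros((k.shape[1], v.shape[1]))
    out = np.empty((len(q), v.shape[1]))
    for n in range(len(q)):
        state = gamma * state + np.outer(k[n], v[n])
        out[n] = q[n] @ state
    return out


def retention_chunkwise(q, k, v, gamma, chunk_size):
    """Chunkwise form: parallel inside each chunk, recurrent state across chunks."""
    state = np.zeros((k.shape[1], v.shape[1]))
    outputs = []
    for start in range(0, len(q), chunk_size):
        qc = q[start:start + chunk_size]
        kc = k[start:start + chunk_size]
        vc = v[start:start + chunk_size]
        b = len(qc)  # the last chunk may be shorter than chunk_size
        j = np.arange(b)
        # Contribution from positions inside the current chunk.
        inner = ((qc @ kc.T) * decay_mask(b, gamma)) @ vc
        # Contribution from earlier chunks, decayed by gamma**(j + 1).
        cross = (qc @ state) * (gamma ** (j + 1))[:, None]
        outputs.append(inner + cross)
        # Carry the state forward to the end of this chunk.
        state = (gamma ** b) * state + kc.T @ (vc * (gamma ** (b - 1 - j))[:, None])
    return np.concatenate(outputs)


# Compare the three paradigms on random inputs (seq_len 10, chunk size 4).
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((10, 4)) for _ in range(3))
reference = retention_parallel(q, k, v, 0.9)
assert np.allclose(reference, retention_recurrent(q, k, v, 0.9))
assert np.allclose(reference, retention_chunkwise(q, k, v, 0.9, chunk_size=4))
```

A test in the repo would presumably call the existing retention classes instead and compare tensors with a tolerance in the same way.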

If this is implemented I'll be able to merge.