csjackson0 closed this 1 year ago
Nice work @csjackson0! @pascalnotin, my latest modeling PR (less verbose) would likely produce a merge conflict, since the code is different (I imported GPT2Attention rather than coding up an APTAttention, which @csjackson0 has made changes in).
Perhaps we should merge @csjackson0's PR first, and I'll then revisit my modeling branch to accommodate his changes and include APTAttention and APTBlock in my less verbose version.
How does that sound? @pascalnotin
LGTM @csjackson0 - nice work! And sounds good regarding the suggested plan @talkhanz! Merging this PR into main.
@othertea - fyi
This PR integrates rotary positional encodings into the APT model per issue #21. To test, I set

position_embedding = "rotary"

in the config.py file and ran the train.py script. I will keep the default position_embedding as "grouped_alibi".
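For context, below is a minimal sketch of the general rotary position embedding (RoPE) mechanism this PR adds. It is not the actual code from this PR; the function names (`build_rotary_cache`, `apply_rotary`) and tensor shapes are illustrative assumptions, not APT identifiers.

```python
# Illustrative RoPE sketch, assuming PyTorch and (batch, heads, seq, head_dim)
# query/key tensors; names here are hypothetical, not from the APT codebase.
import torch

def build_rotary_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    # Per-dimension-pair rotation frequencies, as in the RoPE paper.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    angles = torch.outer(positions, inv_freq)   # (seq_len, head_dim/2)
    emb = torch.cat((angles, angles), dim=-1)   # (seq_len, head_dim)
    return emb.cos(), emb.sin()

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Pairwise rotation helper: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary(q, k, cos, sin):
    # Rotate queries and keys by position-dependent angles so that
    # attention scores depend only on relative positions.
    q_rot = q * cos + rotate_half(q) * sin
    k_rot = k * cos + rotate_half(k) * sin
    return q_rot, k_rot

# Usage: rotate q/k before computing attention scores.
q = torch.randn(1, 4, 16, 64)
k = torch.randn(1, 4, 16, 64)
cos, sin = build_rotary_cache(seq_len=16, head_dim=64)
q, k = apply_rotary(q, k, cos, sin)
```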