Open ramon-astudillo opened 1 year ago
I'm working in the branch unified-attention. I managed to split the weights of the QKV projections, but there is still something missing: the model doesn't produce correct output when prompted.
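For concreteness, the split looks roughly like this, assuming the pretrained weights come from the Hugging Face GPT-2 checkpoint, where Q, K, V live in a single fused Conv1D called `c_attn`; the helper name `split_fused_qkv` is just for illustration:

```python
import torch

def split_fused_qkv(c_attn_weight, c_attn_bias, n_embd):
    # Hugging Face GPT-2 fuses Q, K, V into one Conv1D:
    #   weight: [n_embd, 3 * n_embd], bias: [3 * n_embd].
    # Conv1D computes x @ W + b, the transpose of nn.Linear's convention.
    w_q, w_k, w_v = c_attn_weight.split(n_embd, dim=1)
    b_q, b_k, b_v = c_attn_bias.split(n_embd, dim=0)
    # nn.Linear stores weight as [out_features, in_features],
    # so each slice must be transposed before copying.
    return [(w.t().contiguous(), b)
            for w, b in ((w_q, b_q), (w_k, b_k), (w_v, b_v))]
```

One thing worth double-checking, since the model loads fine but prompts incorrectly: Conv1D weights are transposed relative to nn.Linear, so copying them without the `.t()` above raises no error and silently scrambles the attention outputs.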
Done. We can close this one after merging unified-attention into master.
Upgrade the easier-to-understand GPT-2 attention code to allow loading pre-trained GPT-2 weights.
That is, avoid separate loaders/code paths for pre-trained and non-pre-trained model weights: https://github.com/LxMLS/lxmls-toolkit/blob/master/lxmls/transformers/model.py#L123
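A hedged sketch of what the single code path could look like: always build the model from its config, then optionally overwrite the state dict with the Hugging Face weights converted to nn.Linear layout. The function name is made up, and the resulting keys would still need renaming to this toolkit's module names (plus the fused c_attn split from the comment above):

```python
from transformers import GPT2LMHeadModel

def hf_gpt2_weights(model_name="gpt2"):
    """Return GPT-2 weights from Hugging Face, converted to nn.Linear layout."""
    sd = GPT2LMHeadModel.from_pretrained(model_name).state_dict()
    # These projections are stored as Conv1D, i.e. transposed vs nn.Linear.
    transposed = ("attn.c_attn.weight", "attn.c_proj.weight",
                  "mlp.c_fc.weight", "mlp.c_proj.weight")
    return {k: v.t() if k.endswith(transposed) else v for k, v in sd.items()}
```

With that, one loader could cover both cases: construct the model from its config as usual (with the toolkit's own constructor standing in here), then call `load_state_dict` with the converted weights only when a checkpoint name is given, instead of keeping a dedicated pre-trained branch.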