Open ramon-astudillo opened 1 year ago
I'm working in the branch unified-attention. I managed to split the weights of the QKV projections, but there is still something missing: the model doesn't produce correct output when prompted.
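For concreteness, the split looks roughly like this, assuming the pretrained weights come from the Hugging Face GPT-2 checkpoint, where Q, K, V live in a single fused Conv1D called `c_attn`; the helper name `split_fused_qkv` is just for illustration:

```python
import torch

def split_fused_qkv(c_attn_weight, c_attn_bias, n_embd):
    # Hugging Face GPT-2 fuses Q, K, V into one Conv1D:
    #   weight: [n_embd, 3 * n_embd], bias: [3 * n_embd].
    # Conv1D computes x @ W + b, the transpose of nn.Linear's convention.
    w_q, w_k, w_v = c_attn_weight.split(n_embd, dim=1)
    b_q, b_k, b_v = c_attn_bias.split(n_embd, dim=0)
    # nn.Linear stores weight as [out_features, in_features],
    # so each slice must be transposed before copying.
    return [(w.t().contiguous(), b)
            for w, b in ((w_q, b_q), (w_k, b_k), (w_v, b_v))]
```

One thing worth double-checking, since the model loads fine but prompts incorrectly: Conv1D weights are transposed relative to nn.Linear, so copying them without the `.t()` above raises no error and silently scrambles the attention outputs.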
Done. We can close this one after merging unified-attention into master.
Upgrade the easier-to-understand GPT-2 attention code to allow loading pre-trained GPT-2 weights.
That is, avoid separate loaders/code paths for pre-trained and non-pre-trained model weights: https://github.com/LxMLS/lxmls-toolkit/blob/master/lxmls/transformers/model.py#L123
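A hedged sketch of what the single code path could look like: always build the model from its config, then optionally overwrite the state dict with the Hugging Face weights converted to nn.Linear layout. The function name is made up, and the resulting keys would still need renaming to this toolkit's module names (plus the fused c_attn split from the comment above):

```python
from transformers import GPT2LMHeadModel

def hf_gpt2_weights(model_name="gpt2"):
    """Return GPT-2 weights from Hugging Face, converted to nn.Linear layout."""
    sd = GPT2LMHeadModel.from_pretrained(model_name).state_dict()
    # These projections are stored as Conv1D, i.e. transposed vs nn.Linear.
    transposed = ("attn.c_attn.weight", "attn.c_proj.weight",
                  "mlp.c_fc.weight", "mlp.c_proj.weight")
    return {k: v.t() if k.endswith(transposed) else v for k, v in sd.items()}
```

With that, one loader could cover both cases: construct the model from its config as usual (with the toolkit's own constructor standing in here), then call `load_state_dict` with the converted weights only when a checkpoint name is given, instead of keeping a dedicated pre-trained branch.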