BlackSamorez / tensor_parallel

Automatically split your PyTorch models on multiple GPUs for training & inference
MIT License
629 stars 39 forks source link

Gpt2 fix #103

Closed BlackSamorez closed 1 year ago

BlackSamorez commented 1 year ago

GPT-2 combines attention weights as QQQQQKKKKKVVVVV. The function to properly split those weights with respect to individual attention heads was broken. This PR fixes it.

BlackSamorez commented 1 year ago

It also resolves #99