galatolofederico / vanilla-llama

Plain pytorch implementation of LLaMA
GNU General Public License v3.0
189 stars 31 forks

Perfect replica #7

Closed gpucce closed 1 year ago

gpucce commented 1 year ago

@galatolofederico thanks for making this. Are you able to replicate the original model's output exactly with this approach? Maybe by checking against one of the smaller models?

galatolofederico commented 1 year ago

Hi, this implementation should behave exactly like the original model. The only differences come from the numerical behaviour of Linear compared with RowParallelLinear and ColumnParallelLinear. Mathematically the outputs should be identical, but there are slight floating-point differences that can accumulate into a non-negligible difference in the final output.
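A minimal sketch of why this happens (not code from the repo, just an illustration): a RowParallelLinear-style layer shards the input dimension across ranks and sums the partial products, which changes the floating-point accumulation order relative to a single matmul. The shapes and split point below are arbitrary assumptions for the demo.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 1024)    # a batch of activations
w = torch.randn(512, 1024)  # weight of a hypothetical Linear(1024, 512)

# Single matmul, as a plain nn.Linear would compute it.
y_full = x @ w.t()

# Row-parallel style: shard the input dimension across two "ranks"
# and sum the partial results (what the all-reduce would produce).
y_sharded = x[:, :512] @ w[:, :512].t() + x[:, 512:] @ w[:, 512:].t()

# The results agree to within floating-point tolerance, but are not
# necessarily bitwise identical; over many layers and autoregressive
# steps these tiny differences can compound.
print("max abs difference:", (y_full - y_sharded).abs().max().item())
```

Across 32+ transformer layers and hundreds of generated tokens, a per-layer discrepancy of this size can be enough to flip a sampled token, after which the two generations diverge entirely.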

gpucce commented 1 year ago

@galatolofederico thank you very much for the reply! I had noticed this behaviour and also suspected this could be the reason; I was curious whether you had seen the same.