THUDM / SwissArmyTransformer

SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
https://THUDM.github.io/SwissArmyTransformer
Apache License 2.0

Can you help confirm whether the chatglm3 model is the same as GPT or originates from the GLM architecture? #146

Closed: tiendung closed this issue 9 months ago

tiendung commented 9 months ago

In the source code (sat/model/official/chatglm3_model.py), I cannot find the 2D positional encoding.

1049451037 commented 9 months ago

Yes, chatglm3 uses a multiplicative 1D rotary position embedding. But it is not the same as GPT, because GPT uses an additive absolute position embedding.
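To illustrate the difference between "multiplicative rotary" and "additive absolute", here is a minimal, generic sketch. It is not the code from chatglm3_model.py, and its pairing of dimensions and frequencies may differ from ChatGLM's implementation. The rotary version rotates each dimension pair of the query and key by an angle proportional to the position, so their dot product depends only on the relative offset. The additive version adds a position vector to the input. GPT-2 learns that table; the sinusoidal values below are a hypothetical stand-in. The function names `rope`, `position_table_row`, and `additive` are illustrative only.

```python
import math


def rope(vec, pos, base=10000.0):
    """Multiplicative 1D rotary embedding (generic RoPE sketch).

    Each dimension pair (2k, 2k+1) is rotated by the angle pos * base**(-2k/d),
    so position information is applied by multiplication (a rotation).
    """
    d = len(vec)
    assert d % 2 == 0, "rotary embedding needs an even dimension"
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)  # i == 2k for the k-th pair
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[i], vec[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out


def position_table_row(pos, d, base=10000.0):
    """Hypothetical stand-in for one row of an absolute position table.

    GPT-2 learns this table; sinusoidal values are used here only as a placeholder.
    """
    row = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)
        row += [math.sin(angle), math.cos(angle)]
    return row


def additive(vec, pos):
    """Additive absolute position embedding: x + p[pos]."""
    return [x + p for x, p in zip(vec, position_table_row(pos, len(vec)))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 0.0, 0.0, 0.0]
k = [0.0, 1.0, 0.0, 0.0]

# Same relative offset (query 2 positions after key), different absolute positions.
rope_a = dot(rope(q, 5), rope(k, 3))
rope_b = dot(rope(q, 12), rope(k, 10))
add_a = dot(additive(q, 5), additive(k, 3))
add_b = dot(additive(q, 12), additive(k, 10))

print(f"rotary:   {rope_a:.6f} vs {rope_b:.6f}")  # equal: depends only on the offset
print(f"additive: {add_a:.6f} vs {add_b:.6f}")  # differ: depends on absolute positions
```

For this particular choice of `q` and `k`, both rotary scores equal sin(2), while the two additive scores differ because the cross terms between the content vectors and the position vectors depend on the absolute positions. Both schemes use one position index per token, so neither is the 2D positional encoding the question refers to.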

tiendung commented 9 months ago

So was chatglm3 trained only to predict the next token (without blank infilling ...)?

1049451037 commented 9 months ago

I'm not sure. I didn't work on training it; I just converted the model weights into SAT.