iislucas opened this issue 2 months ago
Context:
A unit test for transformer setup and training: https://github.com/PAIR-code/tiny-transformers/blob/main/animated-transformer/src/lib/trainer/basic_transformer_trainer.spec.ts
Transformer implementation: https://github.com/PAIR-code/tiny-transformers/blob/main/animated-transformer/src/lib/transformer/transformer_gtensor.ts
GTensor is a class that encapsulates named tensors. See these unit tests to get a sense of it: https://github.com/PAIR-code/tiny-transformers/blob/main/animated-transformer/src/lib/gtensor/gtensor.spec.ts
The current loss function: https://github.com/PAIR-code/tiny-transformers/blob/main/animated-transformer/src/lib/transformer/transformer_gtensor.ts#L372
Goal: Implement the standard decoder-transformer loss function, which provides gradients from every token position simultaneously (e.g. GPT-2 style).
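For reference, a minimal sketch of what "gradients from every token" means. Because causal masking lets one forward pass produce logits for every position, position t can be trained to predict token t+1 for all t at once, and the loss is the mean next-token cross-entropy over all (batch, position) pairs. This sketch uses plain number arrays rather than GTensor or TensorFlow.js, and it assumes sequences contain no padding. The names `logSoftmax` and `nextTokenLoss` are hypothetical helpers, not part of the repo.

```typescript
// Hypothetical helper: numerically stable log-softmax over one row of logits.
function logSoftmax(row: number[]): number[] {
  const max = Math.max(...row);
  const logSumExp =
    max + Math.log(row.reduce((acc, x) => acc + Math.exp(x - max), 0));
  return row.map((x) => x - logSumExp);
}

// Hypothetical GPT-2-style loss (assumes no padding tokens).
// logits[b][t][v]: model output for batch item b, position t, vocab entry v.
// tokens[b][t]: input token ids. Position t is scored against tokens[b][t + 1],
// so every position except the last contributes a term (and hence a gradient).
function nextTokenLoss(logits: number[][][], tokens: number[][]): number {
  let total = 0;
  let count = 0;
  for (let b = 0; b < logits.length; b++) {
    const seqLen = tokens[b].length;
    for (let t = 0; t < seqLen - 1; t++) {
      const logProbs = logSoftmax(logits[b][t]);
      total -= logProbs[tokens[b][t + 1]]; // negative log-likelihood of the next token
      count++;
    }
  }
  // Mean over all predicted positions in the batch.
  return count === 0 ? 0 : total / count;
}
```

As a sanity check, all-zero logits over a vocabulary of size V give a loss of ln(V) regardless of the tokens. In a real implementation the same computation would be done on tensors by slicing off the last position of the logits and the first position of the targets, then applying softmax cross-entropy over the whole batch.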