Open Wintoplay opened 1 year ago
Explore pre-training configurations or other literatures
Example: Optimizer, dataset sampling, and vocab size of tokenizer tokenizer type: SPM vs BPE clean data thresold learning rate (MUP)
Evaluation with real evaluation set
plan.md
Explore pre-training configurations or other literatures
Example: Optimizer, dataset sampling, and vocab size of tokenizer tokenizer type: SPM vs BPE clean data thresold learning rate (MUP)
Evaluation with real evaluation set