gojkoc54 opened this issue 1 year ago
Hello, we didn't perform the ablation for StarCoder given the amount of compute it requires for training, but you can check the CodeLLama paper where the authors observed similar behavior at different scales.
Regarding FIM percentage, we used 50%.
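For concreteness, here is a minimal sketch of what a 50% FIM rate means in data preprocessing: each training sample is independently rewritten into fill-in-the-middle form with probability 0.5. The sentinel token names follow the StarCoder tokenizer convention, but the function name and the character-level split are illustrative assumptions, not the actual training code (real pipelines typically operate on token ids).

```python
import random

FIM_RATE = 0.5  # fraction of samples converted to FIM, per the answer above

def maybe_apply_fim(text: str, rng: random.Random) -> str:
    """With probability FIM_RATE, rewrite a sample into prefix-suffix-middle
    (PSM) form; otherwise keep it as ordinary left-to-right text.

    A hedged sketch, not the authors' exact implementation.
    """
    if rng.random() >= FIM_RATE:
        return text
    # Pick two random cut points to split the sample into prefix/middle/suffix.
    i, j = sorted(rng.sample(range(len(text) + 1), 2))
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    # PSM ordering: the model conditions on prefix and suffix, then learns
    # to generate the middle.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>{middle}"

rng = random.Random(0)
print(maybe_apply_fim("def add(a, b):\n    return a + b\n", rng))
```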
I have a question: it's widely reported that many eval scores drop when FIM is used during the pretraining stage, so why did you still use FIM at a 50% rate?
Hi!
I'm curious about some more details on FIM and its effect on the pre-trained model. Here's a paragraph from the SantaCoder paper: