Open cmosguy opened 8 months ago
Hi, the 15B model was trained on 600+ programming languages, including TCL. Here's the full list of languages: https://huggingface.co/datasets/bigcode/the-stack-v2/blob/main/language_stats.csv
The 7B and 3B models, though, were trained on only the 17 languages listed in the paper.
For FIM, it's similar to StarCoder: you can use this code with the right tokens (they're different from SantaCoder's; we use underscores instead of dashes).
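To make the token difference concrete, here is a minimal sketch of how a FIM prompt could be assembled. It assumes the token strings are `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` (the underscore forms used by StarCoder) and that the prompt follows prefix-suffix-middle order; `build_fim_prompt` is a hypothetical helper name, not part of any library, so check the tokens against the tokenizer's special tokens before relying on it.

```python
# Assumed FIM token strings: underscore style, as in StarCoder.
# SantaCoder uses dashes instead (e.g. "<fim-prefix>").
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: join the code before and after the gap
    in prefix-suffix-middle order, ending where the model should infill."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Code before the gap and code after the gap.
prompt = build_fim_prompt(
    "def add(a, b):\n    return ",
    "\n\nprint(add(1, 2))\n",
)
print(prompt)
```

The resulting string would then be tokenized and passed to the model's generation call; whatever the model produces after `<fim_middle>` is the proposed infill.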
Hello @loubnabnl, is it possible to get starcoder2 to learn TCL?
It was not part of the 30 languages, so I was curious whether it's worth pursuing with SFT.
Also, is there a FIM script you used for this version of starcoder2?