harm-devries closed this issue 2 years ago
@RaymondLi0 was this the run with the results you showed last week? Would you mind adding them here?
A 350M model was trained on https://huggingface.co/datasets/bigcode/permissive-python-dedup-decontaminate with:
There is a slight degradation at 300k steps (around 2% on pass@10 and pass@100). At 150k steps, which matches the number of training tokens used in the FIM paper, there is no degradation.
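For reference, the pass@k numbers quoted above are typically computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021): generate `n` samples per problem, count the `c` that pass the unit tests, and estimate the probability that at least one of `k` drawn samples passes. A minimal sketch (function name is ours, not from this thread):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per problem,
    c of which pass the unit tests, evaluated at k draws."""
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 1 passing sample out of 2, evaluated at k=1
print(pass_at_k(2, 1, 1))  # 0.5
```

The final score is this quantity averaged over all problems in the benchmark.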
@RaymondLi0 this link is not available https://huggingface.co/datasets/bigcode/permissive-python-dedup-decontaminate
This is a private dev dataset. You can use the Python subset of https://huggingface.co/datasets/bigcode/the-stack-dedup instead.
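A minimal sketch of pulling that public subset with the `datasets` library; the `data_dir` path is an assumption about the repo layout of the-stack-dedup, and the dataset is gated, so you may need to authenticate with a Hugging Face token first:

```python
from datasets import load_dataset

# Stream the Python subset so the full split is not downloaded up front.
# "data/python" is an assumed directory name inside the dataset repo.
ds = load_dataset(
    "bigcode/the-stack-dedup",
    data_dir="data/python",
    split="train",
    streaming=True,
)
```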
As discussed today, let's train a 350M model with the following hyperparameters:
Let's see how it compares against previously trained models.