bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

Train Python model with FIM #8

Closed harm-devries closed 1 year ago

harm-devries commented 1 year ago

As discussed today, let's train a 350M model with the following hyperparameters:

Let's see how it compares against previously trained models.
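For context, FIM training (as in the "Efficient Training of Language Models to Fill in the Middle" paper) rewrites a fraction of training documents into prefix/suffix/middle order around sentinel tokens, so the model learns to infill. Below is a minimal sketch of that document-level transform; the sentinel token strings and the 50% FIM rate are illustrative assumptions, not necessarily what this run uses:

```python
import numpy as np

# Illustrative sentinel tokens; the actual tokens depend on the tokenizer config.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def fim_transform(text: str, rng: np.random.Generator, fim_rate: float = 0.5) -> str:
    """Randomly rewrite a document into prefix-suffix-middle (PSM) order."""
    if rng.random() > fim_rate:
        return text  # keep the document in ordinary left-to-right order
    # Pick two random split points to get prefix / middle / suffix spans.
    lo, hi = sorted(rng.integers(0, len(text) + 1, size=2))
    prefix, middle, suffix = text[:lo], text[lo:hi], text[hi:]
    # PSM format: the model sees prefix and suffix, then generates the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
```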

lvwerra commented 1 year ago

@RaymondLi0 was this the run with the results you showed last week? Would you mind adding them here?

RaymondLi0 commented 1 year ago

A 350M model was trained on https://huggingface.co/datasets/bigcode/permissive-python-dedup-decontaminate with:

There is a slight degradation at 300k steps (around a 2% drop in pass@10 and pass@100). At 150k steps, which matches the amount of training tokens used in the FIM paper, there is no degradation.

(Attached plots: Figure 12, Figure 11, Figure 6.)
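For reference, pass@k here is presumably the standard unbiased estimator from the Codex/HumanEval paper, computed per problem from n generated samples of which c pass the unit tests. A minimal sketch (the example numbers are made up for illustration):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 23 of them pass -> estimated pass@10.
print(pass_at_k(n=200, c=23, k=10))
```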

sepilqi commented 1 year ago

@RaymondLi0 this link is not available https://huggingface.co/datasets/bigcode/permissive-python-dedup-decontaminate

lvwerra commented 1 year ago

This is a private dev dataset. You can use the Python subset of https://huggingface.co/datasets/bigcode/the-stack-dedup instead.
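For anyone hitting the same issue, a minimal sketch of loading that public Python subset with the `datasets` library, assuming you have accepted the dataset's gated-access terms on the Hub; the `data/python` directory follows the per-language layout described on the dataset card, and streaming avoids downloading the full dump:

```python
from datasets import load_dataset

# Stream only the Python subdirectory of The Stack (dedup) rather than the whole dataset.
ds = load_dataset(
    "bigcode/the-stack-dedup",
    data_dir="data/python",
    split="train",
    streaming=True,
)

for example in ds.take(1):
    print(example["content"][:200])  # the file text lives in the "content" column
```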