bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

Train Python model with FIM #8

Closed harm-devries closed 1 year ago

harm-devries commented 1 year ago

As discussed today, let's train a 350M model with the following hyperparameters:

Let's see how it compares against previously trained models.
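For context, FIM training (as in the "Efficient Training of Language Models to Fill in the Middle" paper) rewrites a fraction of training documents into prefix/suffix/middle order around sentinel tokens, so the model learns to infill. Below is a minimal sketch of that document-level transform; the sentinel token strings and the 50% FIM rate are illustrative assumptions, not necessarily what this run uses:

```python
import numpy as np

# Illustrative sentinel tokens; the actual tokens depend on the tokenizer config.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def fim_transform(text: str, rng: np.random.Generator, fim_rate: float = 0.5) -> str:
    """Randomly rewrite a document into prefix-suffix-middle (PSM) order."""
    if rng.random() > fim_rate:
        return text  # keep the document in ordinary left-to-right order
    # Pick two random split points to get prefix / middle / suffix spans.
    lo, hi = sorted(rng.integers(0, len(text) + 1, size=2))
    prefix, middle, suffix = text[:lo], text[lo:hi], text[hi:]
    # PSM format: the model sees prefix and suffix, then generates the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
```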

lvwerra commented 1 year ago

@RaymondLi0 was this the run with the results you showed last week? Would you mind adding them here?

RaymondLi0 commented 1 year ago

A 350M model was trained on https://huggingface.co/datasets/bigcode/permissive-python-dedup-decontaminate with:

There is a slight degradation at 300k steps (around a 2% drop in pass@10 and pass@100). At 150k steps, which matches the amount of training tokens used in the FIM paper, there is no degradation.

(Attached plots: Figure 12, Figure 11, Figure 6.)
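For reference, pass@k here is presumably the standard unbiased estimator from the Codex/HumanEval paper, computed per problem from n generated samples of which c pass the unit tests. A minimal sketch (the example numbers are made up for illustration):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 23 of them pass -> estimated pass@10.
print(pass_at_k(n=200, c=23, k=10))
```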

sepilqi commented 1 year ago

@RaymondLi0 this link is not available https://huggingface.co/datasets/bigcode/permissive-python-dedup-decontaminate

lvwerra commented 1 year ago

This is a private dev dataset. You can use the Python subset of https://huggingface.co/datasets/bigcode/the-stack-dedup instead.
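For anyone hitting the same issue, a minimal sketch of loading that public Python subset with the `datasets` library, assuming you have accepted the dataset's gated-access terms on the Hub; the `data/python` directory follows the per-language layout described on the dataset card, and streaming avoids downloading the full dump:

```python
from datasets import load_dataset

# Stream only the Python subdirectory of The Stack (dedup) rather than the whole dataset.
ds = load_dataset(
    "bigcode/the-stack-dedup",
    data_dir="data/python",
    split="train",
    streaming=True,
)

for example in ds.take(1):
    print(example["content"][:200])  # the file text lives in the "content" column
```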