shatealaboxiaowang opened 5 months ago
See https://github.com/EleutherAI/gpt-neox/blob/FIM-clean/megatron/data/gpt2_dataset.py#L339. We use character-level splitting.
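For readers who don't want to dig through the linked file, here is a minimal sketch of the character-level FIM transformation it implements: cut a document at two random character positions, then rearrange the pieces with sentinel tokens. The sentinel strings and function name below are illustrative assumptions, not the exact ones used in gpt-neox; check the linked source for the real details.

```python
import random

# Hypothetical sentinel strings; the actual tokens depend on the
# tokenizer/model (this follows the common PSM convention).
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def fim_permute(sample: str, rng: random.Random) -> str:
    """Split `sample` at two random character positions and emit it in
    prefix-suffix-middle (PSM) order, so the model learns to generate
    the middle given both surrounding spans."""
    # Two cut points chosen at the character level, then sorted.
    lo, hi = sorted(rng.randrange(len(sample) + 1) for _ in range(2))
    prefix, middle, suffix = sample[:lo], sample[lo:hi], sample[hi:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
```

During training, only a fraction of documents are permuted this way (controlled by a FIM rate), so the model retains ordinary left-to-right ability.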
Thanks, I will look at it.
@shatealaboxiaowang were you able to construct a proper FIM dataset?
Did you fine-tune the model for FIM successfully? What does the FIM dataset look like? Can you share your solution? Thanks very much.
Hi,
Thank you very much for open-sourcing this. Will the code for FIM dataset construction and training be made public, for example how the number of lines or the length of the prefix, suffix, and middle spans is chosen? We would like to build on your model and fine-tune it on our own code repository, especially to improve FIM performance on our internal code.
Thanks.