bigcode-project / Megatron-LM

Ongoing research training transformer models at scale
Other

How to prepare the training data to train StarCoder? #60

Open wwngh1233 opened 1 year ago

wwngh1233 commented 1 year ago

Please tell me how to generate the following files:

WEIGHTS_TRAIN=/fsx/loubna/code/bigcode-data-mix/data/train_data_paths.txt.tmp

WEIGHTS_VALID=/fsx/loubna/code/bigcode-data-mix/data/valid_data_paths.txt.tmp

KOVVURISATYANARAYANAREDDY commented 1 year ago

Hello @wwngh1233, were you able to get these files? How do you prepare them?

L1aoXingyu commented 1 year ago

Same question.