bigcode-project/Megatron-LM
Ongoing research training transformer models at scale
Preprocess hf #10
Closed
RaymondLi0 closed this 1 year ago

RaymondLi0 commented 2 years ago
- Preprocess from an HF dataset (see the first sketch after this list)
- Add an HF tokenizer
- Increase the maximum sequence length to 8192
- Specify device_ids in barrier (see the second sketch below)
- Add scripts to convert the MQA model to a HF custom model (will need to be reworked once https://github.com/huggingface/transformers/pull/21253 is merged)
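
The first two items boil down to reading documents from an HF dataset and tokenizing them with an HF tokenizer instead of Megatron's built-in tokenizers. A minimal sketch of that idea, assuming the `datasets` and `transformers` libraries; the dataset name, text column, and tokenizer checkpoint below are placeholders, not necessarily what this PR uses:

```python
# Placeholder sketch: read documents from an HF dataset, tokenize them with an
# HF tokenizer, and append the end-of-document token per document, in the spirit
# of Megatron's preprocessing before tokens are packed into training sequences.
from datasets import load_dataset
from transformers import AutoTokenizer

# Both names below are placeholders.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
dataset = load_dataset("bigcode/the-stack-smol", split="train")

def encode(example):
    # "content" is the assumed text column of the dataset.
    ids = tokenizer(example["content"])["input_ids"]
    ids.append(tokenizer.eos_token_id)  # mark the end of each document
    return {"input_ids": ids, "num_tokens": len(ids)}

tokenized = dataset.map(encode, remove_columns=dataset.column_names)
print(sum(tokenized["num_tokens"]), "tokens total")
```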
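
For the device_ids change, the point is to pin the NCCL barrier to the rank's own GPU rather than letting NCCL infer the device, which avoids device-mapping warnings and possible hangs. A minimal sketch, assuming a torchrun-style launch where LOCAL_RANK is set (not the PR's actual call site):

```python
# Minimal sketch of passing device_ids to an NCCL barrier.
import os
import torch

local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
torch.distributed.init_process_group(backend="nccl")

# Pin the barrier's collective to this rank's GPU instead of letting NCCL guess.
torch.distributed.barrier(device_ids=[local_rank])

torch.distributed.destroy_process_group()
```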