bigcode-project/Megatron-LM
Ongoing research training transformer models at scale
Preprocess hf #10
Closed
RaymondLi0 closed this 1 year ago

RaymondLi0 commented 2 years ago
- Preprocess from an HF dataset (see the first sketch after this list)
- Add an HF tokenizer
- Increase the maximum sequence length to 8192
- Specify device_ids in barrier (see the second sketch below)
- Add scripts to convert the MQA model to a HF custom model (will need to be reworked once https://github.com/huggingface/transformers/pull/21253 is merged)
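
The first two items boil down to reading documents from an HF dataset and tokenizing them with an HF tokenizer instead of Megatron's built-in tokenizers. A minimal sketch of that idea, assuming the `datasets` and `transformers` libraries; the dataset name, text column, and tokenizer checkpoint below are placeholders, not necessarily what this PR uses:

```python
# Placeholder sketch: read documents from an HF dataset, tokenize them with an
# HF tokenizer, and append the end-of-document token per document, in the spirit
# of Megatron's preprocessing before tokens are packed into training sequences.
from datasets import load_dataset
from transformers import AutoTokenizer

# Both names below are placeholders.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
dataset = load_dataset("bigcode/the-stack-smol", split="train")

def encode(example):
    # "content" is the assumed text column of the dataset.
    ids = tokenizer(example["content"])["input_ids"]
    ids.append(tokenizer.eos_token_id)  # mark the end of each document
    return {"input_ids": ids, "num_tokens": len(ids)}

tokenized = dataset.map(encode, remove_columns=dataset.column_names)
print(sum(tokenized["num_tokens"]), "tokens total")
```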
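
For the device_ids change, the point is to pin the NCCL barrier to the rank's own GPU rather than letting NCCL infer the device, which avoids device-mapping warnings and possible hangs. A minimal sketch, assuming a torchrun-style launch where LOCAL_RANK is set (not the PR's actual call site):

```python
# Minimal sketch of passing device_ids to an NCCL barrier.
import os
import torch

local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
torch.distributed.init_process_group(backend="nccl")

# Pin the barrier's collective to this rank's GPU instead of letting NCCL guess.
torch.distributed.barrier(device_ids=[local_rank])

torch.distributed.destroy_process_group()
```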