I was a bit lost with the preprocess data tool. XD
Why wrap the Qwen2 tokenizer with MegatronTokenizer instead of using the HuggingFace tokenizer directly? Also, I wasn't able to find a class named MegatronTokenizer in the Megatron project, so I suspect a version mismatch between my code and the docs. Can someone explain?
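For context, here is the pattern I'm asking about as I understand it: the preprocessing pipeline seems to expect every tokenizer to expose one uniform interface, so HuggingFace tokenizers get wrapped rather than used directly. This is only a sketch of that adapter pattern; the class and method names are my guesses, not the actual Megatron code, and the toy tokenizer stands in for the real HuggingFace one so the snippet runs offline:

```python
from abc import ABC, abstractmethod

# Guessed shape of the abstract interface Megatron expects from any tokenizer.
# The real class (wherever it lives in the current codebase) may differ.
class MegatronTokenizer(ABC):
    @abstractmethod
    def tokenize(self, text):
        ...

    @abstractmethod
    def detokenize(self, ids):
        ...

    @property
    @abstractmethod
    def vocab_size(self):
        ...


# Hypothetical wrapper: adapts a HuggingFace-style tokenizer (encode/decode)
# to the interface above, which is presumably what the data builder calls.
class HFQwen2TokenizerWrapper(MegatronTokenizer):
    def __init__(self, hf_tokenizer):
        self._tok = hf_tokenizer

    def tokenize(self, text):
        return self._tok.encode(text)

    def detokenize(self, ids):
        return self._tok.decode(ids)

    @property
    def vocab_size(self):
        return len(self._tok)


# Toy stand-in for the HuggingFace tokenizer, so this sketch needs no model download.
class ToyHFTokenizer:
    vocab = {"hello": 0, "world": 1}
    inv = {v: k for k, v in vocab.items()}

    def encode(self, text):
        return [self.vocab[w] for w in text.split()]

    def decode(self, ids):
        return " ".join(self.inv[i] for i in ids)

    def __len__(self):
        return len(self.vocab)


tok = HFQwen2TokenizerWrapper(ToyHFTokenizer())
print(tok.tokenize("hello world"))  # [0, 1]
print(tok.detokenize([1, 0]))       # world hello
```

If that's roughly right, the wrapper would just be an adapter so the binary-indexed dataset builder doesn't need to know which tokenizer library is underneath, but I'd still like confirmation from someone who knows the codebase.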