NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[Question] How to generate a merge file and a vocab file #312

Open Zhang-kg opened 1 year ago

Zhang-kg commented 1 year ago

I want to use the Megatron framework for Chinese NLP pre-training. I currently have a Chinese corpus and a vocab.txt file. However, most frameworks seem to require a vocab.json file and a merge.txt file. Can I generate these two files from my Chinese corpus? If so, how? Sorry, I haven't found a suitable tutorial on Google.
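Below is a minimal sketch of how such a pair of files can be produced from a raw corpus. It assumes the GPT-2-style byte-level BPE format, which is the `vocab.json`/`merges.txt` pair that Megatron-LM's `GPT2BPETokenizer` reads through `--vocab-file` and `--merge-file`. The helpers `train_bpe` and `save_gpt2_files` are hypothetical names, not part of Megatron-LM. Pre-tokenization here simply splits on whitespace, whereas GPT-2 uses a regex and keeps leading spaces, so treat this as an illustration of the format rather than a drop-in tokenizer trainer.

```python
import json
import os
import tempfile
from collections import Counter


def bytes_to_unicode():
    """GPT-2's reversible mapping from each byte (0-255) to a printable character."""
    # Printable bytes map to themselves: '!'..'~', then 0xA1..0xAC and 0xAE..0xFF.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:  # whitespace/control bytes get shifted above U+00FF
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def train_bpe(lines, num_merges):
    """Learn up to num_merges byte-level BPE merges; return (vocab, merges)."""
    byte_encoder = bytes_to_unicode()
    # Assumption: whitespace pre-tokenization (GPT-2 itself uses a regex).
    word_freqs = Counter()
    for line in lines:
        for word in line.split():
            word_freqs[tuple(byte_encoder[b] for b in word.encode("utf-8"))] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in word_freqs.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)  # ties: first pair seen
        merges.append(best)
        merged = best[0] + best[1]
        # Apply the new merge to every word.
        new_freqs = Counter()
        for symbols, freq in word_freqs.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_freqs[tuple(out)] += freq
        word_freqs = new_freqs

    # Vocab: the 256 base byte symbols, one id per merge, then an end-of-text token.
    vocab = {}
    for ch in byte_encoder.values():
        vocab[ch] = len(vocab)
    for a, b in merges:
        vocab.setdefault(a + b, len(vocab))
    vocab.setdefault("<|endoftext|>", len(vocab))
    return vocab, merges


def save_gpt2_files(vocab, merges, out_dir):
    """Write vocab.json and merges.txt (with a '#version' header line)."""
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "vocab.json"), "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)
    with open(os.path.join(out_dir, "merges.txt"), "w", encoding="utf-8") as f:
        f.write("#version: 0.2\n")  # GPT-2 readers skip this first line
        for a, b in merges:
            f.write(f"{a} {b}\n")


if __name__ == "__main__":
    corpus = ["中文 中文 中国", "中文"]  # hypothetical toy corpus
    vocab, merges = train_bpe(corpus, num_merges=3)
    with tempfile.TemporaryDirectory() as d:
        save_gpt2_files(vocab, merges, d)
        print(sorted(os.listdir(d)))  # ['merges.txt', 'vocab.json']
```

For a real corpus you would read lines with `open(path, encoding="utf-8")` and pass them to `train_bpe`, but this pure-Python loop is slow. The Hugging Face `tokenizers` library's `ByteLevelBPETokenizer` (`train(...)` followed by `save_model(directory)`) writes the same two files and is likely the more practical route. The file names themselves are arbitrary, since Megatron-LM takes their paths as arguments.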

MrInouye commented 1 year ago

I have run into the same problem. How do I get a Chinese vocab.json and merge.txt?

Zhang-kg commented 1 year ago

Hello, I have received your email!

github-actions[bot] commented 12 months ago

Marking as stale. No activity in 60 days. Remove stale label or comment or this will be closed in 7 days.

Zhang-kg commented 12 months ago

Hello, I have received your email!

github-actions[bot] commented 10 months ago

Marking as stale. No activity in 60 days.

Zhang-kg commented 10 months ago

Hello, I have received your email!

github-actions[bot] commented 8 months ago

Marking as stale. No activity in 60 days.