Open Zhang-kg opened 1 year ago
I also encountered this problem. How do I get a Chinese vocab.json and merge.txt?
Hello, I have received your email!
Marking as stale. No activity in 60 days. Remove stale label or comment or this will be closed in 7 days.
I want to use the Megatron framework for Chinese NLP pre-training. I currently have a Chinese corpus and a vocab.txt file, but most frameworks seem to require vocab.json and merge.txt instead. Can I generate these two files from my Chinese corpus, and if so, how? Sorry, I haven't found a suitable tutorial on Google.
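For anyone landing here with the same question: the two files are the outputs of training a GPT-2 style byte-level BPE tokenizer, so they can in principle be produced from a raw corpus. Below is a minimal, standard-library-only sketch of that idea, not the method Megatron itself uses. The function names (`train_bpe`, `save`), the simplified pre-tokenization regex, the `<|endoftext|>` special token, and the output file names are all assumptions made for illustration. For a real corpus, a dedicated library such as Hugging Face `tokenizers` (its `ByteLevelBPETokenizer` writes `vocab.json` and `merges.txt`) would likely be faster and closer to GPT-2's exact behavior.

```python
import json
import re
from collections import Counter


def bytes_to_unicode():
    """GPT-2 style map from each byte value to a printable, non-space character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def train_bpe(lines, num_merges, special_tokens=("<|endoftext|>",)):
    """Learn up to `num_merges` byte-level BPE merges from an iterable of strings."""
    byte_map = bytes_to_unicode()
    # Simplified pre-tokenization (an assumption, not GPT-2's exact regex):
    # runs of word characters, runs of punctuation, or runs of whitespace.
    words = Counter()
    for line in lines:
        for piece in re.findall(r"\w+|[^\w\s]+|\s+", line):
            words[tuple(byte_map[b] for b in piece.encode("utf-8"))] += 1

    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs in the corpus.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken deterministically.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        merged = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] += freq
        words = merged

    # Vocabulary: 256 base byte symbols, one token per merge, then special tokens.
    vocab = {}
    for tok in list(byte_map.values()) + [a + b for a, b in merges] + list(special_tokens):
        vocab.setdefault(tok, len(vocab))
    return vocab, merges


def save(vocab, merges, vocab_path="vocab.json", merges_path="merge.txt"):
    """Write the vocabulary as JSON and the merges as one 'a b' pair per line."""
    with open(vocab_path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)
    with open(merges_path, "w", encoding="utf-8") as f:
        f.write("#version: 0.2\n")  # header line used by GPT-2 style merge files
        for a, b in merges:
            f.write(f"{a} {b}\n")
```

Usage would look something like `vocab, merges = train_bpe(open("corpus.txt", encoding="utf-8"), num_merges=30000)` followed by `save(vocab, merges)`, where `corpus.txt` and the merge count are placeholders. Note that the existing vocab.txt is not used here: a WordPiece-style vocab.txt and a BPE vocab.json/merge.txt pair belong to different tokenizer types, so the BPE files are learned from the corpus directly.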