jemmryx opened this issue 3 years ago
Hi, I used the vocab and pretrained embeddings from BERT-base-uncased. The word_dictionary.json should be the same as bert_base_uncased_L-12_H-768_A-12/vocab.txt, except I converted it to a dictionary, e.g., "[CLS]": 101. And word_embed.pkl is dumped from BERT's pretrained embedding matrix. (Sorry, I dumped this long ago and kept copying it to reuse; I lost the original code.) The other vocabs/embeddings (POS, NER, etc.) are directly copied from ELMo-QG. You can check get_vocab() in https://github.com/ZhangShiyue/QGforQA/blob/master/QG/ELMo_QG/preprocess.py to see how those vocabs/embeddings were built.
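Since the original dump script was lost, here is a minimal sketch of how those two files could be regenerated. The function names (`build_word_dictionary`, `dump_word_embeddings`) and the use of HuggingFace `transformers` to read the embedding matrix are my assumptions, not the original code; the mapping itself (token → line index, so `"[CLS]"` → 101 in BERT-base-uncased's vocab.txt) follows the description above.

```python
import json
import pickle


def build_word_dictionary(vocab_path, out_path):
    """Convert a BERT vocab.txt into word_dictionary.json.

    Each token is mapped to its 0-based line index, so in
    bert_base_uncased_L-12_H-768_A-12/vocab.txt "[CLS]" -> 101.
    """
    with open(vocab_path, encoding="utf-8") as f:
        word_dict = {line.rstrip("\n"): idx for idx, line in enumerate(f)}
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(word_dict, f)
    return word_dict


def dump_word_embeddings(out_path):
    """Hypothetical reconstruction of word_embed.pkl.

    Pulls BERT's input (word-piece) embedding matrix via the
    transformers library and pickles it as a NumPy array; the
    original may have read the TF checkpoint directly instead.
    """
    from transformers import BertModel

    model = BertModel.from_pretrained("bert-base-uncased")
    embed = model.embeddings.word_embeddings.weight.detach().numpy()
    with open(out_path, "wb") as f:
        pickle.dump(embed, f)
```

The resulting word_dictionary.json entries should then agree with the example above, e.g. `"[CLS]": 101`.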
Thanks, I will try this.
Hello, I have some questions about the bert_qg vocab. You only provided the preprocessed vocab files, but the code to build the vocab and the related embeddings is not in preprocess.py. Would you release this part of the code?