ZhangShiyue / QGforQA

About Bert vocab #12

Open jemmryx opened 3 years ago

jemmryx commented 3 years ago

Hello, I have some questions about the BERT_QG vocab. You have only provided the preprocessed vocab file, but the code to build the vocab and the related embeddings is not in preprocess.py. Could you release this part of the code?

ZhangShiyue commented 3 years ago

Hi, I used the vocab and pretrained embeddings from BERT-base-uncased. word_dictionary.json should be the same as bert_base_uncased_L-12_H-768_A-12/vocab.txt, except that I converted it to a dictionary mapping each token to its id, e.g., "[CLS]": 101. And word_embed.pkl is dumped from BERT's pretrained embedding matrix. (Sorry, I dumped this long ago and kept copying it to reuse; I lost the original code.) Other vocabs/embeddings (POS, NER, etc.) are copied directly from ELMo-QG. You can check get_vocab() in https://github.com/ZhangShiyue/QGforQA/blob/master/QG/ELMo_QG/preprocess.py to see how those vocabs/embeddings are built.
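
Since the original script is lost, here is a minimal reconstruction sketch of how the two files could be rebuilt. The BERT_DIR path and the variable name bert/embeddings/word_embeddings follow the official BERT TensorFlow checkpoint layout; treat this as an assumption to adjust, not the original code:

```python
import json
import pickle

import tensorflow as tf

BERT_DIR = "bert_base_uncased_L-12_H-768_A-12"  # path to the official BERT release

# word_dictionary.json: map each vocab.txt token to its line index (its id),
# e.g. "[CLS]" -> 101 in the uncased base vocab.
with open(f"{BERT_DIR}/vocab.txt", encoding="utf-8") as f:
    word_dictionary = {line.rstrip("\n"): idx for idx, line in enumerate(f)}
with open("word_dictionary.json", "w", encoding="utf-8") as f:
    json.dump(word_dictionary, f)

# word_embed.pkl: the pretrained token embedding matrix,
# shape (30522, 768) for BERT-base-uncased. The variable name below is
# taken from the official TF checkpoint; verify it for your checkpoint.
embed = tf.train.load_variable(
    f"{BERT_DIR}/bert_model.ckpt", "bert/embeddings/word_embeddings")
with open("word_embed.pkl", "wb") as f:
    pickle.dump(embed, f)
```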

jemmryx commented 3 years ago

Thanks, I will try this.