Open starry-y opened 1 year ago
Hi, the main difference between the configurations is the db paths. For embeddings, we use literature data rather than the official data as the training data.
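If it helps to confirm which fields actually differ between two JSON config files, here is a minimal sketch using only the Python standard library. The function names (`flatten`, `diff_configs`, `diff_config_files`) and the example keys and paths are hypothetical and not taken from the repository; the real config files may use different fields.

```python
import json
from pathlib import Path


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {"dotted.key": value} pairs."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot left by the recursion.
        items[prefix.rstrip(".")] = obj
    return items


def diff_configs(config_a, config_b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    flat_a, flat_b = flatten(config_a), flatten(config_b)
    return {
        key: (flat_a.get(key), flat_b.get(key))
        for key in sorted(set(flat_a) | set(flat_b))
        if flat_a.get(key) != flat_b.get(key)
    }


def diff_config_files(path_a, path_b):
    """Load two JSON config files and diff them."""
    config_a = json.loads(Path(path_a).read_text(encoding="utf-8"))
    config_b = json.loads(Path(path_b).read_text(encoding="utf-8"))
    return diff_configs(config_a, config_b)


if __name__ == "__main__":
    # Hypothetical config contents, for illustration only.
    official = {"train_txt_db": "/txt/official_train.db", "train_batch_size": 64}
    literature = {"train_txt_db": "/txt/literature_train.db", "train_batch_size": 64}
    for key, (old, new) in diff_configs(official, literature).items():
        print(f"{key}: {old!r} -> {new!r}")
```

Running `diff_config_files` on the two config files in question should list only the entries that differ, which makes it easy to check whether anything besides the db paths changes.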
Yes, please use the ext_literature as the configuration file.
Ok, thanks for your reply. I have replaced the config file in the terminal.
And I have another question.
In the evaluation stage, what does pretrained/Chinese-word-vector/embeddings refer to?
And I could not find _chengyu_synonymdict in train_embedding.py ...
Sorry to bother you. I look forward to your reply.
Please refer to https://github.com/VisualJoyce/ChengyuBERT#learning-and-evaluating-chinese-idiom-embeddings
This is a different paper focusing on embedding learning and evaluation. The data has been shared online.
Thanks for your work!
I cannot find the file train-embeddings-base-1gpu.json mentioned in ReadMe.md, but I found a _bert-wwm-extliterature file. Does the _bert-wwm-extliterature file replace the former file?
Thanks a lot!