VisualJoyce / ChengyuBERT

[COLING 2020] BERT-based Models for Chengyu

embedding training config file #21

Open starry-y opened 1 year ago

starry-y commented 1 year ago

Thanks for your work!

I cannot find the file train-embeddings-base-1gpu.json mentioned in README.md, but I did find a bert-wwm-ext_literature file. Does the bert-wwm-ext_literature file replace the former one?

Thanks a lot!

Vimos commented 1 year ago

Hi, the basic difference between the configurations is the db paths. For embeddings, we use the literature data rather than the official data as the training data.

Yes, please use the ext_literature one as the configuration file.
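
For illustration, here is a minimal sketch of how one might inspect the db paths in the ext_literature config. The config file name, key names, and paths below are assumptions for illustration, not copied from the repo:

```python
import json

# Hypothetical config path; the actual file name in the repo may differ.
with open("config/train-embeddings-bert-wwm-ext_literature-1gpu.json") as f:
    cfg = json.load(f)

# The embeddings config is expected to point its db paths at the literature
# corpus rather than the official competition data, e.g. something like
#   "train_txt_db": "/txt/literature_train.db"
# Print every db-related entry to see which corpus this config uses.
for key, value in cfg.items():
    if "db" in key:
        print(key, "->", value)
```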

starry-y commented 1 year ago

Ok, thanks for your reply. I have replaced the config file in the terminal command.

And I have another question.

In the evaluation stage, what is pretrained/Chinese-word-vector/embeddings referring to?

starry-y commented 1 year ago

And I could not find chengyu_synonym_dict in train_embedding.py ...

Sorry to bother you; looking forward to your reply.

Vimos commented 1 year ago

Please refer to https://github.com/VisualJoyce/ChengyuBERT#learning-and-evaluating-chinese-idiom-embeddings

This is a different paper focusing on embedding learning and evaluation. The data has been shared online.
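
For illustration, here is a minimal sketch of the kind of nearest-neighbour evaluation described there, assuming pretrained/Chinese-word-vector/embeddings is a local directory of word2vec-format vectors (e.g. from the Chinese-Word-Vectors project) and that the synonym dict is a JSON file mapping each idiom to a list of its synonyms. The specific file names and JSON layout are assumptions, not taken from the repo:

```python
import json

from gensim.models import KeyedVectors

# Load pretrained Chinese word vectors in word2vec text format.
# The file name below is illustrative; use whichever vectors you downloaded.
wv = KeyedVectors.load_word2vec_format(
    "pretrained/Chinese-word-vector/embeddings/sgns.literature.word",
    binary=False,
)

# Assumed layout: {idiom: [synonym, ...]}
with open("data/chengyu_synonym_dict.json") as f:
    synonyms = json.load(f)

# Count how many known synonyms appear among each idiom's top-10 neighbours.
hits, total = 0, 0
for idiom, syns in synonyms.items():
    if idiom not in wv:
        continue
    neighbours = {w for w, _ in wv.most_similar(idiom, topn=10)}
    hits += sum(s in neighbours for s in syns)
    total += len(syns)

print(f"recall@10 = {hits / max(total, 1):.3f}")
```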