VisualJoyce / ChengyuBERT

[COLING 2020] BERT-based Models for Chengyu
MIT License

About modeling. #13

Closed: WinniyGD closed this issue 3 years ago

WinniyGD commented 3 years ago

Sorry, I have one more question about the code.

What is the difference between the following model classes, and what is the purpose of each?

@register_model('chengyubert-2stage-stage2-mask')
@register_model('chengyubert-2stage-stage2-cls')
@register_model('chengyubert-2stage-stage2-window')
@register_model('chengyubert-2stage-stage2-mask-window')

VisualJoyce commented 3 years ago

These were for ablation studies, but I found that some of them did not make a significant enough difference.
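For readers unfamiliar with the pattern, registering each variant under its own name lets an experiment pick a model by string, which is convenient for ablations. The following is only a minimal sketch of how such a decorator could work; `MODEL_REGISTRY`, `build_model`, and the placeholder classes are hypothetical and are not taken from the repository.

```python
# Hypothetical registry, for illustration only (not the repository's code).
MODEL_REGISTRY = {}


def register_model(name):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f"model name already registered: {name}")
        MODEL_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


@register_model('chengyubert-2stage-stage2-mask')
class MaskVariant:  # placeholder standing in for a real model class
    pass


@register_model('chengyubert-2stage-stage2-cls')
class ClsVariant:  # placeholder standing in for a real model class
    pass


def build_model(name, *args, **kwargs):
    """Look up a registered class by name and instantiate it."""
    return MODEL_REGISTRY[name](*args, **kwargs)


model = build_model('chengyubert-2stage-stage2-cls')
```

With a registry like this, switching between ablation variants only requires changing the name passed on the command line or in a config file.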

WinniyGD commented 3 years ago

OK, could you share the 33237-entry vocabulary and its dataset?

VisualJoyce commented 3 years ago

You can download idioms_pretrain.json.

WinniyGD commented 3 years ago

Thanks! It was really helpful. 👍

WinniyGD commented 3 years ago

Sorry, I would like to know the relationship between the embedding index and the idiom ID. Does this mean that the embeddings at indices 0-3847 are the ChID idiom embeddings? The file 'idioms_pretrain.json' does not show the idiom IDs. How can I get the one-to-one correspondence?

VisualJoyce commented 3 years ago

It's constructed via:

https://github.com/VisualJoyce/ChengyuBERT/blob/2d19da89949c6cc9165e18d5a33ac29dbceae921/chengyubert/data/__init__.py#L115-L143
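The linked lines are not reproduced in this thread. As a rough sketch of the general idea only, the snippet below builds an idiom-to-index mapping in which one list keeps the lowest indices and further idioms are appended after it. That ordering is an assumption suggested by the question above, not something the thread confirms, and `build_idiom_vocab` and the toy data are hypothetical rather than the repository's code.

```python
def build_idiom_vocab(chid_idioms, extra_idioms):
    """Map each idiom to an index, keeping `chid_idioms` first.

    Assumption: the first list occupies indices 0..len-1 and any idiom
    already seen is skipped, so every idiom gets exactly one index.
    """
    idiom_to_index = {}
    for idiom in list(chid_idioms) + list(extra_idioms):
        if idiom not in idiom_to_index:
            idiom_to_index[idiom] = len(idiom_to_index)
    return idiom_to_index


# Toy data; real lists would be loaded from the ChID idiom list
# and from idioms_pretrain.json.
chid = ["画蛇添足", "守株待兔"]
extra = ["守株待兔", "亡羊补牢"]

vocab = build_idiom_vocab(chid, extra)

# Inverting the mapping gives the embedding-index-to-idiom lookup.
index_to_idiom = {index: idiom for idiom, index in vocab.items()}
```

Because each idiom is inserted once, the inverted dictionary gives the one-to-one correspondence between an embedding row and its idiom.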

WinniyGD commented 3 years ago

Oh! Amazing! Thanks for your patience!