Closed WinniyGD closed 3 years ago
Generally speaking, stage one is Chengyu-oriented pretraining on a large corpus, carried out by train_pretrain.py. Since this stage can require substantial compute, we released the https://huggingface.co/visualjoyce/chengyubert_2stage_stage1_wwm_ext model on Hugging Face.
We give details of the models used in each ablation study in Section 4.2.
w/o Pre-Training: runs stage 2 directly from hfl/chinese-bert-wwm-ext.
w/o Fine-Tuning: zero-shot evaluation using the stage-1 model.
w/o 𝐿V and w/o 𝐿A: control whether the original, enlarged, or combined loss function is used in fine-tuning; the details can be found in the code.
Wow, OK. I wanna know whether the small candidate set is 3848 or 7 (the candidates of one question) — which one?
And .... are the two vocabularies the same?
The vocabulary is the same 32k idioms, although during fine-tuning the model will only change the first 3848 entries of the vocabulary.
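As a minimal sketch (the function and constant names here are my own assumptions, not identifiers from the released code), restricting predictions to the first 3848 vocabulary entries can be done by masking the logits outside that range before the softmax:

```python
import numpy as np

VOCAB_SIZE = 32000      # full idiom vocabulary (~32k entries)
CANDIDATE_SET = 3848    # first 3848 entries form the fine-tuning candidate set

def restrict_to_candidates(logits: np.ndarray, k: int = CANDIDATE_SET) -> np.ndarray:
    """Mask logits beyond the first k vocabulary entries so that all
    probability mass falls on the candidate set after softmax."""
    masked = logits.copy()
    masked[..., k:] = -np.inf
    return masked

rng = np.random.default_rng(0)
logits = rng.normal(size=VOCAB_SIZE)
masked = restrict_to_candidates(logits)
pred = int(np.argmax(masked))
assert pred < CANDIDATE_SET  # the prediction always lands in the first 3848 idioms
```

This only changes which entries can be predicted; the embedding table itself keeps all 32k rows.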
Options are used in Stage 2; they are the candidate sets of seven options per question.
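For illustration only (the identifiers below are hypothetical, not the repository's API), stage-2 scoring over a seven-option candidate set amounts to comparing the full-vocabulary logits at just those seven ids:

```python
import numpy as np

def pick_option(full_logits: np.ndarray, option_ids: list) -> int:
    """Given logits over the full idiom vocabulary and the seven candidate
    idiom ids for one blank, return the id of the best-scoring option."""
    scores = full_logits[option_ids]
    return option_ids[int(np.argmax(scores))]

full_logits = np.zeros(32000)
options = [12, 507, 1033, 2048, 3001, 3500, 3847]  # hypothetical candidate ids
full_logits[1033] = 5.0  # make one option clearly the best
assert pick_option(full_logits, options) == 1033
```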
Oh, thanks. Most of my problems have been solved for now. Thank you very much for your kind answer.
I'm sorry to bother you again.
I wanna know whether the code for the paper ('A BERT-based two-stage model for Chinese Chengyu recommendation', about the two stages) only uses 'train_pretrain.py' and 'train_official.py'. What's the difference between the stage-1 pretraining and running 'train_pretrain.py'?
What's more, what's the difference among w/o Pre-Training, w/o Fine-Tuning, w/o 𝐿V, and w/o 𝐿A? (I don't quite understand what you're showing in your paper.)
Could you describe more details? Thanks very much.