Closed WinniyGD closed 3 years ago
Generally speaking, stage one is Chengyu-oriented pretraining on a large corpus, carried out by train_pretrain.py. Since this stage can require substantial compute, we released the https://huggingface.co/visualjoyce/chengyubert_2stage_stage1_wwm_ext model on Hugging Face.
We give details of the models used in each ablation study in Section 4.2.
w/o Pre-Training: runs stage 2 directly from hfl/chinese-bert-wwm-ext.
w/o Fine-Tuning: zero-shot evaluation using the stage-1 model.
w/o 𝐿V and w/o 𝐿A: control whether the original, enlarged, or combined loss function is used in fine-tuning; the details can be found in the code.
Wow, OK. I wanna know whether the small candidate set is 3848 or 7 (the candidates of one question) — which one?
And .... are the two vocabularies the same?
The vocabulary is the same 32k idioms, although during fine-tuning the model will only change the first 3848 entries of the vocabulary.
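As a minimal sketch (the function and constant names here are my own assumptions, not identifiers from the released code), restricting predictions to the first 3848 vocabulary entries can be done by masking the logits outside that range before the softmax:

```python
import numpy as np

VOCAB_SIZE = 32000      # full idiom vocabulary (~32k entries)
CANDIDATE_SET = 3848    # first 3848 entries form the fine-tuning candidate set

def restrict_to_candidates(logits: np.ndarray, k: int = CANDIDATE_SET) -> np.ndarray:
    """Mask logits beyond the first k vocabulary entries so that all
    probability mass falls on the candidate set after softmax."""
    masked = logits.copy()
    masked[..., k:] = -np.inf
    return masked

rng = np.random.default_rng(0)
logits = rng.normal(size=VOCAB_SIZE)
masked = restrict_to_candidates(logits)
pred = int(np.argmax(masked))
assert pred < CANDIDATE_SET  # the prediction always lands in the first 3848 idioms
```

This only changes which entries can be predicted; the embedding table itself keeps all 32k rows.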
Options are used in Stage 2; they are the candidate sets of seven options per question.
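For illustration only (the identifiers below are hypothetical, not the repository's API), stage-2 scoring over a seven-option candidate set amounts to comparing the full-vocabulary logits at just those seven ids:

```python
import numpy as np

def pick_option(full_logits: np.ndarray, option_ids: list) -> int:
    """Given logits over the full idiom vocabulary and the seven candidate
    idiom ids for one blank, return the id of the best-scoring option."""
    scores = full_logits[option_ids]
    return option_ids[int(np.argmax(scores))]

full_logits = np.zeros(32000)
options = [12, 507, 1033, 2048, 3001, 3500, 3847]  # hypothetical candidate ids
full_logits[1033] = 5.0  # make one option clearly the best
assert pick_option(full_logits, options) == 1033
```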
Oh, thanks. Most of my problems have been solved for now. Thank you very much for your kind answer.
I'm sorry to bother you again.
I wanna know whether the code for the paper ('A BERT-based two-stage model for Chinese Chengyu recommendation', about the two stages) only uses 'train_pretrain.py' and 'train_official.py'. What's the difference between the stage-1 pretraining and running 'train_pretrain.py'?
What's more, what's the difference among w/o Pre-Training, w/o Fine-Tuning, w/o 𝐿V, and w/o 𝐿A? (I don't quite understand what you're showing in your paper.)
Could you describe more details? Thanks very much.