Sorry for the confusion.
The hyperparameter table in the paper was initially reported for the C4 and HugeNews models. However, we later updated the model and released the checkpoint for (mixed, stochastic), since it gives the best accuracies, so the hyperparameters can differ slightly from Appendix C in the paper.
We used the same ROUGE-L metrics as previous works for a fair comparison. Except for CNN/DM (which uses RougeLsum-F), all other datasets used RougeL-F, and as far as I remember the two should be close.
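If you want to see the difference concretely, here is a small sketch with the rouge_score package (our evaluation code is based on it, as far as I remember). The strings below are just toy examples: rougeL runs LCS over the whole text, while rougeLsum treats "\n" as a sentence boundary and aggregates per-sentence LCS.

```python
# Minimal sketch using the rouge_score package (pip install rouge-score).
# Toy strings; for rougeLsum the "\n" characters mark sentence boundaries.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL", "rougeLsum"], use_stemmer=True)

target = "the cat sat on the mat .\nit fell asleep ."
prediction = "the cat sat on the mat .\nthe cat fell asleep ."

# rougeL: LCS over the whole text; rougeLsum: per-sentence LCS, aggregated.
scores = scorer.score(target, prediction)
for name, score in scores.items():
    print(name, round(score.fmeasure, 4))
```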
Okay, I get it.
Also, --param_override doesn't seem to be working at all, any ideas why?
I manually put the beam size and beam alpha in public_params.py, but today I realized the vocab parameter is also not getting passed. Unknowingly, I have been using sp_test.model until now when it should be the c4.unigram dictionary. So, what's the difference between the two, and does it affect training and model performance?
Please make sure the flags are spelled correctly.
The sp_test.model is built from sp_test.vocab for unit-testing purposes.
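For reference, the evaluation command in the README looks roughly like the sketch below (from memory, so please double-check the README). Note the flag name ends with an "s" (--param_overrides) and takes a comma-separated list of key=value pairs; the paths here are illustrative.

```
python3 pegasus/bin/evaluate.py --params=aeslc_transformer \
  --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 \
  --model_dir=ckpt/pegasus_ckpt/aeslc
```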
Umm, okay. So we need to pass the .model file for vocab, right? As is done in the training and evaluation scripts?
And I wanted to know the difference between this sp_test dictionary and the c4_unigram_96k dictionary. I have been reading the section of the paper which talks about the effect of the dictionary, that's why I wanted to know what this sp_test dictionary is.
sp_test is for unit tests only. It is much smaller than the 96k vocab.
Please download the vocab model and vocab file from gcloud and use them (as explained in the README).
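Something along these lines should fetch everything (the bucket path is from memory of the README, so please verify it there):

```
mkdir ckpt
gsutil cp -r gs://pegasus_ckpt/ ckpt/
# The vocab then sits at ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model (plus the .vocab file)
# and is passed as vocab_filename via --param_overrides.
```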
Okay, got it.
Thanks.
Hi, @JingqingZ
I had one thing to ask. I'll ask it right here instead of creating a new issue.
So, I initialized PEGASUS with the fine-tuned checkpoint available on gcloud for the AESLC dataset. I then continued fine-tuning with an increased dropout of 0.3 and it improved the model's ROUGE scores significantly on the test set.
Earlier you said you hadn't experimented with higher dropout values. I just want to know if it's okay to train the model this way, i.e. stop the training and resume with a changed hyperparameter; in my case it's the dropout value I'll be changing.
Technically, yes, you can.
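If you do this, you can pass the override on the command line rather than editing public_params.py. A rough sketch, assuming dropout is exposed as a hyperparameter in the params set (that is how I remember public_params.py) and with illustrative paths and checkpoint step:

```
python3 pegasus/bin/train.py --params=aeslc_transformer \
  --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,dropout=0.3 \
  --train_init_checkpoint=ckpt/pegasus_ckpt/aeslc/model.ckpt-<step> \
  --model_dir=aeslc_finetune_dropout03
# <step> is the step number of the fine-tuned AESLC checkpoint you downloaded;
# aeslc_finetune_dropout03 is just a new directory for the resumed run.
```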
Hi, @JingqingZ
According to the hyperparameter table reported in the paper, the max input tokens for the XSum and Reddit datasets were 512, whereas in the dataset registry they are defined as 1024. Why so?
Moreover, the checkpoints uploaded on gcloud were obtained using the hyperparameters (including the max token length) defined in the dataset registry, right? Though for some datasets the train steps in the dataset registry differ from those of the checkpoint available on gcloud, I guess the uploaded one is the checkpoint with the best validation result.
Could you give me some insight into the max input length? I am quite confused about it.
Edit: One more thing, why did you report RougeL-F for the PubMed dataset and not RougeLsum-F?