Sorry for the confusion.
The hyperparameter table in the paper was initially reported for the C4 and HugeNews models. However, we later updated the model and released the checkpoint for (mixed, stochastic), since it gives the best accuracies, so the hyperparameters can differ slightly from Appendix C in the paper.
We used the same ROUGE-L metrics as previous works for a fair comparison. Except for CNN/DM (which uses RougeLsum-F), all other datasets used RougeL-F, and as far as I remember the two should be close.
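If you want to see the difference concretely, here is a small sketch with the rouge_score package (our evaluation code is based on it, as far as I remember). The strings below are just toy examples: rougeL runs LCS over the whole text, while rougeLsum treats "\n" as a sentence boundary and aggregates per-sentence LCS.

```python
# Minimal sketch using the rouge_score package (pip install rouge-score).
# Toy strings; for rougeLsum the "\n" characters mark sentence boundaries.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL", "rougeLsum"], use_stemmer=True)

target = "the cat sat on the mat .\nit fell asleep ."
prediction = "the cat sat on the mat .\nthe cat fell asleep ."

# rougeL: LCS over the whole text; rougeLsum: per-sentence LCS, aggregated.
scores = scorer.score(target, prediction)
for name, score in scores.items():
    print(name, round(score.fmeasure, 4))
```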
Okay, I get it.
Also, --param_override doesn't seem to be working at all, any ideas why?
I manually put the beam size and beam alpha in public_params.py, but today I realized the vocab parameter is also not getting passed. Unknowingly, I have been using sp_test.model until now when it should be the c4.unigram dictionary. So, what's the difference between the two, and does it affect training and model performance?
Please make sure the flags are spelled correctly.
The sp_test.model is built from sp_test.vocab for unit-testing purposes.
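For reference, the evaluation command in the README looks roughly like the sketch below (from memory, so please double-check the README). Note the flag name ends with an "s" (--param_overrides) and takes a comma-separated list of key=value pairs; the paths here are illustrative.

```
python3 pegasus/bin/evaluate.py --params=aeslc_transformer \
  --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 \
  --model_dir=ckpt/pegasus_ckpt/aeslc
```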
Umm, okay. So we need to pass the .model file for vocab, right? As is done in the training and evaluation scripts?
And I wanted to know the difference between this sp_test dictionary and the c4_unigram_96k dictionary. I have been reading the section of the paper which talks about the effect of the dictionary, that's why I wanted to know what this sp_test dictionary is.
sp_test is for unit tests only. It is much smaller than the 96k vocab.
Please download the vocab model and vocab file from gcloud and use them (as explained in the README).
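Something along these lines should fetch everything (the bucket path is from memory of the README, so please verify it there):

```
mkdir ckpt
gsutil cp -r gs://pegasus_ckpt/ ckpt/
# The vocab then sits at ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model (plus the .vocab file)
# and is passed as vocab_filename via --param_overrides.
```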
Okay, got it.
Thanks.
Hi, @JingqingZ
I had one thing to ask. I'll ask it right here instead of creating a new issue.
So, I initialized PEGASUS with the fine-tuned checkpoint available on gcloud for the AESLC dataset. I then continued fine-tuning with an increased dropout of 0.3 and it improved the model's ROUGE scores significantly on the test set.
Earlier you said you hadn't experimented with higher dropout values. I just want to know if it's okay to train the model this way, i.e. stop the training and resume with a changed hyperparameter; in my case it's the dropout value I'll be changing.
Technically, yes, you can.
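If you do this, you can pass the override on the command line rather than editing public_params.py. A rough sketch, assuming dropout is exposed as a hyperparameter in the params set (that is how I remember public_params.py) and with illustrative paths and checkpoint step:

```
python3 pegasus/bin/train.py --params=aeslc_transformer \
  --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,dropout=0.3 \
  --train_init_checkpoint=ckpt/pegasus_ckpt/aeslc/model.ckpt-<step> \
  --model_dir=aeslc_finetune_dropout03
# <step> is the step number of the fine-tuned AESLC checkpoint you downloaded;
# aeslc_finetune_dropout03 is just a new directory for the resumed run.
```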
Hi, @JingqingZ
According to the hyperparameter table reported in the paper, the max input tokens for the XSum and Reddit datasets were 512, whereas in the dataset registry they are defined as 1024. Why so?
Moreover, the checkpoints uploaded on gcloud were obtained using the hyperparameters (including the max token length) defined in the dataset registry, right? Though for some datasets the train steps in the dataset registry differ from those of the checkpoint available on gcloud, I guess the uploaded one is the checkpoint with the best validation result.
Could you give me some insight into the max input length? I am quite confused about it.
Edit: One more thing, why did you report RougeL-F for the PubMed dataset and not RougeLsum-F?