Closed: ShibataGenjiro closed this issue 4 years ago
The tagged version should be the final version of the code that was used for the paper. I must have made some breaking changes after that without fixing the old config files (fixing them would have meant retraining everything). If you did retrain from master, I believe the equivalent loss-normalization settings are instance_loss_normalization='sum' and batch_loss_normalization='average'.
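For reference, a hypothetical sketch of what that change might look like in a training config. Only the parameter names and values come from this thread; the surrounding structure and the model type name are illustrative assumptions about the repo's AllenNLP-style config layout:

```jsonnet
// Hypothetical config fragment -- not the repo's actual file.
// The old key on the emnlp2019 tag:
//   "loss_normalization": "summaries"
// would be replaced on master by the two keys below, using the values
// the maintainer suggested above.
{
  "model": {
    "type": "cloze-pointer-generator",  // assumed registered name
    "instance_loss_normalization": "sum",
    "batch_loss_normalization": "average"
  }
}
```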
The code only supports a single GPU, so that's all I used (sorry, I don't remember which kind). From the original logs, the extractive model training step took about 14 hours, the PGN took about 4 hours, and the additional fine-tuning with the coverage loss took about 2 hours.
OK, I will try to run the tagged version. Thank you for your quick reply!
Hello.
When I tried to run train.sh in summary-cloze-master/experiments/deutsch2019/abstractive-step/pointer-generator, there was an error.
Then I checked summary-cloze-master/summarize/models/cloze/pointer_generator.py and summary-cloze-master/summarize/models/sds/pointer_generator.py and found that there is no parameter called loss_normalization; instead there are parameters called instance_loss_normalization and batch_loss_normalization.
Then I checked the source code at the emnlp2019 tag. In that version, loss_normalization: str = 'summaries' appears in both summarize/models/sds/pointer_generator.py and summarize/models/cloze/pointer_generator.py.
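To make the mismatch concrete, here is a minimal, self-contained sketch. These functions are simplified stand-ins for the two constructor signatures described above, not the actual model code:

```python
# Simplified stand-ins for the two pointer-generator constructor
# signatures -- not the real model classes from the repo.

def tagged_pointer_generator(loss_normalization: str = 'summaries'):
    """Signature as on the emnlp2019 tag."""
    return {'loss_normalization': loss_normalization}

def master_pointer_generator(instance_loss_normalization: str = 'sum',
                             batch_loss_normalization: str = 'average'):
    """Signature as on the master branch."""
    return {'instance_loss_normalization': instance_loss_normalization,
            'batch_loss_normalization': batch_loss_normalization}

# An old config file passes the old keyword to the new constructor,
# which raises a TypeError for the unexpected keyword argument:
try:
    master_pointer_generator(loss_normalization='summaries')
except TypeError as err:
    print('config/constructor mismatch:', err)
```

This is why an untouched tagged-era config fails against master: the constructor simply no longer accepts the old keyword.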
So I want to know which version is the final one used for the paper. (If the master branch is the final version, how can I fix this config problem?)
Another question: what GPUs (and how many) did you use, and how long did it take to train the PGN and the PGN with coverage on them?