Open phamquang-hieu opened 9 months ago
Hi,
Interesting questions!
`preprocess_cnndm` processes each sample independently, and the PyTorch `train_dataset.map` function handles that.
Thank you for your timely responses!
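As a rough sketch of what "processes each sample independently" means here: a per-example preprocessing function is applied to each record in turn, which is exactly the contract of Hugging Face `datasets.Dataset.map`. The function below (`preprocess_sample`) and the toy records are illustrative stand-ins, not the actual `preprocess_cnndm` logic; the list comprehension is a plain-Python equivalent of `train_dataset.map(preprocess_sample)`.

```python
# Hypothetical per-sample preprocessing function; `datasets.Dataset.map`
# calls such a function once per example (or per batch with batched=True).
def preprocess_sample(example):
    # e.g. clean up the document and summary fields for one record
    example["doc"] = example["doc"].strip()
    example["summary"] = example["summary"].strip()
    return example

# Plain-Python equivalent of train_dataset.map(preprocess_sample):
toy_dataset = [
    {"doc": " First document. ", "summary": " A summary. "},
    {"doc": " Second document. ", "summary": " Another summary. "},
]
processed = [preprocess_sample(dict(ex)) for ex in toy_dataset]
print(processed[0]["doc"])  # -> "First document."
```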
What I meant in question 1 was that the relative position of the reference [Document] and the [Incomplete Ref. Summary] seems to be the opposite of what is presented in the paper. The order in the implementation is [Incomplete Ref. Summary] + [Random Words] + [Document], whereas the one presented in the paper is [Document] + [Random Words] + [Incomplete Ref. Summary]. Am I correct?
Sorry for the misunderstanding. You are right: the order we implemented is [Incomplete Ref. Summary] + [Random Words] + [Document]. In the paper we use different terminology to match the figure. I would also expect both orders to result in the same performance (provided you handle long documents properly).
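For clarity, the implemented ordering can be sketched as a simple string concatenation. The variable names and toy strings below are illustrative only (they mirror, but are not, the repo's `summary_half` / `Phrase_random_comb` / `example` variables):

```python
# Hedged sketch of the implemented input order:
# [Incomplete Ref. Summary] + [Random Words] + [Document]
summary_half = "The cat sat"                # incomplete reference summary
random_words = ["blue", "engine", "river"]  # distractor tokens
document = "The cat sat on the mat all afternoon."

model_input = summary_half + " " + " ".join(random_words) + " " + document
print(model_input)
# -> "The cat sat blue engine river The cat sat on the mat all afternoon."
```

The paper's figure simply presents the same three components in the reverse order; as noted above, the relative ordering is not expected to change performance as long as long documents are handled.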
Hello there, thank you for your interesting work! I wonder if you could help me answer the following questions:
As stated in the paper, the input format should be [Document] + [Random Words] + [Incomplete Ref. Summary]. However, in the implementation in `train_seq2seq.py`, lines 595 and 596 read:

```python
Phrase2 = summary_half + ' ' + " + ".join(Phrase_random_comb) + ' '
example_dic['doc'] = Phrase2 + example
```

where `example` is set as `example = example_dic['doc']` on line 512. Could you help me clarify whether this is a mismatch?
It is claimed both in the paper and in your README that "NonFactS generates grammatically correct nonfactual summaries". Did you verify or enforce this condition in any way?
Thank you for taking the time to consider my questions!