I'm confused about how to evaluate generated paragraphs.
Should I treat the whole paragraph (multiple sentences) as one long sentence, treat the ground truth the same way, and then feed both into BLEU, CIDEr (and so on) to evaluate?
Or should I change the code of bleu.py and cider.py to evaluate the paragraphs sentence by sentence, matching each generated sentence to one ground-truth sentence?
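To make the two options concrete, here is a minimal sketch of what I mean. It uses a toy clipped unigram precision as a stand-in for the real metrics; it is not the scorer in bleu.py or cider.py, and the function names and example sentences are made up for illustration only.

```python
from collections import Counter


def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision: a toy stand-in for BLEU/CIDEr (assumption, not bleu.py)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return matched / total


def score_as_one_sentence(gen_sents, ref_sents):
    """Option 1: join each paragraph into one long sentence, then score once."""
    return ngram_precision(" ".join(gen_sents), " ".join(ref_sents))


def score_sentence_by_sentence(gen_sents, ref_sents):
    """Option 2: score the i-th generated sentence against the i-th reference sentence."""
    pairs = list(zip(gen_sents, ref_sents))  # extra sentences on either side are ignored
    if not pairs:
        return 0.0
    return sum(ngram_precision(g, r) for g, r in pairs) / len(pairs)


# Hypothetical paragraphs: same sentences, different order.
generated = ["a man rides a horse", "the sky is blue"]
reference = ["the sky is blue", "a man rides a horse"]

print(score_as_one_sentence(generated, reference))
print(score_sentence_by_sentence(generated, reference))
```

In this toy case the two options disagree completely: the joined version sees every word matched, while the sentence-by-sentence version finds no overlap because the sentences appear in a different order. That difference is the reason I'm unsure which one is intended.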
Hope you can help me with this! Thank you!