evaluate summaries now always runs on both extrinsic datasets (data_subset is no longer a parameter).
The rouge scores script now always computes ROUGE scores for the full set.
I haven't run compute_rouge_scores on the full set yet, since I don't have enough bandwidth to download the full logs from Paperspace. I will do that once I'm back home. I did verify that the ROUGE score script works by testing it on a smaller subset of the data.
Simplification: