Alex-Fabbri / Multi-News

Large-scale multi-document summarization dataset and code

Details about experimental setup #26

Closed jinfengr closed 3 years ago

jinfengr commented 4 years ago

Hi Alex,

Thanks a lot for sharing the code and data. I am trying to evaluate on your dataset; however, some details are not mentioned in the paper. I was wondering if you could answer the following questions:

  1. What is the maximum truncation length for the summary (it looks like it's 300)?
  2. Which embeddings are you using? Do you use pretrained word embeddings?
  3. Do you use positional embeddings?
  4. Do you share the encoder and decoder vocab and vocab embeddings?
  5. What is the encoder/decoder vocab size (or do you use a minimum frequency to filter out low-frequency words or tokens)?
Alex-Fabbri commented 3 years ago

Hi! Sorry for the long delay.

1/4/5. Yes, it's 300. You can check out the preprocessing script here. The vocab is shared and is of size 5000, without filtering out low-frequency tokens (see the default parameters in this file). The vocab embeddings are not shared.
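
For readers who can't dig through the linked scripts, here is a minimal sketch of what that preprocessing amounts to, assuming whitespace-tokenized text; the function names, special tokens, and tokenization are placeholders, not the repository's actual code:

```python
from collections import Counter

MAX_SUMMARY_LEN = 300   # maximum truncation length for summaries
VOCAB_SIZE = 5000       # single vocab shared by encoder and decoder

def truncate_summary(tokens):
    """Keep at most MAX_SUMMARY_LEN tokens of a tokenized summary."""
    return tokens[:MAX_SUMMARY_LEN]

def build_shared_vocab(tokenized_texts):
    """Take the VOCAB_SIZE most frequent tokens; no min-frequency filtering."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    specials = ["<pad>", "<unk>", "<s>", "</s>"]
    most_common = [tok for tok, _ in counts.most_common(VOCAB_SIZE - len(specials))]
    return {tok: idx for idx, tok in enumerate(specials + most_common)}
```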

2/3. We use a randomly initialized embedding layer rather than pretrained word embeddings, and we do not use positional embeddings.
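
For reference, a minimal PyTorch sketch of that embedding setup, assuming a shared vocab but separate encoder/decoder embedding tables; the class name and embedding dimension are illustrative, not taken from the repository:

```python
import torch.nn as nn

class Seq2SeqEmbeddings(nn.Module):
    """Illustrative setup: one shared vocab, separate randomly initialized
    embedding tables for encoder and decoder, and no positional embeddings."""

    def __init__(self, vocab_size=5000, emb_dim=128):
        super().__init__()
        # nn.Embedding weights are randomly initialized (no pretrained vectors).
        self.encoder_embedding = nn.Embedding(vocab_size, emb_dim)
        self.decoder_embedding = nn.Embedding(vocab_size, emb_dim)  # not shared

    def embed_source(self, src_ids):
        return self.encoder_embedding(src_ids)

    def embed_target(self, tgt_ids):
        return self.decoder_embedding(tgt_ids)
```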