Closed: jinfengr closed this issue 3 years ago
Hi! Sorry for the long delay.
1/4/5. Yes, it's 300. You can check out the preprocessing script here. The vocabulary is shared and has size 5,000, without filtering out low-frequency tokens (see the default parameters in this file). The vocabulary embeddings are not shared.
2/3. We use a randomly initialized embedding layer rather than pretrained word embeddings, and no positional embeddings.
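In case it helps, here is a minimal PyTorch-style sketch of the setup described above (shared 5,000-token vocabulary, 300-dimensional embeddings, separate randomly initialized tables for the encoder and decoder, no positional embeddings). The class and variable names are only illustrative, not the actual code in this repo:

```python
import torch
import torch.nn as nn

# Values taken from the answer above; names in this sketch are illustrative.
VOCAB_SIZE = 5000  # shared source/target vocabulary, no low-frequency cutoff
EMBED_DIM = 300    # embedding dimension

class Seq2SeqEmbeddings(nn.Module):
    """Separate, randomly initialized embedding tables for the encoder and
    decoder over the same shared vocabulary; no positional embeddings."""

    def __init__(self, vocab_size: int = VOCAB_SIZE, embed_dim: int = EMBED_DIM):
        super().__init__()
        # nn.Embedding draws its weights from N(0, 1) by default, so nothing
        # is loaded from pretrained word vectors.
        self.encoder_embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder_embed = nn.Embedding(vocab_size, embed_dim)

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor):
        # Token embeddings only; no positional encoding is added.
        return self.encoder_embed(src_ids), self.decoder_embed(tgt_ids)
```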
Hi Alex,
Thanks a lot for sharing the code and data. I am trying to evaluate on your dataset; however, some details are not mentioned in the paper. I am wondering if you could answer the following questions: