Open schaefer30038 opened 2 years ago
Hi, thanks for your interest. As for the data split, we follow R2Gen and R2GenCMN and randomly split the data 70-10-20. I have updated the annotation file in the files/iu_xray directory of this repo and hope this helps you successfully reproduce the results.
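In case it helps anyone who wants to regenerate a similar split themselves instead of using the provided file, below is a minimal sketch of a 70-10-20 random split. The file names and the train/val/test layout are assumptions rather than the exact R2Gen annotation schema, and the seed of the original split is unknown, so for exact reproducibility please use the annotation file in files/iu_xray.

```python
import json
import random

# Minimal 70-10-20 random split sketch (schema and file names are assumptions).
random.seed(42)  # arbitrary seed; the seed used for the released split is unknown

with open("all_cases.json") as f:   # hypothetical file: a list with one record per case
    cases = json.load(f)

random.shuffle(cases)
n = len(cases)
n_train, n_val = int(0.7 * n), int(0.1 * n)

annotation = {
    "train": cases[:n_train],
    "val": cases[n_train:n_train + n_val],
    "test": cases[n_train + n_val:],
}

with open("annotation.json", "w") as f:
    json.dump(annotation, f, indent=2)
```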
Hi, I'm sorry to bother you. May I ask a question about the code in R2Gen? At https://github.com/cuhksz-nlp/R2Gen/blob/5a2e63f6d8da670725b1926317b4bfd1393ec51b/modules/encoder_decoder.py#L134 the return is
x + self.dropout(sublayer(self.norm(x, memory)))
Is this code correct? It doesn't seem to correspond to the diagram in the paper. I think it should be
self.norm((x + self.dropout(sublayer(x))), memory)
Could you help me clarify this? Thanks a lot.
Hi, generally speaking you are right: the vanilla Transformer uses post-LN, as you described. However, as discussed in the paper On Layer Normalization in the Transformer Architecture, Layer Normalization is placed in two different positions across Transformer implementations, known as pre-LN and post-LN. For example, the Transformer-encoder-based BERT uses post-LN, while the Vision Transformer uses pre-LN.
Hope this helps you figure out the problem.
This may be a modification the R2Gen authors made to stabilize training, possibly based on their experiments.
I am not sure, so you may need to ask the R2Gen authors for the details.
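For reference, here is a minimal PyTorch sketch of the two sublayer-connection variants being discussed. It uses a plain nn.LayerNorm and drops the extra memory argument of R2Gen's memory-driven conditional layer normalization, so it only illustrates the pre-LN vs post-LN difference and is not R2Gen's actual implementation.

```python
import torch
import torch.nn as nn

class PreLNSublayerConnection(nn.Module):
    # Pre-LN: normalize the input first, then add the residual
    # (same shape as the R2Gen line in question, minus the memory argument).
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return x + self.dropout(sublayer(self.norm(x)))

class PostLNSublayerConnection(nn.Module):
    # Post-LN (vanilla Transformer / BERT): add the residual first, then normalize.
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return self.norm(x + self.dropout(sublayer(x)))

# Example: wrap a self-attention sublayer with either connection style.
d_model = 512
x = torch.randn(2, 10, d_model)  # (batch, seq_len, d_model)
attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
sublayer = lambda h: attn(h, h, h, need_weights=False)[0]

pre_ln = PreLNSublayerConnection(d_model)
post_ln = PostLNSublayerConnection(d_model)
print(pre_ln(x, sublayer).shape, post_ln(x, sublayer).shape)
```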
Thank you very much. With your reply I now completely understand. Since the author of R2Gen may no longer be attending to his project's issues, I was unable to contact him to confirm. All in all, thank you again!
Hi! I'm sorry to bother you again; I hope you can help.
Have you tried to reproduce the results using the weights provided by the author? I did, but the results I got on IU X-ray were not the same as those reported in the paper; they were higher.
Also, about the results on MIMIC-CXR: why are the results in the two tables different? The paper says "the number of memory slots is set to 3 by default", but the results in Table 1 differ from the third row of Table 2. Do you know why? Thanks again!
Hello, I've attempted to reproduce the reported results, but after loading your weights I get a shape error caused by the size of the word mappings (token_to_id and id_to_token). I made my own annotations.json file to split the data (70-10-20, as indicated in the paper), but the random split changes how many words get mapped to meaningful tokens and how many fall to the unknown token. May I ask how you split the data, and whether it's possible to get access to the original annotations.json file?
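Here is a minimal sketch of the kind of check that exposes the mismatch; the checkpoint path and the "embed"/"logit" name filters are only guesses for illustration, not the repo's actual parameter names.

```python
import torch

# Rough sketch: inspect the vocabulary dimension stored in the released
# checkpoint so it can be compared with a vocabulary built from a custom split.
# The file name and the name filters below are assumptions.
ckpt = torch.load("model_best.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

for name, tensor in state_dict.items():
    if "embed" in name or "logit" in name:
        print(name, tuple(tensor.shape))

# If the vocabulary dimension printed here differs from len(token_to_id)
# built from a new random split, loading the provided weights raises the
# shape error described above.
```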