Markin-Wang / XProNet

[ECCV2022] The official implementation of Cross-modal Prototype Driven Network for Radiology Report Generation
Apache License 2.0

IUXray: train test validation split affecting token/id mappings #7

Open schaefer30038 opened 2 years ago

schaefer30038 commented 2 years ago

Hello, I've attempted to reproduce the results reported in the paper, but after loading your weights I get a shape error caused by the size of the word mappings (token_to_id and id_to_token). I made my own annotations.json file to split the data (70-10-20, as indicated in the paper), but the random split changes how many words get mapped to meaningful tokens and how many fall back to the unknown token. May I ask how you split the data, and whether it's possible to get access to the original annotations.json file?
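
For reference, here is how I understand the dependence on the split — a minimal, hypothetical sketch (not the repo's actual tokenizer; the annotation layout, file path, and frequency threshold below are my assumptions). The vocabulary is built only from the training-split reports with a frequency cutoff, so a different random split yields a different token_to_id / id_to_token size and therefore a different embedding shape than the released checkpoint expects:

```python
# Hypothetical sketch of R2Gen-style vocabulary building (not the repo's code).
# Assumed annotation layout: {"train": [{"report": "..."}, ...], "val": [...], "test": [...]}
import json
from collections import Counter

def build_vocab(ann_path, threshold=3):
    ann = json.load(open(ann_path))
    counter = Counter()
    for example in ann['train']:          # only the *training* reports are counted
        counter.update(example['report'].lower().split())
    # Words below the frequency threshold are dropped and later mapped to <unk>.
    vocab = sorted(w for w, c in counter.items() if c >= threshold) + ['<unk>']
    token_to_id = {w: i + 1 for i, w in enumerate(vocab)}   # index 0 reserved for padding
    id_to_token = {i: w for w, i in token_to_id.items()}
    return token_to_id, id_to_token

# A different random split changes this size, and with it the embedding matrix shape
# the checkpoint expects.
token_to_id, _ = build_vocab('files/iu_xray/annotation.json')  # path is an assumption
print(len(token_to_id))
```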

Markin-Wang commented 2 years ago

Hi, thanks for your interest. For the data split, we follow R2Gen and R2GenCMN and randomly split the data 70-10-20. I have updated the annotation file in the files/iu_xray directory of this repo; hope this helps you successfully reproduce the results.
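
For anyone who still wants to regenerate a split themselves, here is a hedged sketch of a 70-10-20 random split written into the annotation-file layout (the input file, seed, and exact layout are assumptions; this is not the script used for this repo):

```python
# Hypothetical 70-10-20 random split of IU X-ray cases (illustrative only).
import json
import random

random.seed(42)  # arbitrary seed for illustration; the original split's seed may differ

with open('files/iu_xray/all_cases.json') as f:   # hypothetical file with one entry per case
    cases = json.load(f)

random.shuffle(cases)
n = len(cases)
n_train, n_val = int(0.7 * n), int(0.1 * n)
annotation = {
    'train': cases[:n_train],
    'val': cases[n_train:n_train + n_val],
    'test': cases[n_train + n_val:],
}

with open('files/iu_xray/annotation.json', 'w') as f:
    json.dump(annotation, f, indent=2)
```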

Liqq1 commented 1 year ago

Hi, I'm sorry to bother you. May I ask a question about the code in R2Gen? At https://github.com/cuhksz-nlp/R2Gen/blob/5a2e63f6d8da670725b1926317b4bfd1393ec51b/modules/encoder_decoder.py#L134, the return is `x + self.dropout(sublayer(self.norm(x, memory)))`. Is this code correct? I don't think it corresponds to the diagram in the paper; I think it should be `self.norm((x + self.dropout(sublayer(x))), memory)`. Could you help me work out which is right? Thanks a lot.

Markin-Wang commented 1 year ago

Hi, generally speaking you are right: the vanilla Transformer uses post-LN, as you described. However, as discussed in the paper "On Layer Normalization in the Transformer Architecture", Layer Normalization is placed in two different ways in Transformer implementations, pre-LN and post-LN. For example, the Transformer-encoder-based BERT uses post-LN, whereas Vision Transformer uses pre-LN.
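
For clarity, a minimal sketch of the two placements (the memory-conditioned norm in R2Gen, `self.norm(x, memory)`, is simplified here to a plain `nn.LayerNorm`; this is an illustration, not the R2Gen module):

```python
# Illustration of pre-LN vs. post-LN residual connections (plain LayerNorm,
# not R2Gen's memory-conditioned norm).
import torch.nn as nn

class PreLNSublayer(nn.Module):
    # Pre-LN: norm -> sublayer -> dropout -> residual add (the form at the linked R2Gen line).
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return x + self.dropout(sublayer(self.norm(x)))

class PostLNSublayer(nn.Module):
    # Post-LN: sublayer -> dropout -> residual add -> norm (the original Transformer/BERT form).
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        return self.norm(x + self.dropout(sublayer(x)))
```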

Hope this can help you figure out the problem.

This may be a modification by the R2Gen authors to make training more stable, possibly based on their experiments. I am not sure, though; you may need to discuss it with the R2Gen authors for the details.

Liqq1 commented 1 year ago

Thank you very much; your reply makes it completely clear to me. Since the R2Gen author does not seem to be following the issues on that project, I was unable to contact him to confirm. All in all, thank you again!

Liqq1 commented 1 year ago

Hi! I'm sorry to bother you again; I'd appreciate your help with two more questions.

  1. Have you tried to reproduce the results using the weights provided by the author? I tried, but the results I got on IU X-ray do not match those reported in the paper; they are actually higher. [screenshot of my results]

  2. Regarding the results on the MIMIC-CXR dataset: why are the results in the following two tables different? The paper says "the number of memory slots is set to 3 by default", but the results in Table 1 are different from the third row of Table 2. Do you know why? [screenshots of Table 1 and Table 2] Thanks again!