WissingChen / VLCI

Visual-Linguistic Causal Intervention for Radiology Report Generation

CE metric, cheXpert or cheXbert? #9

Closed fengjiejiejiejie closed 6 months ago

fengjiejiejiejie commented 9 months ago

Hi Weixing, thank you for generously sharing the open-source code. However, I have been unable to reproduce the clinical efficacy (CE) metrics (e.g., F1) reported in your paper using the provided checkpoint on the MIMIC-CXR dataset.

I suspect I have made a mistake at some step of the CE metric computation.

To elaborate, when running your pretrained VLCI checkpoint on the MIMIC-CXR dataset, I obtain the following NLP and clinical metrics:

BLEU-4: 0.113, METEOR: 0.144, ROUGE-L: 0.276, CIDEr: 0.174

Precision: 0.314, Recall: 0.181, F1: 0.179

Here are a few key points:

  1. I used CheXbert to extract labels from the ground-truth and predicted reports. (I will later redo the extraction with CheXpert and recompute the CE metrics.)
  2. I used compute_ce.py from R2Gen (https://github.com/zhjohnchan/R2Gen/blob/main/compute_ce.py) to compute the CE metrics; a hedged sketch of this style of computation follows below. The labeled CSV files for the ground-truth and predicted reports are attached.

labeled_reports_gts.csv labeled_reports_res.csv
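
For context, here is a minimal sketch of the kind of computation compute_ce.py performs; this is an assumption-laden reconstruction, not the R2Gen script itself. It assumes both CSVs contain the 14 CheXpert observation columns in the same row order, and binarizes labels so that only positive (1.0) mentions count as positive:

```python
# Hedged sketch of a compute_ce.py-style CE evaluation; NOT the exact R2Gen
# script. Assumes both CSVs share the 14 CheXpert observation columns and
# identical row order. The binarization rule (only 1.0 is positive; blanks,
# 0.0, and -1.0 "uncertain" count as negative) is one common convention and
# a frequent source of metric discrepancies between papers.
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

CHEXPERT_LABELS = [
    'No Finding', 'Enlarged Cardiomediastinum', 'Cardiomegaly', 'Lung Opacity',
    'Lung Lesion', 'Edema', 'Consolidation', 'Pneumonia', 'Atelectasis',
    'Pneumothorax', 'Pleural Effusion', 'Pleural Other', 'Fracture',
    'Support Devices',
]

def load_binary_labels(path):
    """Read a labeler output CSV and binarize the 14 observation columns."""
    df = pd.read_csv(path)
    return (df[CHEXPERT_LABELS].fillna(0) == 1).astype(int).to_numpy()

gt = load_binary_labels('labeled_reports_gts.csv')
pred = load_binary_labels('labeled_reports_res.csv')

for avg in ('macro', 'micro'):
    p, r, f1, _ = precision_recall_fscore_support(
        gt, pred, average=avg, zero_division=0)
    print(f'{avg}: P={p:.4f} R={r:.4f} F1={f1:.4f}')
```

Note that differences in the uncertain-label handling (mapping -1.0 to positive vs. negative) alone can shift precision/recall/F1 noticeably.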

I eagerly await your insights on this matter. Best

fengjiejiejiejie commented 9 months ago

CE metrics computed with the CheXpert labeler are as follows: Precision: 0.324, Recall: 0.199, F1: 0.196.

I still suspect I have gone wrong at some step.

WissingChen commented 9 months ago

We adopt the CheXpert labeler for the CE metrics, and I will check the model and results soon.
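
For readers following the thread: the CheXpert labeler is a standalone script run over a CSV of report texts. Below is a hedged sketch of driving it from Python; the label.py entry point and flags follow the stanfordmlgroup/chexpert-labeler README, but the paths are illustrative, not from this issue:

```python
# Hedged sketch: running the CheXpert labeler over a CSV of report texts.
# label.py and its flags follow the stanfordmlgroup/chexpert-labeler README;
# input/output paths here are illustrative placeholders.
import subprocess

subprocess.run(
    [
        'python', 'label.py',
        '--reports_path', 'predicted_reports.csv',  # one report text per row
        '--output_path', 'labeled_reports.csv',     # 14 observation columns added
    ],
    check=True,
)
```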

fengjiejiejiejie commented 9 months ago

> We adopt the CheXpert labeler for the CE metrics, and I will check the model and results soon.

Thanks for your reply. I simply ran inference with the provided checkpoint; the attached file contains the predicted reports. Perhaps you could review it. I suspect I made an error when computing the CE metrics.

fengjiejiejiejie commented 9 months ago

Hi, have you finished the check? I'm eager to know where I went wrong.

fengjiejiejiejie commented 9 months ago

Hi, sorry to bother you again.

labeled_reports_vlci.csv labeled_reports_vlci_gt.csv

The attached files are the predicted and ground-truth reports generated with the provided code and checkpoint, with labels extracted by CheXpert. The CE metrics are as follows:

F1_MACRO: 0.1964923716414105
F1_MICRO: 0.3627544833748166
PRECISION_MACRO: 0.3236966163372005
PRECISION_MICRO: 0.45363128491620114
RECALL_MACRO: 0.19916758254963002
RECALL_MICRO: 0.30221182475542324

I want to know: 1. whether the predicted reports are correct and the same as yours; and 2. whether the extracted clinical labels are the same as yours.

I eagerly await your insights on this matter. Best

WissingChen commented 9 months ago

It is different, but I still don't know why 😂


fengjiejiejiejie commented 9 months ago

> It is different, but I still don't know why 😂

So our predicted reports are the same, but the labels extracted by CheXpert are different? Could you share your predicted reports together with the labels extracted by CheXpert, so that I can check the results?
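
If the authors do share their label file, here is a minimal sketch for locating where two labeler outputs disagree. The authors' file name is hypothetical, and it assumes both CSVs share the same columns ('Reports' plus the 14 observations) and row order:

```python
# Hedged sketch: diff two CheXpert-labeler output CSVs label by label.
# 'labeled_reports_vlci_authors.csv' is a HYPOTHETICAL name for the authors'
# file; assumes both files share columns and row order.
import pandas as pd

mine = pd.read_csv('labeled_reports_vlci.csv')
theirs = pd.read_csv('labeled_reports_vlci_authors.csv')  # hypothetical

label_cols = [c for c in mine.columns if c != 'Reports']
for col in label_cols:
    diff = (mine[col].fillna(0) != theirs[col].fillna(0)).sum()
    if diff:
        print(f'{col}: {diff} rows differ')
```

Per-label mismatch counts like these would show quickly whether the discrepancy comes from the generated reports themselves or from the label-extraction step.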