Algorithm generates almost the same reports for any image

RaySkarken commented 4 months ago

Hi @LX-doctorAI1, thank you for sharing this code.

I trained v12 on IU X-Ray, and almost all of the generated reports were identical. After removing the reports labeled "No finding" from the dataset, I got 14 unique reports on the test split. I also tried the same with MIMIC-CXR, and again almost all of the reports were identical. Even so, the BLEU scores were reasonably good.
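For anyone who wants to check the same thing on their own run, here is a minimal sketch of how the number of unique reports can be counted. It assumes the generated reports are available as a plain list of strings; `report_diversity` and the sample reports are hypothetical names and data for illustration, not part of the repository.

```python
from collections import Counter


def report_diversity(reports):
    """Summarise how varied a list of generated reports is."""
    if not reports:
        raise ValueError("reports must not be empty")
    # Lower-case and collapse whitespace so trivial differences
    # are not counted as distinct reports.
    normalised = [" ".join(r.lower().split()) for r in reports]
    counts = Counter(normalised)
    top_report, top_count = counts.most_common(1)[0]
    return {
        "total": len(normalised),
        "unique": len(counts),
        # Fraction of all outputs taken up by the single most frequent report.
        "top_share": top_count / len(normalised),
        "top_report": top_report,
    }


# Hypothetical generated reports, for illustration only.
generated = [
    "The heart is normal in size. The lungs are clear.",
    "the heart is normal in size.  the lungs are clear.",
    "The heart is normal in size. The lungs are clear.",
    "There is a small left pleural effusion.",
]
stats = report_diversity(generated)
print(stats["unique"], "unique of", stats["total"])
```

A `top_share` close to 1 would mean the model emits essentially one template, which BLEU alone does not reveal.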

Original IU Xray:

********************Best results********************
03-02 23:54:40-INFO:Best results (w.r.t BLEU_4) in validation set:
03-02 23:54:40-INFO:VAL ||| Epoch: 28|||train_loss: 1.134||| BLEU_1: 0.481 |||BLEU_2: 0.3243 |||BLEU_3: 0.2303 |||BLEU_4: 0.1673 |||CIDEr: 0.3285 |||ROUGE_L: 0.3305
03-02 23:54:40-INFO:TEST || Epoch: 28|||train_loss: 1.134||| BLEU_1: 0.4557 |||BLEU_2: 0.3015 |||BLEU_3: 0.2074 |||BLEU_4: 0.1444 |||CIDEr: 0.2619 |||ROUGE_L: 0.3156
03-02 23:54:40-INFO:Best results (w.r.t BLEU_4) in test set:
03-02 23:54:40-INFO:VAL ||| Epoch: 18|||train_loss: 1.58||| BLEU_1: 0.4555 |||BLEU_2: 0.29 |||BLEU_3: 0.1992 |||BLEU_4: 0.1475 |||CIDEr: 0.282 |||ROUGE_L: 0.3399
03-02 23:54:40-INFO:TEST || Epoch: 18|||train_loss: 1.58||| BLEU_1: 0.4631 |||BLEU_2: 0.2991 |||BLEU_3: 0.2086 |||BLEU_4: 0.157 |||CIDEr: 0.2829 |||ROUGE_L: 0.3454

Without 'No Finding':

********************Best results********************
03-09 10:26:07-INFO:Best results (w.r.t BLEU_4) in validation set:
03-09 10:26:07-INFO:VAL ||| Epoch: 7|||train_loss: 2.698||| BLEU_1: 0.4556 |||BLEU_2: 0.2982 |||BLEU_3: 0.2108 |||BLEU_4: 0.1601 |||CIDEr: 0.303 |||ROUGE_L: 0.3355
03-09 10:26:07-INFO:TEST || Epoch: 7|||train_loss: 2.698||| BLEU_1: 0.4472 |||BLEU_2: 0.2903 |||BLEU_3: 0.2044 |||BLEU_4: 0.1539 |||CIDEr: 0.1982 |||ROUGE_L: 0.329
03-09 10:26:07-INFO:Best results (w.r.t BLEU_4) in test set:
03-09 10:26:07-INFO:VAL ||| Epoch: 7|||train_loss: 2.698||| BLEU_1: 0.4556 |||BLEU_2: 0.2982 |||BLEU_3: 0.2108 |||BLEU_4: 0.1601 |||CIDEr: 0.303 |||ROUGE_L: 0.3355
03-09 10:26:07-INFO:TEST || Epoch: 7|||train_loss: 2.698||| BLEU_1: 0.4472 |||BLEU_2: 0.2903 |||BLEU_3: 0.2044 |||BLEU_4: 0.1539 |||CIDEr: 0.1982 |||ROUGE_L: 0.329

I guess the problem is class imbalance. I would be grateful for any advice on how to solve the problem of identical reports!
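If class imbalance is indeed the cause, one common thing to try is to oversample the rare classes during training. Below is a rough sketch of inverse-frequency sample weights, assuming each training report has exactly one label; `inverse_frequency_weights` and the example labels are hypothetical, and I have not verified that this fixes the identical-report problem.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per sample, larger for rarer classes."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    # "Balanced" weighting: total / (n_classes * class_count),
    # so every class contributes the same summed weight.
    class_weight = {c: total / (n_classes * k) for c, k in counts.items()}
    return [class_weight[label] for label in labels]


# Hypothetical labels, for illustration only.
labels = ["No finding", "No finding", "No finding", "Effusion"]
weights = inverse_frequency_weights(labels)
print(weights)
```

In a PyTorch training loop, per-sample weights like these could be passed to `torch.utils.data.WeightedRandomSampler` so that rare classes are drawn more often.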

Lalalalala-l commented 4 months ago

Hello, could you provide the file "id_label.csv" referenced in iu.yml? Please!

px2020zjut commented 4 months ago

I have the same issue: the generated reports are almost all the same. Has anyone reproduced the results shown in Fig. 3?

LX-doctorAI1 / M2KT

Algorithm generates almost the same reports for any image #7