KaijuML / rotowire-rg-metric

Code for the RG metric of "Challenges in Data-to-Document Generation" (Wiseman, Shieber, Rush; EMNLP 2017)

Strange output tuples #3

Open · yunhaoli1995 opened this issue 3 years ago

yunhaoli1995 commented 3 years ago

Hi, I cloned the latest code, downloaded the models, and ran the extraction script. The prec is 0.8846444487571716, but the output tuples in prep_predictions.h5-tuples.txt look very strange:

The|(|TEAM-WINS
The|-|TEAM-LOSSES
in|-|TEAM-LOSSES
the|(|TEAM-WINS
the|-|TEAM-LOSSES
The|)|TEAM-PTS
in|)|TEAM-PTS
the|-|TEAM-PTS
The|Magic|TEAM-PTS_QTR1
the|-|TEAM-PTS_QTR4
The|going|TEAM-FG_PCT
The|and|TEAM-FG3_PCT
the|went|TEAM-FG_PCT
the|just|TEAM-FG3_PCT
the|into|TEAM-FG_PCT
The|and|TEAM-FG3_PCT
the|and|TEAM-FG3_PCT
the|just|TEAM-FG_PCT
|of|PLAYER-PTS
|(|PLAYER-FGM
|-|PLAYER-FGA

Do you have the same problem?

yunhaoli1995 commented 3 years ago

It seems that the values of min_entdist and min_numdist in the Inference class are wrongly initialized. I solved this issue by manually setting:

inference.min_entdist = -90
inference.min_numdist = -95
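
For context, the extractor embeds the word-to-entity and word-to-number distances, shifting each by its minimum value to obtain non-negative indices, so wrongly initialized minima make every distance look up the wrong embedding row, which matches the garbled pairings above. A minimal sketch of that indexing, with hypothetical values rather than the repository's actual code:

import torch

# Hypothetical minimum distances observed during preprocessing.
min_entdist, min_numdist = -90, -95

# Raw signed token distances from each word to the candidate entity.
entdists = torch.tensor([-3, 0, 7])

# Shift into non-negative indices; the +1 keeps index 0 free for padding.
ent_idx = entdists - min_entdist + 1          # -> tensor([88, 91, 98])

ent_emb = torch.nn.Embedding(200, 32, padding_idx=0)
features = ent_emb(ent_idx)                   # wrong min_entdist => wrong rows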
KaijuML commented 3 years ago

Hi,

I also have the same issue, and I "know" where it comes from: during the fix for the previous issue, I added +1 during data preparation, because it was something done in the original code that I had missed at first.

See lines 91, 92 and 94 of the original code.

However, to be honest, I am not entirely sure why these +1s are there, and the code seemed to be working fine before I added them.

Would you have time to remove them from your local copy of this code (see lines 97 and 98) and see whether training a model + inference works correctly?

I'm sorry, but I don't have time right now to investigate properly.

Let me know if this works and I'll update the code (and retrain models once more!)

Thanks for your collaboration & efforts, Clément

yunhaoli1995 commented 3 years ago

> I also have the same issue, and I "know" where it comes from: [...] Would you have time to remove them from your local copy of this code (see lines 97 and 98) and see whether training a model + inference works correctly? [...]

Hi, I removed the +1 at:

self.entdists.add_(-min_entdist + 1)
self.numdists.add_(-min_numdist + 1)

Then I evaluated with run.py, but the number of extracted tuples decreased sharply as a result. I think the +1 is necessary for this step.

yunhaoli1995 commented 3 years ago

> I also have the same issue, and I "know" where it comes from: [...]

> Hi, I removed the +1 [...] but the number of extracted tuples decreased sharply. I think the +1 is necessary for this step.

This error occurred when I ran inference with the pretrained models. I am now training my own models, and no errors have occurred so far; I will let you know whether inference works with the newly trained models.
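
That would be consistent with a train/inference mismatch rather than with the +1 itself: if the pretrained models were trained on indices shifted by +1, dropping the +1 only at evaluation time makes every distance lookup off by one, while models retrained with the new shift see consistent indices. A minimal sketch of that reading (the helper names are illustrative, not the repository's code):

# Index as the pretrained models saw it during training vs. the index
# produced after removing the +1 only at evaluation time.
def train_index(dist, min_dist):
    return dist - min_dist + 1   # shift used when the pretrained models were built

def eval_index(dist, min_dist):
    return dist - min_dist       # shift after removing the +1 locally

# Every lookup is off by one row, so the features no longer mean what the
# pretrained model learned; retraining with the new shift removes the mismatch.
assert all(train_index(d, -90) == eval_index(d, -90) + 1 for d in range(-90, 91))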

yunhaoli1995 commented 3 years ago

> I also have the same issue, and I "know" where it comes from: [...] Let me know if this works and I'll update the code (and retrain models once more!)

Hi, I removed the +1, trained 10 LSTMs and selected the 5 best models, then ran inference with them. The prec is 0.9286 and the extracted tuples look right:

Atlanta Hawks|46|TEAM-WINS
Atlanta Hawks|12|TEAM-LOSSES
Orlando Magic|19|TEAM-WINS
Orlando Magic|41|TEAM-LOSSES
Atlanta Hawks|95|TEAM-PTS
Orlando Magic|88|TEAM-PTS
Al Horford|17|PLAYER-PTS
Al Horford|13|PLAYER-REB
Al Horford|four|PLAYER-AST
Al Horford|two|PLAYER-STL
Jeff Teague|17|PLAYER-PTS
Jeff Teague|seven|PLAYER-AST
Jeff Teague|two|PLAYER-STL
Vucevic|21|PLAYER-PTS
Vucevic|15|PLAYER-REB

It seems everything is fine on my side!

soda-lsq commented 3 years ago

Hi,

I encountered the same issue of strange output tuples as mentioned above. I followed the instructions to remove the '+1' in data.py and retrained the models. As instructed, I trained 10 BiLSTM models and 10 CNN models, and selected the top 3 of each for the ensemble. Training and inference worked well. However, I am confused about the output:

[2021-07-26 14:35:09,115 INFO] prec 0.9989385008811951
[2021-07-26 14:35:09,115 INFO] nodup_prec 0.9989374876022339
[2021-07-26 14:35:09,115 INFO] total correct 39523.0
[2021-07-26 14:35:09,115 INFO] nodup correct 39488.0

These results seem very high, so I wonder: what is the correct way to calculate RG P% and RG #?

Is 'RG P%' equal to 'prec' or 'nodup_prec'? And is 'RG #' equal to 'nodup correct' divided by 'total correct'? (That ratio is nearly 100 here, but about 50 in Wiseman's paper.)

Could you help me figure out this issue? I would be really grateful for your help.

Shuqi

yunhaoli1995 commented 3 years ago

> I encountered the same issue of strange output tuples as mentioned above. [...] Is 'RG P%' equal to 'prec' or 'nodup_prec'? And is 'RG #' equal to 'nodup correct' divided by 'total correct'? [...]

Did you evaluate the model on the training set? I trained 10 LSTM models and selected the best 5, then evaluated on the ground-truth summaries of the test set; nodup_prec is 0.9286.

soda-lsq commented 3 years ago

> Did you evaluate the model on the training set? [...]

Hi Yunhao,

Thanks a lot for your quick reply. I just evaluated the model on the training and test datasets, and here is the result:

  • train prec = 0.9993, train nodup_prec = 0.9993, train nodup correct = 184823, train total correct = 185025
  • valid prec = 0.9989, valid nodup_prec = 0.9989, valid nodup correct = 39504, valid total correct = 39544
  • test prec = 0.9989, test nodup_prec = 0.9989, test nodup correct = 39488, test total correct = 39523

The results are so high that it seems kind of weird. Do you know how to compute RG P% and RG # from this output?

Thanks for your help!

Shuqi

yunhaoli1995 commented 3 years ago

> I just evaluated the model on the training and test datasets [...] Do you know how to compute RG P% and RG # from this output?

Hi Shuqi,

RG P% is computed as #correct / #pred, where #pred is the number of valid relations predicted by the model, and #correct is the number of correct relations among those valid relations. For more details, you can refer to https://github.com/KaijuML/rotowire-rg-metric/blob/5c5018cb1cb3feb584a0cf846b2e9f73b7db989b/inference.py#L9.
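
As a rough illustration of that formula, here is a minimal sketch assuming an (entity, value, relation) tuple format; the function name is hypothetical, and the repository's actual implementation is in the inference.py linked above:

def rg_precision(pred_tuples, gold_tuples):
    """RG P% = #correct / #pred.

    pred_tuples: (entity, value, relation) triples extracted from the summaries;
    gold_tuples: triples actually supported by the source tables.
    """
    gold = set(gold_tuples)
    correct = sum(t in gold for t in pred_tuples)
    return correct / len(pred_tuples) if pred_tuples else 0.0

# One of the two predicted tuples is supported by the table -> 0.5.
assert rg_precision(
    [("Al Horford", "17", "PLAYER-PTS"), ("Al Horford", "13", "PLAYER-AST")],
    {("Al Horford", "17", "PLAYER-PTS"), ("Al Horford", "13", "PLAYER-REB")},
) == 0.5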

By the way, I evaluated the model on the training set; the output is:

[2021-07-26 16:32:33,648 INFO] prec 0.956261157989502
[2021-07-26 16:32:33,648 INFO] nodup_prec 0.9562265276908875
[2021-07-26 16:32:33,649 INFO] total correct 78991.0
[2021-07-26 16:32:33,649 INFO] nodup correct 78729.0
Jeremiah0425 commented 3 years ago

> It seems that the values of min_entdist and min_numdist in the Inference class are wrongly initialized. I solved this issue by manually setting inference.min_entdist = -90 and inference.min_numdist = -95.

Hi, can you tell me how to use the model to generate a summary?