Closed WilliamAntoniocrayon closed 1 year ago
Hi @WilliamAntoniocrayon, thank you for your interest in this project!
Diagnosing the cause from the running log alone doesn't seem easy. I can see that the prediction process ends quickly (the process named `preds` lasts ~1 hour on my machine), which is strange and may indicate that the model was not well trained during step 1.
For reproducibility, could you please provide the following information?
Thank you for replying despite your busy schedule. Below are the command I used to execute step 1 and the DREEAM script, run on an RTX 3090.

Command: `bash scripts/run_bert.sh step1 0.1 234`
```shell
TYPE=$1
LAMBDA=$2
SEED=$3
NAME=${TYPE}_lambda${LAMBDA}
python run.py --do_train \
--data_dir dataset/docred \
--transformer_type bert \
--model_name_or_path bert-base-cased \
--display_name ${NAME} \
--save_path ${NAME} \
--train_file train_annotated.json \
--dev_file dev.json \
--train_batch_size 4 \
--test_batch_size 8 \
--gradient_accumulation_steps 1 \
--num_labels 4 \
--lr_transformer 5e-5 \
--max_grad_norm 1.0 \
--evi_thresh 0.2 \
--evi_lambda ${LAMBDA} \
--warmup_ratio 0.06 \
--num_train_epochs 30.0 \
--seed ${SEED} \
--num_class 97
```
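For reference, the three positional arguments determine both the run name and the checkpoint directory, which matters because step 2 must load from it. A minimal sketch (values taken from the command above; the comment about the timestamp subdirectory reflects the path shown later in this thread):

```shell
# Sketch of how run_bert.sh binds its positional arguments
# (values copied from the step 1 command above).
TYPE=step1     # $1: pipeline stage
LAMBDA=0.1     # $2: evidence-loss weight passed to --evi_lambda
SEED=234       # $3: random seed

# NAME doubles as --save_path, so checkpoints land under
# step1_lambda0.1/<timestamp>/ and step 2 must point there.
NAME=${TYPE}_lambda${LAMBDA}
echo "${NAME}"   # prints: step1_lambda0.1
```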
The following is the script used for inference in step 2. `best.ckpt` is stored in this directory (2023-06-20_07:21:57.116663).

Command: `bash scripts/infer_distant_bert.sh step2 '/home/nlp/code/DREEAM/step1_lambda0.1/2023-06-20_07:21:57.116663/'`
```shell
NAME=$1
LOAD_DIR=$2
python run.py --data_dir dataset/docred \
--transformer_type bert \
--model_name_or_path bert-base-cased \
--display_name ${NAME} \
--load_path ${LOAD_DIR} \
--eval_mode single \
--test_file train_distant.json \
--test_batch_size 4 \
--evi_thresh 0.2 \
--num_labels 4 \
--num_class 97 \
--save_attn
```
Hi @WilliamAntoniocrayon, thanks for the information.
We realized the score returned zero due to a bug (the predictions were made on `train_distant.json`, but the evaluations were made on `dev.json`).
The bug should be fixed in the latest commit 7ce1565. Pulling the newest code and running the same step 2 command as above should now return a reasonable score (~45 F1). By the way, even with the code before this commit, although the evaluation scores were 0, the saved attention weights `train_distant.attns` should still be valid for further use.
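The exact on-disk format of `train_distant.attns` is not documented in this thread; the sketch below assumes a Python-pickled list of per-document attention matrices (an assumption, not the repository's confirmed format) and just demonstrates the kind of round-trip read a downstream self-training step would do:

```python
import os
import pickle
import tempfile

# ASSUMPTION: the .attns file holds a pickled list with one attention
# matrix (nested lists of floats) per document. The real DREEAM format
# may differ; adapt the load step accordingly.
dummy_attns = [
    [[0.7, 0.3], [0.1, 0.9]],  # document 0: 2x2 token-pair attention
    [[1.0]],                   # document 1: single-token document
]

path = os.path.join(tempfile.mkdtemp(), "train_distant.attns")
with open(path, "wb") as f:
    pickle.dump(dummy_attns, f)

# Downstream step: reload and sanity-check before further use.
with open(path, "rb") as f:
    attns = pickle.load(f)

print(len(attns))     # number of documents -> 2
print(len(attns[0]))  # rows in document 0's matrix -> 2
```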
I hope this is clear and helps you solve the issue.
Thank you for reporting the bug!
Sorry for not rerunning the updated code sooner; there was a problem with the server. I have now got the step 2 result. Thank you very much for your help.
Hi there! When I run step 2, I get an F1 of 0. It feels like something went wrong, but forgive me for not having found the problem yet; here is the output of the run. Looking forward to your response.