Closed WilliamAntoniocrayon closed 1 year ago
Hi @WilliamAntoniocrayon, thank you for your interest in this project!
Diagnosing the cause from the running log alone doesn't seem easy. I can see that the prediction process ends quickly (the process named `preds` lasts ~1 hour on my machine), which is strange and may indicate that the model was not well trained during step 1.
For reproducibility, could you please provide the following information?
Thank you for replying despite your busy schedule. Below are the command I used to execute step 1 and the DREEAM script, run on an RTX 3090.

Command: `bash scripts/run_bert.sh step1 0.1 234`
```shell
TYPE=$1
LAMBDA=$2
SEED=$3
NAME=${TYPE}_lambda${LAMBDA}
python run.py --do_train \
--data_dir dataset/docred \
--transformer_type bert \
--model_name_or_path bert-base-cased \
--display_name ${NAME} \
--save_path ${NAME} \
--train_file train_annotated.json \
--dev_file dev.json \
--train_batch_size 4 \
--test_batch_size 8 \
--gradient_accumulation_steps 1 \
--num_labels 4 \
--lr_transformer 5e-5 \
--max_grad_norm 1.0 \
--evi_thresh 0.2 \
--evi_lambda ${LAMBDA} \
--warmup_ratio 0.06 \
--num_train_epochs 30.0 \
--seed ${SEED} \
--num_class 97
```
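For reference, the three positional arguments determine both the run name and the checkpoint directory, which matters because step 2 must load from it. A minimal sketch (values taken from the command above; the comment about the timestamp subdirectory reflects the path shown later in this thread):

```shell
# Sketch of how run_bert.sh binds its positional arguments
# (values copied from the step 1 command above).
TYPE=step1     # $1: pipeline stage
LAMBDA=0.1     # $2: evidence-loss weight passed to --evi_lambda
SEED=234       # $3: random seed

# NAME doubles as --save_path, so checkpoints land under
# step1_lambda0.1/<timestamp>/ and step 2 must point there.
NAME=${TYPE}_lambda${LAMBDA}
echo "${NAME}"   # prints: step1_lambda0.1
```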
The following is the script used for inference in step 2. `best.ckpt` is stored in this directory (2023-06-20_07:21:57.116663).

Command: `bash scripts/infer_distant_bert.sh step2 '/home/nlp/code/DREEAM/step1_lambda0.1/2023-06-20_07:21:57.116663/'`
```shell
NAME=$1
LOAD_DIR=$2
python run.py --data_dir dataset/docred \
--transformer_type bert \
--model_name_or_path bert-base-cased \
--display_name ${NAME} \
--load_path ${LOAD_DIR} \
--eval_mode single \
--test_file train_distant.json \
--test_batch_size 4 \
--evi_thresh 0.2 \
--num_labels 4 \
--num_class 97 \
--save_attn
```
Hi @WilliamAntoniocrayon, thanks for the information.
We realized the score returned zero due to a bug (the predictions were made on `train_distant.json`, but the evaluations were made on `dev.json`).
The bug should be fixed in the latest commit 7ce1565. Pulling the newest code and running the same step 2 command as above should now return a reasonable score (~45 F1). By the way, even with the code before this commit, although the evaluation scores were 0, the saved attention weights `train_distant.attns` should still be valid for further use.
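The exact on-disk format of `train_distant.attns` is not documented in this thread; the sketch below assumes a Python-pickled list of per-document attention matrices (an assumption, not the repository's confirmed format) and just demonstrates the kind of round-trip read a downstream self-training step would do:

```python
import os
import pickle
import tempfile

# ASSUMPTION: the .attns file holds a pickled list with one attention
# matrix (nested lists of floats) per document. The real DREEAM format
# may differ; adapt the load step accordingly.
dummy_attns = [
    [[0.7, 0.3], [0.1, 0.9]],  # document 0: 2x2 token-pair attention
    [[1.0]],                   # document 1: single-token document
]

path = os.path.join(tempfile.mkdtemp(), "train_distant.attns")
with open(path, "wb") as f:
    pickle.dump(dummy_attns, f)

# Downstream step: reload and sanity-check before further use.
with open(path, "rb") as f:
    attns = pickle.load(f)

print(len(attns))     # number of documents -> 2
print(len(attns[0]))  # rows in document 0's matrix -> 2
```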
I hope this is clear and helps you solve the issue.
Thank you for reporting the bug!
Sorry for not rerunning the updated code sooner; there was a problem with the server. I have now got the step 2 result. Thank you very much for your help.
Hi there! When I run step 2, I get an F1 of 0. It feels like something went wrong, but forgive me for not having found the problem yet; here is the output of the run. Looking forward to your response.