Hi Arthur Ledaguenel,
My apologies for the delayed reply. For RE datasets, we reported scores using 10-fold cross-validation due to the small size of the datasets. That is the reason for the empty dev.tsv file.
However, our performance is not evaluated based on eval_results.txt. It should be evaluated using the biocodes/re_eval.py script, as noted in the README.
https://github.com/dmis-lab/biobert#relation-extraction-re
Thank you, and sorry again for the delayed reply.
Best,
Wonjin
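For readers unfamiliar with how a cross-validated score is put together, here is a minimal sketch of computing precision, recall, and F1 per fold and averaging them. It is not the code in biocodes/re_eval.py, and whether the reported numbers are averaged exactly this way is an assumption. The function names and the toy labels are hypothetical.

```python
from statistics import mean


def binary_prf(gold, pred, positive=1):
    """Precision, recall, and F1 for one fold of binary labels."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_over_folds(folds):
    """Average each metric over a list of (gold, pred) folds."""
    scores = [binary_prf(gold, pred) for gold, pred in folds]
    return tuple(mean(column) for column in zip(*scores))


# Hypothetical toy data: two folds instead of ten, made-up labels.
toy_folds = [
    ([1, 0, 1, 1], [1, 0, 0, 1]),
    ([0, 1, 1, 0], [0, 1, 1, 1]),
]
avg_precision, avg_recall, avg_f1 = average_over_folds(toy_folds)
print(avg_precision, avg_recall, avg_f1)
```

With real data, each fold would supply its own test labels and the predictions produced by the model fine-tuned on that fold's training split.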
Hi all!
I've been struggling to make the fine-tuning for Relation Extraction work properly and I don't know what I'm doing wrong.
After completing all the necessary imports and downloads, I run the following command:
!python run_re.py --task_name=gad --do_train=true --do_eval=true --do_predict=true --vocab_file=$BIOBERT_DIR/vocab.txt --bert_config_file=$BIOBERT_DIR/bert_config.json --init_checkpoint=$BIOBERT_DIR/model.ckpt-1000000 --max_seq_length=128 --train_batch_size=32 --learning_rate=2e-5 --num_train_epochs=3.0 --do_lower_case=false --data_dir=$RE_DIR --output_dir=$OUTPUT_DIR
The eval_results.txt file says:
eval_accuracy = 0.0
eval_loss = 0.0
global_step = 449
loss = 0.0
How can I get both a 0 accuracy and a 0 loss?
N.B.: I have been running this on a GPU and not a TPU, but I don't see how that could be the source of the problem (the script clearly states that it should work fine without a TPU).
Thanks a lot in advance for your answers!
[EDIT]
After further investigation, I realized that run_re.py uses the dev.tsv file for evaluation, and that file is empty in the dataset I downloaded (the GAD dataset in this repo).
So the question becomes:
Why are the dev.tsv files in the GAD dataset empty?
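A quick check like the one below can confirm whether a split is empty before running with --do_eval=true. This is only a sketch: the helper name count_examples is hypothetical, the demo files are created in a temporary directory rather than read from the real GAD download, and whether a given split carries a header row is an assumption to verify against your own files.

```python
import os
import tempfile


def count_examples(tsv_path, has_header=False):
    """Count the non-blank rows in a TSV file, optionally skipping a header."""
    with open(tsv_path, encoding="utf-8") as handle:
        rows = [line for line in handle if line.strip()]
    if has_header and rows:
        rows = rows[1:]
    return len(rows)


# Demo on stand-in files; point the paths at your own data directory instead.
with tempfile.TemporaryDirectory() as data_dir:
    dev_path = os.path.join(data_dir, "dev.tsv")
    train_path = os.path.join(data_dir, "train.tsv")

    # An empty dev.tsv, like the one described in this issue.
    open(dev_path, "w", encoding="utf-8").close()

    # Two made-up training rows in "sentence<TAB>label" form.
    with open(train_path, "w", encoding="utf-8") as handle:
        handle.write("first example sentence\t1\n")
        handle.write("second example sentence\t0\n")

    empty_count = count_examples(dev_path)
    train_count = count_examples(train_path)

print("dev examples:", empty_count)
print("train examples:", train_count)
```

If the dev count is 0, the evaluation step has nothing to score, so the numbers in eval_results.txt say nothing about the model.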
[EDIT]
Arthur Ledaguenel