dmis-lab / biobert

Bioinformatics'2020: BioBERT: a pre-trained biomedical language representation model for biomedical text mining
http://doi.org/10.1093/bioinformatics/btz682

RE Fine-tuning issue #144

Closed LedaguenelArthur closed 3 years ago

LedaguenelArthur commented 3 years ago

Hi all!

I've been struggling to make the fine-tuning for Relation Extraction work properly and I don't know what I'm doing wrong.

After making all the necessary imports and downloads, I ran the following command:

!python run_re.py --task_name=gad --do_train=true --do_eval=true --do_predict=true --vocab_file=$BIOBERT_DIR/vocab.txt --bert_config_file=$BIOBERT_DIR/bert_config.json --init_checkpoint=$BIOBERT_DIR/model.ckpt-1000000 --max_seq_length=128 --train_batch_size=32 --learning_rate=2e-5 --num_train_epochs=3.0 --do_lower_case=false --data_dir=$RE_DIR --output_dir=$OUTPUT_DIR

The eval_results.txt file says:

eval_accuracy = 0.0
eval_loss = 0.0
global_step = 449
loss = 0.0

How can I get both an accuracy of 0 and a loss of 0?

N.B.: I ran this on a GPU rather than a TPU, but I don't see how that could be the source of the problem (the script clearly states that it should work fine without a TPU).

Thanks a lot in advance for your answers!!

[EDIT]

After further investigation, I realized that run_re.py evaluates on the dev.tsv file, which is empty in the dataset I downloaded (the GAD dataset from this repo).

So the question becomes:

Why are the dev.tsv files in the GAD dataset empty?
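For anyone hitting the same symptom, a quick way to confirm that an empty evaluation file is the cause is to count the rows in each split before running the script. The snippet below is only a minimal sketch: the helper names `count_tsv_rows` and `summarize_splits` are hypothetical, and the file names train.tsv and test.tsv are assumptions (only dev.tsv is mentioned above). It reads the data directory from the RE_DIR environment variable used in the command.

```python
import os
from pathlib import Path


def count_tsv_rows(path):
    """Return the number of non-blank lines in a file, or None if it is missing."""
    p = Path(path)
    if not p.is_file():
        return None
    with p.open(encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def summarize_splits(data_dir, names=("train.tsv", "dev.tsv", "test.tsv")):
    """Map each split file name (assumed names) to its non-blank row count."""
    return {name: count_tsv_rows(Path(data_dir) / name) for name in names}


if __name__ == "__main__":
    # RE_DIR is the variable from the command above; fall back to the current directory.
    data_dir = os.environ.get("RE_DIR", ".")
    for name, rows in summarize_splits(data_dir).items():
        if rows is None:
            print(f"{name}: missing")
        elif rows == 0:
            print(f"{name}: empty, so metrics computed on it are meaningless")
        else:
            print(f"{name}: {rows} rows")
```

A row count of 0 for dev.tsv would be consistent with the all-zero eval_results.txt shown above.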

[EDIT]

Arthur Ledaguenel

wonjininfo commented 3 years ago

Hi Arthur Ledaguenel,

My apologies for the delayed reply. For the RE datasets, we reported scores using 10-fold cross-validation because the datasets are small, which is why the dev.tsv file is empty. Note, however, that our reported performance is not based on eval_results.txt. It should be computed with the biocodes/re_eval.py script, as noted in the README: https://github.com/dmis-lab/biobert#relation-extraction-re
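To illustrate what reporting a score under 10-fold cross-validation typically involves, here is a minimal sketch that computes precision, recall, and F1 per fold and then averages them. It is not the biocodes/re_eval.py script and may differ from how the reported numbers were produced. The function names are hypothetical, and it assumes binary labels with 1 as the positive class.

```python
def precision_recall_f1(gold, pred, positive=1):
    """Compute precision, recall and F1 for one fold (binary labels assumed)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_over_folds(folds):
    """Average (precision, recall, F1) over a list of (gold, pred) pairs, one per fold."""
    if not folds:
        raise ValueError("at least one fold is required")
    scores = [precision_recall_f1(gold, pred) for gold, pred in folds]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


if __name__ == "__main__":
    # Two toy folds for illustration only; a real run would have ten.
    toy_folds = [
        ([1, 0, 1, 1], [1, 0, 0, 1]),
        ([1, 1, 0, 0], [1, 1, 1, 0]),
    ]
    p, r, f1 = average_over_folds(toy_folds)
    print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

In practice, each fold's gold labels would come from that fold's test file and the predictions from the model fine-tuned on the corresponding training file.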

Thank you, and sorry again for the delayed reply.

Best,
Wonjin