google-research / bigbird

Transformers for Longer Sequences
https://arxiv.org/abs/2007.14062
Apache License 2.0

Precision equals Recall in run_classifier.py script run. #13

Open Amit-GH opened 3 years ago

Amit-GH commented 3 years ago

I am trying to replicate the results of the paper. I ran the run_classifier.py script for 7,000 train steps on IMDB reviews. After every 1,000 batches, precision, recall, accuracy, F1 score, and loss are printed to the terminal. For every checkpoint, precision = recall = F1 = accuracy to all decimal places. I suspect there is a mistake in the calculation: for a binary dataset, precision, recall, and accuracy should not all be identical.

For example, at ckpt-1000 I got 0.9408210 as the value for all of precision, recall, accuracy, and F1.

Yuqi92 commented 3 years ago

Hi, I think they used prec@k instead of plain precision: https://github.com/google-research/bigbird/blob/db06498ec8804c6438111938d8654b66ddaccd5d/bigbird/classifier/run_classifier.py#L282-L283

If k = 1 and each example has exactly one label, precision@1 (and likewise recall@1) just checks whether the top-1 prediction matches the label, which is exactly accuracy. That would explain why all four numbers coincide.

Here are the official docs for the two metrics:
Precision: https://www.tensorflow.org/api_docs/python/tf/compat/v1/metrics/precision
P@k: https://www.tensorflow.org/api_docs/python/tf/compat/v1/metrics/precision_at_k
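
To see the difference concretely, here is a small self-contained sketch (not from the repo; the tensors and numbers are made up for illustration) showing that precision@1 collapses to accuracy on a single-label task, while plain precision does not:

    import tensorflow as tf

    # compat.v1 metrics are graph-mode ops with internal state variables.
    tf.compat.v1.disable_eager_execution()

    # Toy binary task: 4 examples with true classes and hard predictions.
    labels = tf.constant([1, 0, 1, 1], dtype=tf.int64)
    predictions = tf.constant([1, 0, 0, 1], dtype=tf.int64)
    # precision_at_k takes per-class scores, not hard predictions.
    logits = tf.constant([[0.2, 0.8], [0.9, 0.1], [0.6, 0.4], [0.3, 0.7]])

    prec_at_1, prec_at_1_op = tf.compat.v1.metrics.precision_at_k(
        labels=labels, predictions=logits, k=1)
    precision, precision_op = tf.compat.v1.metrics.precision(
        labels=labels, predictions=predictions)

    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.local_variables_initializer())
        sess.run([prec_at_1_op, precision_op])
        print(sess.run([prec_at_1, precision]))
        # prec@1 = 0.75, i.e. accuracy (3 of 4 top-1 predictions correct);
        # plain precision = 1.0 (2 true positives, 0 false positives).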

tf.compat.v1.metrics.precision is the traditional precision metric we usually mean. I tried it myself and it worked. Here is the sample code:

    # Plain (binary) precision over hard 0/1 predictions.
    precision, precision_op = tf.compat.v1.metrics.precision(
        labels=label_ids, predictions=predictions, weights=None, name="precision")

The same works with tf.compat.v1.metrics.recall. Finally, I got a precision of 0.9483 and a recall of 0.9606 at training step 2,000.
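
A minimal recall counterpart, assuming the same label_ids and predictions tensors as in the snippet above:

    # Plain (binary) recall over the same hard 0/1 predictions.
    recall, recall_op = tf.compat.v1.metrics.recall(
        labels=label_ids, predictions=predictions, weights=None, name="recall")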

Hope this helps.