allenai / scibert

A BERT model for scientific text.
https://arxiv.org/abs/1903.10676
Apache License 2.0

Save predictions on test data if test is enabled #53

Open kamalravi opened 5 years ago

kamalravi commented 5 years ago

Hi SciBERT team, first of all, this is awesome work. I have a question cum feature request.

Q1. Does SciBERT predict an entity label for each token, or for a whole sentence (a group of tokens), in the test set?

Q2. Is the accuracy/f1 score calculated at the "span label" level (PER) or "IOB label + span label" level (B-PER)?

If there were a feature to save the predictions on the test data (in addition to reporting the accuracy/F1 score) when testing is enabled during training, we could answer such questions ourselves.

Thank you

ibeltagy commented 5 years ago

1- The NER model predicts an IOB label per token in the sentence, which can be used at decoding time to find spans of entities.

2- We use span-based F1 (f1-measure-overall), configured here: https://github.com/allenai/scibert/blob/master/allennlp_config/ner.json, which is this AllenNLP metric: https://github.com/allenai/allennlp/blob/master/allennlp/training/metrics/span_based_f1_measure.py

You need to implement an AllenNLP predictor to get predictions from the trained models.
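To make the distinction between per-token IOB labels and span-level scoring concrete, here is a minimal sketch in plain Python. It is not the AllenNLP implementation linked above; the function names `iob_to_spans` and `span_f1` are hypothetical, and it assumes a span counts as correct only when its label and both boundaries match the gold span exactly.

```python
from typing import List, Set, Tuple

# (label, start index, end index), with the end index inclusive
Span = Tuple[str, int, int]


def iob_to_spans(tags: List[str]) -> Set[Span]:
    """Decode one sentence of per-token IOB tags into entity spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, tag_label = tag.partition("-")
        continues = prefix == "I" and tag_label == label
        if label is not None and not continues:
            spans.add((label, start, i - 1))
            start, label = None, None
        # "B-X" always opens a span; a stray "I-X" is treated as an opening too.
        if prefix == "B" or (prefix == "I" and label is None):
            start, label = i, tag_label
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exactly matching spans."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = iob_to_spans(gold_tags)
        pred_spans = iob_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "O", "O", "B-LOC"]]
    print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

In this toy example three of the four token tags are correct, yet only one of the two gold spans is recovered exactly, so the span-level F1 is 0.5. The score is therefore computed on whole entities such as PER, not on individual tags such as B-PER.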