Error while calculating precision, recall and f1 score for CRAFT_V4 dataset

rohitjajee commented 3 years ago

My code:

import flair
from flair.datasets import biomedical
from flair.models import MultiTagger

craft = biomedical.CRAFT_V4()
hunflair_tagger = MultiTagger.load("hunflair")
disease_tagger = hunflair_tagger.name_to_tagger["hunflair-disease"]
print(disease_tagger.evaluate(craft.test, 'ner'))

Error:

IndexError                                Traceback (most recent call last)
in
----> 1 print(disease_tagger.evaluate(craft.test, gold_label_type='pos'))

/local_disk0/.ephemeral_nfs/envs/pythonEnv-d7c12066-4c17-4ffe-a0d4-89c1b39a60b2/lib/python3.8/site-packages/flair/nn/model.py in evaluate(self, data_points, gold_label_type, out_path, embedding_storage_mode, mini_batch_size, num_workers, main_evaluation_metric, exclude_labels, gold_label_dictionary)
    159
    160             # predict for batch
--> 161             loss_and_count = self.predict(batch,
    162                                           embedding_storage_mode=embedding_storage_mode,
    163                                           mini_batch_size=mini_batch_size,

/local_disk0/.ephemeral_nfs/envs/pythonEnv-d7c12066-4c17-4ffe-a0d4-89c1b39a60b2/lib/python3.8/site-packages/flair/models/sequence_tagger_model.py in predict(self, sentences, mini_batch_size, all_tag_prob, verbose, label_name, return_loss, embedding_storage_mode)
    368
    369                 if return_loss:
--> 370                     loss_and_count = self._calculate_loss(feature, batch)
    371                     overall_loss += loss_and_count[0]
    372                     overall_count += loss_and_count[1]

/local_disk0/.ephemeral_nfs/envs/pythonEnv-d7c12066-4c17-4ffe-a0d4-89c1b39a60b2/lib/python3.8/site-packages/flair/models/sequence_tagger_model.py in _calculate_loss(self, features, sentences)
    523         for s_id, sentence in enumerate(sentences):
    524             # get the tags in this sentence
--> 525             tag_idx: List[int] = [
    526                 self.tag_dictionary.get_idx_for_item(token.get_tag(self.tag_type).value)
    527                 for token in sentence

/local_disk0/.ephemeral_nfs/envs/pythonEnv-d7c12066-4c17-4ffe-a0d4-89c1b39a60b2/lib/python3.8/site-packages/flair/models/sequence_tagger_model.py in (.0)
    524             # get the tags in this sentence
    525             tag_idx: List[int] = [
--> 526                 self.tag_dictionary.get_idx_for_item(token.get_tag(self.tag_type).value)
    527                 for token in sentence
    528             ]

/local_disk0/.ephemeral_nfs/envs/pythonEnv-d7c12066-4c17-4ffe-a0d4-89c1b39a60b2/lib/python3.8/site-packages/flair/data.py in get_idx_for_item(self, item)
     64         log.error(f"The string '{item}' is not in dictionary! Dictionary contains only: {self.get_items()}")
     65         log.error("You can create a Dictionary that handles unknown items with an <unk>-key by setting add_unk = True in the construction.")
---> 66         raise IndexError
     67
     68     def get_idx_for_items(self, items: List[str]) -> List[int]:

IndexError:

alanakbik commented 3 years ago

Hello @rohitjajee, this is a known issue that is fixed in the master branch and will be part of an upcoming bugfix release.

In the meantime, you can install the master branch through pip to get the fix now:

pip install --upgrade git+https://github.com/flairNLP/flair.git

rohitjajee commented 3 years ago

Hello @alanakbik,

Thanks for your quick response. The code now runs without errors, but the metrics are not calculated correctly. Please see the output below: all the values are zero.

By class:
              precision    recall  f1-score   support

      uberon     0.0000    0.0000    0.0000      6540
          pr     0.0000    0.0000    0.0000      6294
       go_bp     0.0000    0.0000    0.0000      3527
          so     0.0000    0.0000    0.0000      3342
   ncbitaxon     0.0000    0.0000    0.0000      3098
       chebi     0.0000    0.0000    0.0000      2189
          cl     0.0000    0.0000    0.0000      1735
     Disease     0.0000    0.0000    0.0000         0
       go_cc     0.0000    0.0000    0.0000      1153
         mop     0.0000    0.0000    0.0000        95
       go_mf     0.0000    0.0000    0.0000        91

   micro avg     0.0000    0.0000    0.0000     28064
   macro avg     0.0000    0.0000    0.0000     28064
weighted avg     0.0000    0.0000    0.0000     28064
 samples avg     0.0000    0.0000    0.0000     28064

Loss: 2.3716938495635986

alanakbik commented 3 years ago

Yes, this dataset uses different labels, such as ncbitaxon or go_cc, while the tagger predicts entities with the label Disease, which are not annotated in this dataset. So the score of 0 is correct: there is a label mismatch.

@mariosaenger is it correct that CRAFT_V4 has no disease labels?
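To illustrate why a label mismatch forces every score to zero, here is a minimal sketch in plain Python. It is not Flair's evaluation code: the `Span` tuple, the `micro_prf` helper, and the toy data are all hypothetical and assume exact span-plus-label matching.

```python
from typing import Set, Tuple

# Hypothetical span representation: (sentence_id, start_token, end_token, label)
Span = Tuple[int, int, int, str]


def micro_prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact span+label matches."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy gold annotations using corpus-specific labels (made-up spans)
gold = {(0, 3, 4, "ncbitaxon"), (0, 7, 9, "pr"), (1, 0, 1, "chebi")}

# Toy predictions from a tagger that only knows the label "Disease"
predicted = {(0, 7, 9, "Disease"), (1, 5, 6, "Disease")}

# The two label sets are disjoint, so no prediction can ever match a gold span
gold_labels = {label for *_, label in gold}
predicted_labels = {label for *_, label in predicted}
print(gold_labels & predicted_labels)  # set()

print(micro_prf(gold, predicted))  # (0.0, 0.0, 0.0)
```

Even the prediction at tokens 7 to 9, which overlaps a gold span exactly, counts as wrong because its label differs.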

rohitjajee commented 3 years ago

@alanakbik you are right, my bad! CRAFT_V4 has no disease labels. Thank you.

mariosaenger commented 3 years ago

@alanakbik CRAFT_V4 is the original version of the corpus, without any label mapping. For training HunFlair we created distinct corpora for each entity type (e.g. HUNER_GENE_CRAFT_V4 or HUNER_SPECIES_CRAFT_V4) which map the corpus-specific tags to more general ones (e.g. ncbitaxon to species). However, each of these corpora covers only one entity type. This is necessary due to the multi gold standard training procedure of HunFlair.

Unfortunately, we don't provide a corpus version containing all HunFlair-supported entity types. This could be easily implemented, e.g.:

from pathlib import Path

from flair.datasets.biomedical import (
    CHEMICAL_TAG,
    CRAFT_V4,
    GENE_TAG,
    SPECIES_TAG,
    HunerDataset,
    InternalBioNerDataset,
    filter_and_map_entities,
)

class HUNER_CRAFT_V4(HunerDataset):
    """ HUNER version of the CRAFT corpus."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    @staticmethod
    def split_url() -> str:
        return "https://raw.githubusercontent.com/hu-ner/huner/master/ner_scripts/splits/craft_v4"

    def to_internal(self, data_dir: Path) -> InternalBioNerDataset:
        corpus_dir = CRAFT_V4.download_corpus(data_dir)
        corpus = CRAFT_V4.parse_corpus(corpus_dir)

        entity_type_mapping = {
            "ncbitaxon": SPECIES_TAG, # Map corpus-specific tags to general ones
            "pr": GENE_TAG,
            "chebi": CHEMICAL_TAG
        }

        return filter_and_map_entities(corpus, entity_type_mapping)

Please note that the CRAFT corpus contains further entity annotations we don't support at all (e.g. cells or anatomical entities).
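As a rough illustration of what such a mapping does to the annotations, here is a self-contained sketch in plain Python. It does not use Flair and is not the implementation of filter_and_map_entities: the `Entity` type, the `filter_and_map` helper, and the target strings "Species", "Gene" and "Chemical" (standing in for SPECIES_TAG, GENE_TAG and CHEMICAL_TAG) are assumptions made for the example.

```python
from typing import Dict, List, NamedTuple


class Entity(NamedTuple):
    """Hypothetical entity annotation: character offsets plus a type label."""
    start: int
    end: int
    label: str


def filter_and_map(entities: List[Entity], mapping: Dict[str, str]) -> List[Entity]:
    """Keep only entities whose label is in `mapping` and rename their labels."""
    return [
        Entity(entity.start, entity.end, mapping[entity.label])
        for entity in entities
        if entity.label in mapping
    ]


# Placeholder target names; the real tag constants may use different strings
entity_type_mapping = {
    "ncbitaxon": "Species",
    "pr": "Gene",
    "chebi": "Chemical",
}

# Toy annotations with made-up offsets
annotations = [
    Entity(0, 5, "ncbitaxon"),
    Entity(10, 14, "pr"),
    Entity(20, 27, "uberon"),  # anatomical entity: not in the mapping, so dropped
    Entity(30, 37, "chebi"),
]

for entity in filter_and_map(annotations, entity_type_mapping):
    print(entity)
```

Entities with unmapped types, such as the uberon annotation here, simply disappear from the result, so such a corpus never scores a tagger on entity types outside the mapping.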

alanakbik commented 3 years ago

@mariosaenger thanks for the info and the code example!

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

flairNLP / flair

Error while calculating precision, recall and f1 score for CRAFT_V4 dataset #2433