flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

How to update the labels of a dataset loaded from Flair? #3471

Open DonaldFeuz opened 5 months ago

DonaldFeuz commented 5 months ago

Question

Hi, I am loading the JNLPBA dataset from the Flair library and would like to keep only the protein mentions, relabeled as "Gene". All labels other than 'protein' should be removed from the dataset so that I can train a gene NER model. I could not find a way to do this in the Flair documentation, and all my attempts have failed. Here is an example of the code I wrote:

```python
from flair.data import Sentence


def rename_and_remove_labels(sentence: Sentence):
    new_labels = []

    for label in sentence.get_labels():
        if label.value == 'protein':
            # Add a new 'Gene' label for each 'protein' label
            new_labels.append((label.data_point.start_position, label.data_point.end_position, 'Gene'))

    sentence.remove_labels([label.value for label in sentence.get_labels()])

    for start_pos, end_pos, new_label in new_labels:
        span = sentence[start_pos:end_pos]
        span.add_label(new_label)

    return sentence


sentence = Sentence("IL-2 gene expression and NF-kappa B activation through CD28 requires reactive oxygen production by 5-lipoxygenase.")
sentence[0:2].add_label('ner', 'DNA')
sentence[4:6].add_label('ner', 'protein')
sentence[8:9].add_label('ner', 'protein')
sentence[14:15].add_label('ner', 'protein')

print("Before:")
for label in sentence.get_labels():
    print(label)
print(sentence)

sentence = rename_and_remove_labels(sentence)

print("\nAfter:")
for label in sentence.get_labels():
    print(label)
```
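
One likely reason the attempt above fails is an index mismatch: as far as I know, `start_position` and `end_position` on a Flair span are character offsets, whereas `sentence[a:b]` slices by token. The sketch below illustrates that mismatch and the filter-and-rename step in plain Python, without Flair. The helpers `whitespace_offsets`, `char_to_token_span` and `keep_and_rename` are hypothetical names introduced only for illustration, and whitespace splitting stands in for Flair's tokenizer.

```python
from typing import List, Tuple

# Hypothetical stand-in for span annotations (not a Flair type):
# (first_token, end_token_exclusive, label)
TokenSpan = Tuple[int, int, str]


def whitespace_offsets(text: str) -> List[Tuple[int, int]]:
    """Return character (start, end) offsets of whitespace-separated tokens."""
    offsets, cursor = [], 0
    for token in text.split():
        start = text.index(token, cursor)
        cursor = start + len(token)
        offsets.append((start, cursor))
    return offsets


def char_to_token_span(offsets: List[Tuple[int, int]],
                       char_start: int, char_end: int) -> Tuple[int, int]:
    """Map a character range to a token slice [first, end)."""
    covered = [i for i, (s, e) in enumerate(offsets)
               if s >= char_start and e <= char_end]
    if not covered:
        raise ValueError("no token lies inside the character range")
    return covered[0], covered[-1] + 1


def keep_and_rename(spans: List[TokenSpan], keep: str = "protein",
                    new: str = "Gene") -> List[TokenSpan]:
    """Drop spans whose label is not `keep`; rename the rest to `new`."""
    return [(start, end, new) for start, end, label in spans if label == keep]


text = "IL-2 gene expression and NF-kappa B activation"
tokens = text.split()
offsets = whitespace_offsets(text)

# "NF-kappa B" covers characters 25..35 but tokens 4..6.
print(tokens[25:35])                      # character offsets used as token indices
first, end = char_to_token_span(offsets, 25, 35)
print(tokens[first:end])                  # the intended tokens

spans = [(0, 2, "DNA"), (4, 6, "protein")]
print(keep_and_rename(spans))
```

In Flair itself, the token range of a span can presumably be read from the span's tokens rather than from its character positions. Two further details are worth checking against the installed version: `remove_labels` appears to expect a label-type string such as `'ner'` rather than a list of values, and `add_label` takes the label type first and the value second, as in the `sentence[4:6].add_label('ner', 'protein')` calls above.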

alanakbik commented 4 months ago

Hello @DonaldFeuz, which Flair version are you on, and how are you loading the JNLPBA dataset?