flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

[Question]: CSVClassificationCorpus and tagger #3392

Open ch-sander opened 8 months ago

ch-sander commented 8 months ago

Question

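I trained a custom model with the following code:

from flair.datasets import CSVClassificationCorpus
from flair.embeddings import CharacterEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

tag_type = 'label'
column_name_map = {0: 'text', 1: tag_type}
corpus = CSVClassificationCorpus("input/test", train_file='text.txt', column_name_map=column_name_map, skip_header=True, delimiter=',', label_type=tag_type)
tag_dictionary = corpus.make_label_dictionary(label_type=tag_type, add_unk=True)
print(tag_dictionary)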

char_embeddings = CharacterEmbeddings()
embeddings = StackedEmbeddings([char_embeddings])
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type,
                        use_crf=True)

trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/' + model_name,
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=num_epochs)
model_path = 'models/flair/' + model_name
tagger.save(model_path)

When I try to tag a sentence, I get an empty list ([]):

from flair.data import Sentence

def tag_text_with_ner(model_path, text):
    # Load the saved model and run prediction on a single sentence
    tagger = SequenceTagger.load(model_path)
    sentence = Sentence(text)
    tagger.predict(sentence)

    # Collect (text, tag, score) for every predicted span
    tagged_entities = []
    for entity in sentence.get_spans('ner'):
        tagged_entities.append((entity.text, entity.tag, entity.score))

    return tagged_entities

tagged_entities = tag_text_with_ner(model_path, text)
print(tagged_entities)

The console output is: 2024-01-11 14:34:55,577 SequenceTagger predicts: Dictionary with 15 tags: <unk>, ... []

I changed 'ner' to 'label' in the get_spans call, but it made no difference. Training with ColumnCorpus worked fine, but I need to train from a CSV file, not from BIO-formatted data.

ch-sander commented 8 months ago

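I guess the issue is sequence labeling vs. text classification: CSVClassificationCorpus attaches one label to each whole text, while SequenceTagger expects a tag per token. Still, I was wondering whether NER training can be done from a CSV file as well, instead of BIO-formatted data.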
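
One possible workaround, sketched here under assumptions rather than taken from the Flair docs, is to convert the CSV into BIO column format before training, so that the ColumnCorpus path that already worked can be reused. The CSV layout below (columns text, entity, label, one row per entity) is hypothetical, as are the helper names csv_to_bio and to_column_format. Tokenization is a naive whitespace split, which may not match the tokenizer used at prediction time.

```python
import csv
import io

# Hypothetical CSV layout: one row per entity mention in a sentence.
SAMPLE_CSV = (
    "text,entity,label\n"
    "George Washington went to Washington,George Washington,PER\n"
    "George Washington went to Washington,Washington,LOC\n"
)


def csv_to_bio(csv_text):
    """Turn (text, entity, label) rows into (tokens, BIO tags) pairs.

    Rows that share the same text are merged into one sentence.
    """
    tags_by_text = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        tokens = row["text"].split()  # naive whitespace tokenization
        tags = tags_by_text.setdefault(row["text"], ["O"] * len(tokens))
        entity_tokens = row["entity"].split()
        n = len(entity_tokens)
        if n == 0:
            continue
        # Tag the first occurrence that is not already part of another entity.
        for i in range(len(tokens) - n + 1):
            window_free = all(t == "O" for t in tags[i:i + n])
            if tokens[i:i + n] == entity_tokens and window_free:
                tags[i] = "B-" + row["label"]
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + row["label"]
                break
    return [(text.split(), tags) for text, tags in tags_by_text.items()]


def to_column_format(sentences):
    """Render sentences as 'token tag' lines, with a blank line after each."""
    lines = []
    for tokens, tags in sentences:
        lines.extend(f"{tok} {tag}" for tok, tag in zip(tokens, tags))
        lines.append("")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_column_format(csv_to_bio(SAMPLE_CSV)))
```

The resulting text could be written to a file and read with ColumnCorpus using a column map such as {0: 'text', 1: 'ner'}. If the CSV only holds one label per whole text and no entity spans, this conversion does not apply, and a text classification model would be the closer fit.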