Open ch-sander opened 10 months ago
I trained a custom model with the following code:

    from flair.datasets import CSVClassificationCorpus
    from flair.embeddings import CharacterEmbeddings, StackedEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # model_name and num_epochs are defined elsewhere in my script
    tag_type = 'label'
    # column 0 of the CSV holds the text, column 1 the label
    column_name_map = {0: 'text', 1: tag_type}
    corpus = CSVClassificationCorpus(
        "input/test",
        train_file='text.txt',
        column_name_map=column_name_map,
        skip_header=True,
        delimiter=',',
        label_type=tag_type,
    )
    tag_dictionary = corpus.make_label_dictionary(label_type=tag_type, add_unk=True)
    print(tag_dictionary)

    char_embeddings = CharacterEmbeddings()
    embeddings = StackedEmbeddings([char_embeddings])
    tagger = SequenceTagger(
        hidden_size=256,
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type=tag_type,
        use_crf=True,
    )

    trainer = ModelTrainer(tagger, corpus)
    trainer.train(
        'resources/taggers/' + model_name,
        learning_rate=0.1,
        mini_batch_size=32,
        max_epochs=num_epochs,
    )

    model_path = 'models/flair/' + model_name
    tagger.save(model_path)
When I try to tag a sentence with the function below, I get an empty list ([]):
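
    from flair.data import Sentence
    from flair.models import SequenceTagger

    def tag_text_with_ner(model_path, text):
        # load the trained model and predict on a single sentence
        tagger = SequenceTagger.load(model_path)
        sentence = Sentence(text)
        tagger.predict(sentence)
        # collect (text, tag, score) for every predicted span
        tagged_entities = []
        for entity in sentence.get_spans('ner'):
            tagged_entities.append((entity.text, entity.tag, entity.score))
        return tagged_entities

    tagged_entities = tag_text_with_ner(model_path, text)
    print(tagged_entities)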
The console output is:

    2024-01-11 14:34:55,577 SequenceTagger predicts: Dictionary with 15 tags: <unk>, ...
    []
I have changed 'ner' to 'label' in the get_spans call, but it makes no difference. Training worked fine with ColumnCorpus, but I need to train from a CSV file, not from a BIO-formatted one.
I guess the issue is sequence labeling vs. text classification. Still, I was wondering whether NER training can also be done from a CSV file instead of BIO.
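One possible workaround, sketched under stated assumptions, is to convert the CSV into the two-column BIO format that ColumnCorpus already handled correctly. The CSV layout used here (columns named text, entity and label, one entity mention per row) is hypothetical and not taken from the data above, and the function name csv_to_bio is made up for illustration. Tokenization is plain whitespace splitting, which is a simplification.

```python
import csv
import io


def csv_to_bio(csv_text: str) -> str:
    """Convert rows of (text, entity, label) into two-column BIO lines.

    Assumed (hypothetical) CSV layout: a header row, then one row per
    sentence holding the full text, one entity mention, and its label.
    """
    blocks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        tokens = row["text"].split()
        entity_tokens = row["entity"].split()
        tags = ["O"] * len(tokens)
        n = len(entity_tokens)
        # Tag only the first position where the entity tokens match exactly.
        for start in range(len(tokens) - n + 1):
            if n and tokens[start:start + n] == entity_tokens:
                tags[start] = "B-" + row["label"]
                for i in range(start + 1, start + n):
                    tags[i] = "I-" + row["label"]
                break
        blocks.append("\n".join(f"{tok} {tag}" for tok, tag in zip(tokens, tags)))
    # In the column format, sentences are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


# Made-up sample data, only to show the shape of the output.
sample = (
    "text,entity,label\n"
    "George Washington went to Washington,George Washington,PER\n"
    "Berlin is cold,Berlin,LOC\n"
)
bio = csv_to_bio(sample)
print(bio)
```

The resulting string could be written to a training file and loaded with ColumnCorpus using a column format such as {0: 'text', 1: 'ner'}, so that the SequenceTagger sees token-level tags rather than one label per sentence.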