Closed coolcoder001 closed 2 years ago
Hi, thanks a lot for the quick response. :) My extractor function uses flair: it takes a string as input and returns the extracted entities in a pandas DataFrame.
def entity_recognition(text):
    """Given a text document, run NER on it using flair and return a dataframe with the following columns
    text: actual raw text input
    entity: identified entity text
    entity_start: character start position of entity in raw text
    entity_end: character end position of entity in raw text
    """
    import pandas as pd
    from tqdm import tqdm
    from flair.data import Sentence
    from flair.models import SequenceTagger
    # Loading the tagger on every call is slow; consider loading it once outside the function.
    tagger_fast = SequenceTagger.load('ner-ontonotes-fast')
    sentence = Sentence(text)
    tagger_fast.predict(sentence, mini_batch_size=16)
    # Convert the sentence to a dict once instead of once per field access.
    ner_entities = sentence.to_dict(tag_type='ner')['entities']
    entities = []
    for ent in tqdm(ner_entities):
        # Take the first whitespace-separated token of the first label as the tag name.
        label = str(ent['labels'][0]).split()[0]
        # Exact membership test: `label in 'ORG'` would be a substring check
        # and would also accept tags such as 'O'.
        if label in ('ORG', 'PERSON', 'GPE'):
            entities.append([str(ent['text']), ent['start_pos'], ent['end_pos']])
    entities = pd.DataFrame(entities, columns=['entity', 'entity_start', 'entity_end'])
    entities['text'] = text
    return entities
Can you please help me with the changes I need to make to this function so that it can work with bootleg?
Thanks in advance.
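One detail in the label checks above is worth isolating: `x in 'ORG'` on a string is a substring test, so a tag such as `O` or `OR` would also pass. A minimal sketch of the filtering step with exact set membership follows. The helper name `filter_entities` and the sample entity dicts are hypothetical; the dicts only mimic the shape the function above reads (`text`, `start_pos`, `end_pos`, `labels`), with labels written as plain strings so the sketch runs without flair.

```python
# Tags to keep, matched exactly (assumption: same three tags as the function above).
KEEP_LABELS = frozenset({"ORG", "PERSON", "GPE"})


def filter_entities(ner_entities, keep_labels=KEEP_LABELS):
    """Return (text, start_pos, end_pos) tuples for entities whose first label is kept."""
    kept = []
    for ent in ner_entities:
        if not ent["labels"]:
            continue  # skip entities that carry no label at all
        # First whitespace-separated token of the first label, e.g. "ORG (0.98)" -> "ORG".
        label = str(ent["labels"][0]).split()[0]
        if label in keep_labels:  # exact match, not a substring check
            kept.append((str(ent["text"]), ent["start_pos"], ent["end_pos"]))
    return kept


# Hypothetical sample data, not real tagger output.
sample = [
    {"text": "Stanford", "start_pos": 0, "end_pos": 8, "labels": ["ORG (0.98)"]},
    {"text": "yesterday", "start_pos": 20, "end_pos": 29, "labels": ["DATE (0.91)"]},
    {"text": "Paris", "start_pos": 33, "end_pos": 38, "labels": ["GPE (0.99)"]},
]
result = filter_entities(sample)
```

The returned tuples can be passed to `pd.DataFrame(..., columns=['entity', 'entity_start', 'entity_end'])` exactly as in the original function.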
So I went ahead and added your function as an example in the branch here. If you use the annotator and use the extract method of custom, it should trigger your extractor. I haven't tested it but it should get you started.
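If the annotator ends up calling the extractor once per input text (an assumption, not verified against the branch), the `SequenceTagger.load` call inside the function would reload the model every time. A minimal sketch of one way to avoid that is to cache the loaded model. The helper `get_tagger` and the stand-in `fake_loader` are hypothetical names used only so the example runs without flair installed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_tagger(model_name, loader):
    """Load a model once per (model_name, loader) pair and reuse it afterwards."""
    return loader(model_name)


# Stand-in loader (hypothetical) that records how often it is actually invoked.
load_calls = []


def fake_loader(model_name):
    load_calls.append(model_name)
    return {"model": model_name}


first = get_tagger("ner-ontonotes-fast", fake_loader)
second = get_tagger("ner-ontonotes-fast", fake_loader)  # served from the cache
```

With flair, the same idea would be `tagger_fast = get_tagger('ner-ontonotes-fast', SequenceTagger.load)` inside `entity_recognition`, so only the first call pays the loading cost.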
Hi @lorr1, thanks a lot for your help. You are so nice and awesome :)
I am able to run this code using the Flair NER engine.
However, if I need to make more changes, can I push them directly to the branch you created, or do I need to raise a PR?
How about you raise PRs? I'll pretty much approve everything, but I'd like to keep track of what you're finding difficult/useful to implement.
Thanks!
Hi, thanks a lot for the project. It is indeed wonderful.
However, I would like to replace the NER engine: I want to use Flair instead of spaCy.
Can I do that?