hh1985 closed this issue 1 year ago
Thanks for reaching out to us.
As long as training datasets for the new types exist, the set of entity types that BERN2 supports can be expanded. We will consider providing instructions on how to train our NER model on new datasets.
I am also interested in identifying new entity types. I would very much appreciate a tutorial on how to do this.
We uploaded a tutorial on how to train our NER model for the supported entity types: https://github.com/dmis-lab/BERN2/tree/main/multi_ner/training
By preprocessing the dataset for the new entity type and adding it as a training set, you will be able to get an NER model for the new entity type. If you have any follow-up questions, please re-open this issue.
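For anyone attempting the preprocessing step mentioned above, here is a minimal sketch of turning character-offset annotations into token-level BIO tags. It is not taken from the BERN2 repository: the function name `spans_to_bio`, the whitespace tokenization, and the `B-<label>` / `I-<label>` tag names are all assumptions for illustration, so check the files in the linked tutorial for the exact format the training script expects.

```python
def spans_to_bio(text, spans, label):
    """Tag whitespace tokens with BIO labels.

    text  : the raw sentence
    spans : list of (start, end) character offsets, end exclusive
    label : entity type name (hypothetical, e.g. "Microbiota")
    """
    tagged = []
    pos = 0
    prev_span = None
    for token in text.split():
        # Locate the token in the original text to recover its offsets.
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        # First annotated span that overlaps this token, if any.
        hit = next((s for s in spans if start < s[1] and end > s[0]), None)
        if hit is None:
            tag = "O"
        elif hit == prev_span:
            tag = "I-" + label  # continues the span of the previous token
        else:
            tag = "B-" + label  # first token of a span
        prev_span = hit
        tagged.append((token, tag))
    return tagged


if __name__ == "__main__":
    sentence = "Gut Bacteroides fragilis modulates immunity"
    for token, tag in spans_to_bio(sentence, [(4, 24)], "Microbiota"):
        print(token, tag, sep="\t")
```

Whitespace tokenization keeps the sketch short; a real dataset would probably need the same tokenization that the existing training sets use.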
Hi! If I want to train a new type, should I modify the modeling.py file to set up a separate classifier for training, and then add the modified classifier layer to the original modeling.py file?
Hi @liwenqingi
If you have data for a new type, you could modify modeling.py to set up a separate classifier and train it.
@minstar I see that you train several types together. I want to integrate new entity types into bern2_ner. Should I add a classifier to modeling.py and train it jointly with the original entities, or train the new entities separately and add them to bern2_ner? I ask because I found that the F1 scores for entities trained separately (like "disease") are low (around 0.6 after 50 epochs). Thanks for your reply!
I would choose the latter, because with joint training you would have to find optimal training settings, which can be time-consuming and labor-intensive. May I ask why training the entities separately gave such a low F1 score?
I modified modeling.py just to test the feasibility of training a classifier separately, using the NERdata datasets (gene, disease, ...) for separate training, but the results were not very good.
Then how about adding your new entity classification system through socket communication, as we did?
In bern2.py, lines 361-363, we separately get the results of tmvar, gnormplus, and our multi-ner classifier:
    for ner_type in ['tmvar', 'gnormplus', 'mtner']:
        arguments_for_coroutines.append([ner_type, pubtator_file, output_mtner, base_name, loop])
    async_result = loop.run_until_complete(self.async_ner(arguments_for_coroutines))
You could train a standalone model for your entities, which could work better than just adding a separate classifier in modeling.py.
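To illustrate the idea of plugging one more tagger into that loop, here is a minimal, self-contained sketch of dispatching several taggers concurrently. It does not reproduce BERN2's `async_ner`: it uses `asyncio.gather` from the standard library, `query_tagger` is a stub standing in for the real socket request, and `"microbiota"` is a hypothetical name for a separately trained tagger.

```python
import asyncio

# The first three names come from the bern2.py snippet above;
# "microbiota" is a hypothetical, separately trained tagger.
NER_TYPES = ["tmvar", "gnormplus", "mtner", "microbiota"]


async def query_tagger(ner_type, text):
    """Stub for one request to the server that handles `ner_type`."""
    await asyncio.sleep(0)  # a real version would await a socket round trip here
    return ner_type, []     # a real tagger would return its annotations


async def tag_all(text, ner_types=NER_TYPES):
    """Query every tagger concurrently and collect the results by tagger name."""
    pairs = await asyncio.gather(*(query_tagger(t, text) for t in ner_types))
    return dict(pairs)


if __name__ == "__main__":
    print(asyncio.run(tag_all("Bacteroides fragilis colonizes the gut.")))
```

With this layout, adding an entity type means adding one name to the list and one server that answers for it, while the existing taggers stay untouched.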
Thanks for your reply! But I want to use BERN2 locally and may not need socket interaction, because a large amount of data needs to be processed. Also, I just found that training on "species" entities works very well, which may be related to the annotation quality of the data.
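For a purely local setup like the one described above, one option might be to run the separately trained model in the same process and merge its predictions into the existing annotations. The sketch below shows only the merge step. The dictionary keys (`start`, `end`, `type`) are hypothetical and are not BERN2's output schema, and keeping the existing span whenever two spans overlap is an arbitrary policy chosen for illustration.

```python
def merge_annotations(existing, new):
    """Add spans from `new` that do not overlap any span already kept.

    Each annotation is a dict with hypothetical keys:
    "start" and "end" (character offsets, end exclusive) and "type".
    """
    merged = list(existing)
    for cand in new:
        overlaps = any(
            cand["start"] < kept["end"] and cand["end"] > kept["start"]
            for kept in merged
        )
        if not overlaps:
            merged.append(cand)
    # Return annotations in reading order.
    return sorted(merged, key=lambda a: a["start"])


if __name__ == "__main__":
    existing = [{"start": 0, "end": 5, "type": "disease"}]
    new = [
        {"start": 3, "end": 8, "type": "microbiota"},    # overlaps, dropped
        {"start": 10, "end": 15, "type": "microbiota"},  # kept
    ]
    print(merge_annotations(existing, new))
```

A different policy, such as preferring the longer span or the higher-confidence prediction, would only change the overlap branch.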
Any ideas on extending it to support new classes, such as microbiota? Thanks.