ebanalyse / NERDA

Framework for fine-tuning pretrained transformers for Named-Entity Recognition (NER) tasks
MIT License

Redundant "out-side" label and final model training. #21

Open danbogu opened 3 years ago

danbogu commented 3 years ago

Hi, when using labeled data where "non-entity" words are already labeled with "O", setting the 'tag_outside' variable to "O" adds an extra label and makes the input tensor dimensions incorrect (off by one).

Also, when training the final model on the full data after hyperparameter tuning, is there a way to disable the validation set during training?

smaakage85 commented 2 years ago

Hi @danbogu

First and foremost - thanks for the feedback, it is much appreciated :)

When using labeled data where "non-entity" words are already labeled with "O", setting the 'tag_outside' variable to "O" adds an extra label and makes the input tensor dimensions incorrect (off by one).

Can you provide a reproducible example? It would make it much easier for me to investigate the matter. Apart from that, the tags of named entities should definitely be different from the special outside tag.

Also, when training the final model on the full data after hyperparameter tuning, is there a way to disable the validation set during training?

Unfortunately not, but I am certainly open to the idea, and if you make a pull request on this, I will be happy to inspect it and merge it in :) Are you interested in this?

danbogu commented 2 years ago

Hey @smaakage85, about the first topic: at a later stage, I realized that I just need to omit 'O' from the tag_scheme I provide in order to avoid the dimension exception. If you still want me to provide a more detailed example, I would be happy to do so.
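For anyone hitting the same error, here is a minimal sketch of the off-by-one, assuming the full label list is built by prepending the outside tag to the tag_scheme. The helpers `build_label_list` and `drop_outside_tag` are hypothetical names used only for illustration; they are not part of NERDA.

```python
def build_label_list(tag_scheme, tag_outside="O"):
    """Hypothetical helper: combine the outside tag with the entity tags.

    Assumption: the outside tag is prepended to the user-supplied scheme
    without checking whether the scheme already contains it.
    """
    return [tag_outside] + list(tag_scheme)


def drop_outside_tag(tag_scheme, tag_outside="O"):
    """Hypothetical helper: remove the outside tag from a tag scheme."""
    return [tag for tag in tag_scheme if tag != tag_outside]


# A tag scheme that (incorrectly) already includes the outside tag.
scheme_with_o = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]

labels = build_label_list(scheme_with_o)
n_listed = len(labels)         # length of the combined list, "O" counted twice
n_distinct = len(set(labels))  # number of labels that are actually different

# Dropping "O" from the scheme first makes the two counts agree.
clean_labels = build_label_list(drop_outside_tag(scheme_with_o))

print(n_listed, n_distinct, len(clean_labels))
```

If one part of a model is sized from the listed labels and another from the distinct labels, the shapes differ by exactly one, which is consistent with the error I saw. Passing only the entity tags in tag_scheme avoids it.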

Also, I am definitely interested in trying to resolve the second thing I mentioned. I have a few simple solutions in mind. Let's do it :)

smaakage85 commented 2 years ago

Sounds good @danbogu

Looking forward to your pull request :D