Currently, we use spaCy to convert token classification datasets (more precisely, NER datasets): it turns a sequence of BIO tags into spans so that we can prompt the LLM in natural language.
Goal: Write our own conversion functions for BIO tags -> spans and spans -> BIO tags, based on searching for substrings in the text. It is important to keep the tokenization of the original dataset, which is currently an issue. Additionally, this lets us remove the spaCy dependency entirely.
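A minimal sketch of both directions without spaCy might look like the following. The names `bio_to_spans`, `spans_to_bio` and `Span` are hypothetical, and it assumes that spans are `(label, surface_text)` pairs, that the text is the original tokens joined with single spaces, and that the LLM returns the exact surface string. The substring search is done on whole tokens, so a span can only match at token boundaries and the original tokenization is never changed.

```python
from typing import List, Optional, Tuple

# Hypothetical span representation: (entity label, surface text).
Span = Tuple[str, str]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Collect (label, text) spans from BIO tags; span text is the tokens joined by single spaces."""
    spans: List[Span] = []
    label: Optional[str] = None
    start = 0
    # A trailing "O" sentinel flushes a span that runs to the end of the sequence.
    for i, tag in enumerate(tags + ["O"]):
        # Close the open span unless this tag continues it with the same label.
        if label is not None and not (tag.startswith("I-") and tag[2:] == label):
            spans.append((label, " ".join(tokens[start:i])))
            label = None
        # Open a new span on "B-", or on a stray "I-" that has no open span.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Tag each span at its first untagged occurrence, matching whole tokens only."""
    tags = ["O"] * len(tokens)
    for label, text in spans:
        # Assumption: span text was produced by joining tokens with single spaces.
        target = text.split()
        n = len(target)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            window = slice(start, start + n)
            if tokens[window] == target and all(t == "O" for t in tags[window]):
                tags[start] = f"B-{label}"
                for j in range(start + 1, start + n):
                    tags[j] = f"I-{label}"
                break
        # Spans that do not match any token window are silently skipped.
    return tags
```

For example, with `tokens = ["Angela", "Merkel", "visited", "New", "York", "."]` and `tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]`, `bio_to_spans` yields `[("PER", "Angela Merkel"), ("LOC", "New York")]`, and `spans_to_bio` maps those spans back to the original tags. Assigning repeated mentions to the first still-untagged occurrence is one possible policy; how to handle spans the LLM returns that do not match any token window (e.g. `"Berlin,"` when `","` is a separate token) is left open here.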