flairNLP / fabricator

[EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models.
Apache License 2.0

Remove spacy dependency #55

Closed whoisjones closed 8 months ago

whoisjones commented 11 months ago

Currently we use spaCy to convert token classification datasets (more precisely, NER datasets) from a sequence of BIO tags into spans, so that we can prompt the LLM in natural language.

Goal: Write our own conversion functions for both directions, BIO tags -> spans and spans -> BIO tags, by searching for substrings in the text. It is important to keep the tokenization of the original dataset, which is currently an issue. Additionally, this lets us remove the spaCy dependency entirely.
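
A rough sketch of what the two conversion functions could look like, using only the standard library. The names `bio_to_spans` and `spans_to_bio` are hypothetical, not existing fabricator functions, and the sketch assumes that spans are exchanged with the LLM as `(entity text, label)` pairs. To keep the original tokenization, the reverse direction never re-tokenizes: it looks for a run of original tokens whose concatenation equals the entity text with whitespace removed.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, str]  # (entity text, label); an assumed exchange format


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Collect (entity text, label) pairs from a BIO-tagged token sequence."""
    spans: List[Span] = []
    current: List[str] = []
    label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new entity starts (a stray I- tag is treated like B-).
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
        else:
            # Anything else is treated as "O" and closes an open entity.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


def _find_token_run(
    tokens: List[str], tags: List[str], target: str
) -> Optional[Tuple[int, int]]:
    """Find an untagged run of tokens whose concatenation equals `target`."""
    for start in range(len(tokens)):
        joined = ""
        for end in range(start, len(tokens)):
            if tags[end] != "O":
                break  # do not overwrite an entity that is already tagged
            joined += tokens[end]
            if joined == target:
                return start, end + 1
            if not target.startswith(joined):
                break
    return None


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Project (entity text, label) pairs back onto the original tokens."""
    tags = ["O"] * len(tokens)
    for text, label in spans:
        target = "".join(text.split())  # ignore whitespace differences
        if not target:
            continue
        match = _find_token_run(tokens, tags, target)
        if match is None:
            continue  # entity text not found in the sentence; skip it
        start, end = match
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


tokens = ["Angela", "Merkel", "visited", "New", "York", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
spans = bio_to_spans(tokens, tags)
roundtrip = spans_to_bio(tokens, spans)
```

In this sketch each span is matched to the first untagged occurrence, so repeated mentions of the same entity are tagged in order of appearance. Entity strings that the LLM alters (different casing, missing characters) are simply skipped; whether that is acceptable or needs fuzzy matching is an open design choice.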