Open stefan-it opened 11 months ago
Hi,
there's a new EMNLP 2023 paper that introduces version 2 of the MultiCoNER dataset.
MultiCoNER v2 should also be supported in Flair :hugs:
The dataset is hosted on the Hugging Face Hub:
https://huggingface.co/datasets/MultiCoNER/multiconer_v2/tree/main
Train, development and test files can also be accessed there, e.g. see the files for German:
https://huggingface.co/datasets/MultiCoNER/multiconer_v2/tree/main/DE-German
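As a rough sketch, the per-language folder URL can be derived from the repository root above. The helper name `language_folder_url` is hypothetical, and only the `DE-German` folder name is confirmed by the link above; other folder names would have to be checked against the repository.

```python
# Repository root taken from the dataset link above.
MULTICONER_V2_ROOT = "https://huggingface.co/datasets/MultiCoNER/multiconer_v2/tree/main"


def language_folder_url(folder: str) -> str:
    """Return the browse URL for one language folder (hypothetical helper)."""
    # Reject empty names and nested paths so the result stays a single folder.
    if not folder or "/" in folder:
        raise ValueError(f"invalid folder name: {folder!r}")
    return f"{MULTICONER_V2_ROOT}/{folder}"


print(language_folder_url("DE-German"))
```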
It should be discussed whether we can extend the existing `NER_MULTI_CONER` implementation and add a `version` parameter to it:
https://github.com/flairNLP/flair/blob/ed53c42ec2e8d8abbd07acd7f6b531945ac45606/flair/datasets/sequence_labeling.py#L3048C7-L3055
```python
class NER_MULTI_CONER(MultiFileColumnCorpus):
    def __init__(
        self,
        task: str = "multi",
        version: str = "v1",
        base_path: Optional[Union[str, Path]] = None,
        in_memory: bool = True,
        **corpusargs,
    ) -> None:
        ...
```
The `version` parameter would then default to `v1` to ensure backward compatibility :thinking:
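To make the backward-compatibility idea concrete, here is a minimal sketch of how the version tag could be validated before any files are loaded. It is not Flair's implementation: `SUPPORTED_VERSIONS` and `normalize_version` are hypothetical names, and the assumption is simply that `v1` and `v2` are the only accepted values.

```python
# Hypothetical constant: the version tags the loader would accept.
SUPPORTED_VERSIONS = ("v1", "v2")


def normalize_version(version: str = "v1") -> str:
    """Validate a version tag; the default "v1" keeps existing calls unchanged."""
    normalized = version.strip().lower()
    if normalized not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"Unsupported MultiCoNER version {version!r}; "
            f"expected one of {SUPPORTED_VERSIONS}"
        )
    return normalized


# Callers that never pass a version keep getting v1.
print(normalize_version())
# New callers opt in to v2 explicitly.
print(normalize_version("v2"))
```

With this shape, existing code that constructs the corpus without a `version` argument would behave as before, and an unknown tag fails early with a clear error instead of a missing-file error later on.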
I agree @stefan-it - that would be great to add!