bigscience-workshop / biomedical

Tools for curating biomedical training data for large-scale language modeling

CAS: NED missing #692

Open sg-wbi opened 2 years ago

sg-wbi commented 2 years ago

From the paper:

Concept Unique Identifiers (CUI) corresponding to French terms from the UMLS (Lindberg et al., 1993) for single or multi-word terms. For multi-word terms, the annotations exploits the IOB (Inside-Outside-Begin) format. For instance, the two-word term vitamine B12 is encoded as follows:
- ... O
- vitamine B-C0042845
- B12 I-C0042845
- ... O

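However, _SUPPORTED_TASKS lists only TEXT_CLASSIFICATION, so the NED (named entity disambiguation) annotations described in the paper are not exposed.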
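
For reference, the IOB encoding quoted above could be decoded into (mention, CUI) pairs roughly as follows. This is only a sketch: decode_iob is a hypothetical helper, not part of the existing dataloader, and it assumes the tokens and tags have already been read into two parallel lists. The surrounding tokens "la" and "est" are made up for illustration, since the paper's example elides them.

```python
from typing import List, Optional, Tuple


def decode_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (mention text, CUI) pairs.

    Hypothetical helper: assumes tags look like "O", "B-<CUI>" or "I-<CUI>".
    """
    mentions: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_cui: Optional[str] = None

    def flush() -> None:
        # Emit the mention collected so far, if any.
        if current_tokens and current_cui is not None:
            mentions.append((" ".join(current_tokens), current_cui))

    for token, tag in zip(tokens, tags):
        prefix, _, cui = tag.partition("-")
        if prefix == "B":
            # A new mention starts; close the previous one first.
            flush()
            current_tokens = [token]
            current_cui = cui
        elif prefix == "I" and current_tokens and cui == current_cui:
            # Continuation of the open mention with the same CUI.
            current_tokens.append(token)
        else:
            # "O", or an "I" tag that does not continue an open mention:
            # close any open mention and treat this token as outside.
            flush()
            current_tokens = []
            current_cui = None

    flush()
    return mentions


# Example based on the paper's excerpt (context tokens are assumed).
tokens = ["la", "vitamine", "B12", "est"]
tags = ["O", "B-C0042845", "I-C0042845", "O"]
print(decode_iob(tokens, tags))
```

Pairs like these, together with character offsets, are the kind of information a NED schema would need in addition to the existing text classification labels.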

sg-wbi commented 2 years ago

https://github.com/bigscience-workshop/biomedical/tree/master/bigbio/biodatasets/cas