aphp / edsnlp

Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes.
https://aphp.github.io/edsnlp/
BSD 3-Clause "New" or "Revised" License

edsnlp.data.read_standoff(): the tokenizer argument fails with EDSTokenizer #263

Closed: Aremaki closed this issue 6 months ago

Aremaki commented 6 months ago

Description

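Passing an EDSTokenizer as the tokenizer argument fails Pydantic validation, which expects a Tokenizer instance. The cause is that EDSTokenizer inherits from the spacy.DummyTokenizer class instead of spacy.Tokenizer, so it is not recognized as a Tokenizer.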
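
To illustrate why the check rejects the tokenizer, here is a minimal sketch using stand-in classes only. None of these are the real spaCy or edsnlp classes, and `strict_check`, `lenient_check` and `is_accepted` are hypothetical helpers. It assumes the validation boils down to an isinstance test, which is how Pydantic typically handles plain class annotations. The lenient variant is just one possible loosening, not necessarily what the actual fix does.

```python
# Stand-in classes only: they mimic the hierarchy described in this issue
# and are NOT the real spaCy or edsnlp classes.
class Tokenizer:
    """Stand-in for spaCy's Tokenizer."""


class DummyTokenizer:
    """Stand-in for spaCy's DummyTokenizer (not a subclass of Tokenizer)."""


class EDSTokenizer(DummyTokenizer):
    """Stand-in for edsnlp's tokenizer, inheriting as described above."""


def strict_check(tokenizer):
    # Hypothetical validator: an isinstance test against Tokenizer only.
    if not isinstance(tokenizer, Tokenizer):
        raise TypeError(
            f"instance of Tokenizer expected, got {type(tokenizer).__name__}"
        )
    return tokenizer


def lenient_check(tokenizer):
    # Hypothetical alternative: accept either tokenizer family.
    if not isinstance(tokenizer, (Tokenizer, DummyTokenizer)):
        raise TypeError(
            f"tokenizer expected, got {type(tokenizer).__name__}"
        )
    return tokenizer


def is_accepted(check, tokenizer):
    # Return True if the given check accepts the tokenizer, False otherwise.
    try:
        check(tokenizer)
        return True
    except TypeError:
        return False


print(is_accepted(strict_check, Tokenizer()))      # True
print(is_accepted(strict_check, EDSTokenizer()))   # False
print(is_accepted(lenient_check, EDSTokenizer()))  # True
```

The second call is the situation reported here: the object is a perfectly usable tokenizer, but it does not descend from the class the type annotation names.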

How to reproduce the bug

import edsnlp

nlp = edsnlp.blank("eds")
path = "path to BRAT file"

# Passing the pipeline's own tokenizer (an EDSTokenizer) triggers the error
docs = list(
    edsnlp.data.read_standoff(
        path,
        tokenizer=nlp.tokenizer,
    )
)

percevalw commented 6 months ago

Thank you for the issue! This was just fixed in #260; hopefully this works for you 🤞