LeapBeyond / scrubadub_spacy

Clean personally identifiable information from dirty dirty text using spaCy.
http://scrubadub.readthedocs.io/
Apache License 2.0

Allow for using self-trained spacy ner model #2

Open mingkang111 opened 1 year ago

mingkang111 commented 1 year ago

Hi team,

Recently we have been trying to use a self-trained spaCy NER model in scrubadub. However, the model check at https://github.com/LeapBeyond/scrubadub_spacy/blob/main/scrubadub_spacy/detectors/spacy.py#L111 rejects it and raises this error:

raise ValueError("Unable to find spacy model '{}'. Is your language supported? "
                 "Check the list of models available here: "
                 "https://github.com/explosion/spacy-models ".format(self.model))

We'd like to request a feature that allows using a self-trained spaCy NER model, for example by removing or relaxing this check. Thanks!

        if not self.check_spacy_model(self.model):
            raise ValueError("Unable to find spacy model '{}'. Is your language supported? "
                             "Check the list of models available here: "
                             "https://github.com/explosion/spacy-models ".format(self.model))
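
One way the check could be relaxed, as a minimal sketch: accept any name that is either a directory on disk (such as the output of a training run) or an installed Python package, since `spacy.load()` accepts both. The helper `is_loadable_model` is hypothetical and not part of scrubadub_spacy. It uses only the standard library and does not confirm that the target is actually a valid spaCy pipeline.

```python
import importlib.util
from pathlib import Path


def is_loadable_model(model: str) -> bool:
    """Return True if `model` looks like something spacy.load() could open.

    Hypothetical helper for illustration; not part of scrubadub_spacy.
    """
    if not model:
        return False
    # Case 1: a directory on disk, e.g. a self-trained pipeline saved locally.
    if Path(model).is_dir():
        return True
    # Case 2: an installed Python package, e.g. a pip-installed pipeline.
    try:
        return importlib.util.find_spec(model) is not None
    except (ImportError, ValueError):
        # Raised for names that cannot be resolved as importable modules.
        return False
```

Under this assumption, the detector would raise the `ValueError` only when both the existing check and this fallback fail, so officially listed models keep working while custom ones are no longer rejected up front.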

1951FDG commented 1 year ago

Hi team,

Any update?