BCHSI / social-determinants-of-health-clbp

code for the manuscript https://www.medrxiv.org/content/10.1101/2022.03.04.22271541v1

spaCy won't load both the CNN and the BoW models in the same process #3

Open wes-brooks opened 4 months ago

wes-brooks commented 4 months ago

I thought I'd run these models side by side to see how they compare. It turns out, though, that spaCy won't let you load two pipelines in the same process if each one registers a factory under the same name (seems like an odd choice to me, but who am I?).

You get errors like this:

import spacy
bow = spacy.load("en_sdoh_bow_cui")
cnn = spacy.load("en_sdoh_cnn_ner_cui")

ValueError: [E004] Can't set up pipeline component: a factory for 'sdoh_cui' already exists. Existing factory: <class 'en_sdoh_bow_cui.postprocess.SDOH'>. New factory: <class 'en_sdoh_cnn_ner_cui.postprocess.SDOH'>

To fix this in my install, I went into the installed spaCy model packages (at $PYTHON_PACKAGE_PATH/en_sdoh_bow_cui/ and $PYTHON_PACKAGE_PATH/en_sdoh_cnn_ner_cui/) and changed every reference to the sdoh_cui factory to sdoh_cui_bow or sdoh_cui_cnn, respectively. It might be worth making that change in the repository so that it works for other users.
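For anyone who would rather script that edit than do it by hand, here is a minimal sketch. The helper `rename_factory` is hypothetical (it is not part of spaCy or this repository), and the assumption that the factory name only appears in `.py`, `.cfg`, and `.json` files inside the installed package is mine; check your own install before relying on it. It replaces whole-word matches only, so running it twice does no further harm.

```python
import re
from pathlib import Path


def rename_factory(package_dir, old_name, new_name,
                   suffixes=(".py", ".cfg", ".json")):
    """Hypothetical helper: rename a factory inside an installed model package.

    Replaces whole-word occurrences of old_name with new_name in every text
    file under package_dir whose suffix is in suffixes (an assumed list).
    Returns the list of files that were modified.
    """
    # \b keeps "sdoh_cui" from matching inside "sdoh_cui_bow" or "en_sdoh_cui".
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    changed = []
    for path in sorted(Path(package_dir).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")
        new_text = pattern.sub(new_name, text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

You would call it once per package, for example `rename_factory(bow_dir, "sdoh_cui", "sdoh_cui_bow")` and `rename_factory(cnn_dir, "sdoh_cui", "sdoh_cui_cnn")`, where `bow_dir` and `cnn_dir` point at the two package directories mentioned above. Back up the packages first, since the files are rewritten in place.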

Also see this discussion on the spaCy repo.

DSLituiev commented 4 months ago

Thank you for catching this. Pull requests are welcome.