explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License
29.69k stars 4.36k forks

spaCy uses GPU by default #12474

Closed by amanpreet692 1 year ago

amanpreet692 commented 1 year ago

Hello, I was trying to use the GPU for my pipeline, so I installed spacy-transformers and spacy[cuda115] (which also installed cupy). But now, in any environment that has a GPU, spaCy prefers the GPU over the CPU even when spacy.require_cpu() is called, as shown below. I had to switch to a non-GPU environment for CPU processing to work again.

How to reproduce the behaviour

from typing import Optional, Sequence

import spacy
from necessary import necessary

spacy.require_cpu()
# acronyms_kb and irregular_plurals are defined elsewhere in my code;
# the extra keyword arguments are omitted from the __init__ excerpt below
nce = NounChunkExtractor(acronyms_kb=acronyms_kb, irregular_plurals=irregular_plurals)

class NounChunkExtractor:
    _SPACY_DISABLE = ["ner", "textcat"]

    def __init__(
        self,
        spacy_model_name: str = "en_core_web_sm",
        spacy_disable: Optional[Sequence[str]] = None,
    ):
        with necessary(
            modules=spacy_model_name,
            message=f"Run `python -m spacy download {spacy_model_name}`",
        ):
            self.nlp = spacy.load(
                spacy_model_name,
                disable=(spacy_disable or self._SPACY_DISABLE),
            )

Your Environment

Info about spaCy

adrianeboyd commented 1 year ago

It shouldn't be using the GPU for processing with this example. I suspect what you're seeing is that if cupy is installed, spaCy imports it automatically, and merely importing it and setting it up for potential use consumes some GPU RAM.

Unfortunately, I think the only current workaround for the GPU RAM usage is to uninstall cupy in this environment.
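Since the RAM reservation happens as a side effect of cupy being importable, a quick way to check whether a given environment is affected is a `find_spec` probe. This is a minimal sketch (not spaCy API); `cupy_installed` is a hypothetical helper name, and `find_spec` only inspects the import machinery, so running it never imports cupy or touches the GPU:

```python
import importlib.util

def cupy_installed() -> bool:
    """Return True if cupy is importable in this environment.

    find_spec locates the module without importing it, so this
    check itself reserves no GPU memory.
    """
    return importlib.util.find_spec("cupy") is not None

if cupy_installed():
    print("cupy present: importing spacy may reserve GPU RAM here")
else:
    print("cupy absent: spacy will stay on the CPU")
```

If the probe reports cupy present and CPU-only processing is wanted, uninstalling cupy (as suggested above) removes the import-time side effect.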

adrianeboyd commented 1 year ago

I tested this a bit more locally and the amount of RAM this currently uses is a lot higher than I remembered, and we'll take a closer look to see if we can improve how this is loaded in the background.

shadeMe commented 1 year ago

Upon closer investigation, we found that the custom cupy kernels we ship with Thinc were being compiled during module initialization, causing cupy to allocate GPU memory.

We have a PR in the works that will fix this by deferring the compilation until the first invocation of the kernel.
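The deferral described above is a standard lazy-initialization pattern: instead of compiling at import time, the wrapper compiles on the first call. A minimal pure-Python sketch of the idea, where `LazyKernel` and `fake_compile` are illustrative stand-ins and not Thinc API:

```python
from typing import Callable, List, Optional

class LazyKernel:
    """Defer an expensive compile step until the first invocation.

    The compile function is stored but not run at construction time,
    mirroring how kernel compilation can be moved out of module
    initialization.
    """

    def __init__(self, compile_fn: Callable[[], Callable[[int], int]]):
        self._compile_fn = compile_fn
        self._kernel: Optional[Callable[[int], int]] = None

    def __call__(self, x: int) -> int:
        if self._kernel is None:  # compile on first use only
            self._kernel = self._compile_fn()
        return self._kernel(x)

compile_calls: List[int] = []

def fake_compile() -> Callable[[int], int]:
    # Stand-in for real kernel compilation, which would allocate GPU memory.
    compile_calls.append(1)
    return lambda x: x * 2

kernel = LazyKernel(fake_compile)
assert compile_calls == []   # nothing compiled at construction time
assert kernel(21) == 42      # first call triggers the compile
assert kernel(10) == 20      # subsequent calls reuse the compiled kernel
assert len(compile_calls) == 1
```

The key property is that merely importing or constructing the wrapper costs nothing; the expensive step is paid only by code paths that actually use the kernel.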

svlandeg commented 1 year ago

Closing this as https://github.com/explosion/thinc/pull/870 has been merged. Thanks again for the report!

github-actions[bot] commented 1 year ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.