erasmus74 closed this 9 months ago.
To avoid duplicate conversations, I will close this one and refer to the original thread: https://github.com/explosion/spaCy/discussions/12229
This is not a duplicate: this ticket is specifically about enabling spaCy on ROCm 5.7/HIP. In that discussion, the user is installing CuPy with ROCm 5.0, using an installation method that is long outdated for CuPy and a long-outdated version of ROCm. ROCm 6 is already publicly released.
In the referenced thread, the user is trying to compile ROCm 5.0 and is failing independently of spaCy. I have installed ROCm 5.6, 5.7, and 6.0 in an effort to get spaCy to actually use ROCm.
@erasmus74 Did you have any progress / findings on this?
Yes, actually. I'll edit this message tomorrow with some findings. I haven't succeeded yet, but I have made a little progress since then.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hello, I have a vested interest in taking this over and hopefully finishing the process.
ROCm is currently at version 6.0, while CuPy has experimental support for ROCm 5.7. So for now we will pin ROCm to 5.7 in tests, until CuPy supports 6.0.
So here's my testing environment:
Here are my Python, ROCm, and pip packages (in a virtualenv):
Finally, here are my environment variables relevant to deploying spaCy:
When I run the available GPU test on my AMD system:
I installed CuPy using the steps detailed at: https://docs.cupy.dev/en/latest/install.html#using-cupy-on-amd-gpu-experimental
Testing script for CUDA/ROCm:
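For comparison, here is a minimal sketch of such a check in Python, assuming CuPy and spaCy are installed in the same virtualenv. It reports whether CuPy was built for HIP (`cupy.cuda.runtime.is_hip`), how many devices the runtime sees, and whether `spacy.prefer_gpu()` manages to activate the GPU. The function name `probe_gpu_stack` is hypothetical, and this is an illustrative check, not the script used in this thread.

```python
import importlib


def probe_gpu_stack() -> dict:
    """Report which layers of the CuPy / ROCm / spaCy stack respond.

    Failures are collected in report["errors"] instead of being raised,
    so a single run shows how far down the stack things work.
    """
    report = {"cupy": None, "is_hip": None, "device_count": None,
              "spacy_gpu": None, "errors": []}
    try:
        cupy = importlib.import_module("cupy")
        report["cupy"] = cupy.__version__
        # True when CuPy was built against HIP (ROCm) rather than CUDA.
        report["is_hip"] = bool(cupy.cuda.runtime.is_hip)
        report["device_count"] = cupy.cuda.runtime.getDeviceCount()
    except Exception as exc:  # ImportError, CUDARuntimeError, ...
        report["errors"].append(f"cupy: {type(exc).__name__}: {exc}")
    try:
        spacy = importlib.import_module("spacy")
        # prefer_gpu() returns True only if spaCy/Thinc activated a GPU backend.
        report["spacy_gpu"] = bool(spacy.prefer_gpu())
    except Exception as exc:
        report["errors"].append(f"spacy: {type(exc).__name__}: {exc}")
    return report


if __name__ == "__main__":
    for key, value in probe_gpu_stack().items():
        print(f"{key}: {value}")
```

Collecting errors rather than stopping at the first one makes it easier to tell a missing HIP build apart from a device that is visible but that spaCy cannot activate.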
Installing spaCy:
Testing with a basic spacy train run:
Testing without the HSA variable, to show that the GPU is being targeted:
So it would seem we have ROCm 5.7 and CuPy with ROCm support, and the issue surfaces as:
cupy_backends.cuda.libs.curand.CURANDError: CURAND_STATUS_ALLOCATION_FAILED
However, looking at the ROCm 6 hipRAND release notes: https://rocm.docs.amd.com/en/latest/about/release-notes.html#hiprand
I think it's possible that everything is set up correctly, but we simply can't use an equivalent ROCm function. That said, I have reached the limit of my understanding here.
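One way to test the hipRAND hypothesis without spaCy in the loop is to drive CuPy's random-number generator directly, since CuPy's `cupy.random` module is backed by cuRAND (hipRAND on ROCm builds). The sketch below assumes a ROCm build of CuPy; `check_cupy_rng` is a hypothetical helper. If this alone raises the same `CURANDError`, the failure likely sits below spaCy. Thinc's `fix_random_seed` also seeds CuPy's RNG when a GPU is available, which may be one route by which training reaches this code, but that is an assumption, not something confirmed in this thread.

```python
import importlib


def check_cupy_rng(seed: int = 0, n: int = 4) -> str:
    """Exercise CuPy's cuRAND/hipRAND-backed RNG outside of spaCy.

    Returns a short status string instead of raising, so it can be
    dropped into a larger diagnostic run.
    """
    try:
        cupy = importlib.import_module("cupy")
    except ImportError:
        return "skipped: cupy not installed"
    try:
        # Seeding creates (or reuses) CuPy's per-device RandomState, which
        # holds a cuRAND generator (hipRAND on ROCm builds).
        cupy.random.seed(seed)
        sample = cupy.random.random(n)
        # Copy back to host so any asynchronous device error surfaces here.
        values = cupy.asnumpy(sample)
    except Exception as exc:  # e.g. CURANDError: CURAND_STATUS_ALLOCATION_FAILED
        return f"failed: {type(exc).__name__}: {exc}"
    return f"ok: {values.tolist()}"


if __name__ == "__main__":
    print(check_cupy_rng())
```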
Another possibility is that Thinc does not support ROCm, so it can't load data into GPU memory. I did test with different sets of packages, such as removing the HIP libraries and using alternate ROCm libraries; these resulted in various combinations of "GPU not found" and "HIP GPU not found" errors. This is the farthest I've gotten: the point where it actually tries to initialize the GPU for processing.
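To check the Thinc side separately, one could ask Thinc which backend it selects and try a single allocation through it. This sketch assumes Thinc 8.x, where `thinc.api.prefer_gpu`, `get_current_ops`, and `Ops.alloc2f` exist; `probe_thinc_backend` is a hypothetical helper. On a working GPU setup the ops class would be expected to be `CupyOps` with `xp` set to `cupy`; otherwise it stays `NumpyOps`.

```python
import importlib


def probe_thinc_backend() -> dict:
    """Ask Thinc which array backend it uses and try one allocation."""
    result = {"ops": None, "xp": None, "alloc": None}
    try:
        thinc_api = importlib.import_module("thinc.api")
        # prefer_gpu() switches to CupyOps if a usable GPU is found,
        # otherwise Thinc keeps its default NumpyOps.
        thinc_api.prefer_gpu()
        ops = thinc_api.get_current_ops()
    except Exception as exc:  # thinc missing, or GPU activation failed
        result["alloc"] = f"skipped: {type(exc).__name__}: {exc}"
        return result
    result["ops"] = type(ops).__name__
    result["xp"] = ops.xp.__name__
    try:
        # alloc2f allocates a zero-filled 2D float32 array via the ops backend.
        arr = ops.alloc2f(2, 3)
        result["alloc"] = f"ok: shape={tuple(arr.shape)}"
    except Exception as exc:
        result["alloc"] = f"failed: {type(exc).__name__}: {exc}"
    return result


if __name__ == "__main__":
    print(probe_thinc_backend())
```

If this reports `CupyOps` and the allocation succeeds while `spacy train` still fails, Thinc's basic GPU support is probably not the blocker.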
I'm happy to test other scripts or debug as needed. I am committed to this issue, as it's the only thing holding me back from using the GPU for much of my spaCy work. Thank you!
Originally posted by @erasmus74 in https://github.com/explosion/spaCy/discussions/12229#discussioncomment-8262332