Closed: Abdulhanan535 closed this issue 1 month ago
It looks like CUDA isn't installed? Can you run nvcc --version
in the notebook? I'll try it out myself too later.
oki
```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
```
Oof, you're using kaggle? Brave man. ;-)
Not like I know much about this, but something is calling out to me from the logs:
Did you try passing CUDA_INCLUDE_DIRS (or a prefix for all of CUDA, it probably lets you do that I guess) to cmake?
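Something along these lines, maybe (a minimal sketch; `/usr/local/cuda` is an assumed install prefix, and `CUDAToolkit_ROOT` / `CMAKE_CUDA_COMPILER` are just the standard CMake hints — whether this project's build honors them is my assumption):

```shell
# Sketch: point CMake at the CUDA toolkit explicitly.
# /usr/local/cuda is an assumed prefix -- adjust to wherever CUDA lives on your system.
cmake -S . -B build \
  -DCUDAToolkit_ROOT=/usr/local/cuda \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
```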
eh, what's that?
yea, I'm using Kaggle because I don't have much VRAM on my GPU, and Kaggle gives you a total of 30GB VRAM for free lol.
anyone??
I couldn't get my kaggle account verified to test this. Can you run Docker in kaggle? If so, try this:
```
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 2242:2242 \
  alpindale/aphrodite-openai --model <your model> [other args]
```
If not, try running this inside the aphrodite directory:
```
./runtime.sh aphrodite run <your model> [--other-args]
```
okay.
it's installing something like 6GB of data..
there's a long log; here's the short version, which I think is what's causing the error:
```
[2/31] Building CUDA object CMakeFiles/_C.dir/kernels/cache_kernels.cu.o
In file included from /kaggle/aphrodite-engine/kernels/cache_kernels.cu:6:
/kaggle/aphrodite-engine/kernels/dispatch_utils.h:36:60: warning: backslash-newline at end of file
   36 | #define APHRODITE_DISPATCH_INTEGRAL_TYPES(TYPE, NAME, ...) \
[3/31] Building CUDA object CMakeFiles/_moe_C.dir/kernels/moe/softmax.cu.o
[4/31] Building CUDA object CMakeFiles/_C.dir/kernels/pos_encoding_kernels.cu.o
In file included from /kaggle/aphrodite-engine/kernels/pos_encoding_kernels.cu:6:
/kaggle/aphrodite-engine/kernels/dispatch_utils.h:36:60: warning: backslash-newline at end of file
   36 | #define APHRODITE_DISPATCH_INTEGRAL_TYPES(TYPE, NAME, ...) \
[5/31] Building CUDA object CMakeFiles/_C.dir/kernels/activation_kernels.cu.o
In file included from /kaggle/aphrodite-engine/kernels/activation_kernels.cu:8:
/kaggle/aphrodite-engine/kernels/dispatch_utils.h:36:60: warning: backslash-newline at end of file
   36 | #define APHRODITE_DISPATCH_INTEGRAL_TYPES(TYPE, NAME, ...) \
[6/31] Building CUDA object CMakeFiles/_C.dir/kernels/layernorm_kernels.cu.o
In file included from /kaggle/aphrodite-engine/kernels/layernorm_kernels.cu:5:
/kaggle/aphrodite-engine/kernels/dispatch_utils.h:36:60: warning: backslash-newline at end of file
   36 | #define APHRODITE_DISPATCH_INTEGRAL_TYPES(TYPE, NAME, ...) \
/kaggle/aphrodite-engine/kernels/layernorm_kernels.cu(207): warning #1444-D: variable "std::is_pod_v [with _Tp=aphrodite::_f16Vec<c10::Half, 8>]" (declared at line 3154 of /kaggle/aphrodite-engine/conda/envs/aphrodite-runtime/x86_64-conda-linux-gnu/include/c++/11.3.0/type_traits) was declared deprecated ("use is_standard_layout_v && is_trivial_v instead")
    static_assert(std::is_pod_v<_f16Vec<scalar_t, width>>);
                  ^
          detected during instantiation of "std::enable_if_t<
Remark: The warnings can be suppressed with "-diag-suppress
[7/31] Building CUDA object CMakeFiles/_C.dir/kernels/quantization/squeezellm/quant_cuda_kernel.cu.o
/kaggle/aphrodite-engine/kernels/quantization/squeezellm/quant_cuda_kernel.cu: In function 'void squeezellm_gemm(at::Tensor, at::Tensor, at::Tensor, at::Tensor)':
/kaggle/aphrodite-engine/kernels/quantization/squeezellm/quant_cuda_kernel.cu:198:141: warning: 'T at::Tensor::data() const [with T = c10::Half]' is deprecated: Tensor.data
```
in the end the storage got full and it crashed.. :\ 70GB of storage... I was running an 8B model. Also, I was able to run previous versions like 0.5.3, 0.5.2, etc. with no trouble, but this one is not working.
Those are benign warnings. The actual error happens in another part, but you've not included that.
I had literally just posted my response right as you did, haha. I figured that somehow something was getting munged by whatever he was doing on kaggle. I had told him that the preprocessor tells you about that so that you don't spend a thousand hours banging your head against your desk. I noticed that he'd cut off the rest of it, but figured that he'd identified that as the problem, otherwise he'd have included the rest. But that doesn't make very much sense, now does it, haha.
I have a kaggle account. I'll run the notebook and see what happens, if the thing actually works. I haven't used it in a long time because I don't have that much of a penchant for masochism.
Perhaps you could use a pastebin to share the entire log with us
For what it's worth, I've fixed all compiler warnings and notices in the latest rc_054. If anything fails, you'll immediately see the error logs instead of having warnings hog the entire screen space.
nice, i'll check again today.
it worked! but it took 1 hour to install everything.. Can you fix that???
and it used about 40GB of storage without the model :\
We have a lot of kernels in order to stay performant, so we can't get rid of them. However, we'll be switching to nightly wheels soon (built for every commit), so you can use those instead of building it yourself. I'll drop a notice here once that's set up.
okay, also it's stuck here:
```
WARNING: Reducing Torch parallelism from 2 threads to 1 to avoid unnecessary
CPU contention. Set OMP_NUM_THREADS in the external environment to tune this
value as needed.
INFO: Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO: Using XFormers backend.
(AphroditeWorkerProcess pid=4561) INFO: Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
(AphroditeWorkerProcess pid=4561) INFO: Using XFormers backend.
/kaggle/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
(AphroditeWorkerProcess pid=4561) [same FutureWarning for flash.py:211]
/kaggle/aphrodite-engine/conda/envs/aphrodite-runtime/lib/python3.11/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
(AphroditeWorkerProcess pid=4561) [same FutureWarning for flash.py:344]
(AphroditeWorkerProcess pid=4561) INFO: Worker ready; awaiting tasks
INFO: generating GPU P2P access cache in
/root/.config/aphrodite/gpu_p2p_access_cache_for_0,1.json
```
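Side note on the first WARNING in that log: you can pin the thread count yourself before launching, as the message suggests (a sketch; `OMP_NUM_THREADS` is the standard OpenMP variable, and the value 2 is just an example to tune for your CPU):

```shell
# Set the OpenMP thread count before launching so the engine doesn't
# clamp Torch's CPU parallelism itself. The value 2 is an example.
export OMP_NUM_THREADS=2
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
# then launch as before, e.g.:
# ./runtime.sh aphrodite run <your model> [--other-args]
```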
v0.6.0 works great and easily...