Closed: SergioMOrozco closed this issue 6 months ago
Hi, the error you're getting (`CUBLAS_STATUS_NOT_INITIALIZED`) is very probably a GPU out-of-memory error. Can you monitor your GPU's memory during execution? The loaded models (GLIP and BLIP2) probably take most of the space, and then running inference on any of them fills whatever is left. You can try reducing the size of BLIP2 even further (we haven't tried smaller models, but they may work decently).
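If it helps, GPU memory can be polled from Python while the notebook runs. A minimal sketch, assuming `nvidia-smi` is on the PATH (this helper is not part of the repo):

```python
import shutil
import subprocess

def gpu_memory_mib(gpu_index=0):
    """Return (used_mib, total_mib) for one GPU, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.check_output(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    used, total = out.strip().split(", ")
    return int(used), int(total)

mem = gpu_memory_mib()
if mem is not None:
    print(f"GPU memory: {mem[0]} / {mem[1]} MiB used")
```

Calling this right after each model loads makes it easy to see which one pushes you over the limit.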
Thanks for the quick response @surisdi! Are you aware of any smaller BLIP2 models? I only see the following models on Hugging Face:
I'm not sure whether any of them are smaller than the XL model I used.
same issue here, i'm using Google Colab's computing resources. The GPU memory is quite limited and cannot support to run the model.
Update: I switched to an A100 with 16G of RAM, which is able to run the model.
I am trying to run the provided "main_simple.ipynb" example script, but I get the following error message when running `from main_simple_lib import *`:

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.77 GiB total capacity; 10.38 GiB already allocated; 12.31 MiB free; 10.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
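The allocator hint in that message can be tried directly; a minimal sketch (the value `128` is just an assumed starting point, not something from the repo):

```python
import os

# Must be set before the first CUDA allocation, i.e. before
# `from main_simple_lib import *` runs, or PyTorch will ignore it.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Note that this only mitigates fragmentation; it won't reclaim the ~10 GiB the loaded models themselves occupy.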
I reduced the BLIP2 model size from XXL to XL, as seen here:

![image](https://github.com/cvlab-columbia/viper/assets/39529018/d641c11f-991c-4865-bcb5-a9a9b380a2d2)
Additionally, I realized that the models are loaded from a `list_models` variable, so I printed each one as it was loaded, as seen here:

I see that the OOM traceback doesn't occur until the XVLM model is loaded, as seen here:
Is there any way I can fix this issue? I'm currently running an RTX 3080 Ti with 12 GB of memory. I have tried not loading the XVLM model by using the following configuration:
which resolves my OOM exception, but I see new errors when I run `execute_code(code, im, show_intermediate_steps=True)`, as seen here:

I'm not sure whether the `TypeError: object of type 'NoneType' has no len()` is due to the XVLM model not being loaded, however. Any help would be greatly appreciated!
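As a debugging aid, that `TypeError` suggests something downstream calls `len()` on a result that is `None` when XVLM is skipped. A hypothetical guard like this (not part of the repo) would at least fail with a clearer message:

```python
def require_result(result, model_name="XVLM"):
    """Raise a descriptive error instead of a bare TypeError on None results.

    Hypothetical helper: wrap outputs of code paths that depend on an
    optionally-loaded model before calling len() on them.
    """
    if result is None:
        raise RuntimeError(
            f"Got None where a result was expected; is the {model_name} "
            "model disabled in the configuration?"
        )
    return result

# e.g. len(require_result(scores)) instead of len(scores)
```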