cvlab-columbia / viper

Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"

CUDA out of memory (XVLM Model) #33

Closed: SergioMOrozco closed this issue 6 months ago

SergioMOrozco commented 11 months ago

I am trying to run the provided "main_simple.ipynb" example notebook, but I get the following error message when running from main_simple_lib import *:

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.77 GiB total capacity; 10.38 GiB already allocated; 12.31 MiB free; 10.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
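
For reference, the fragmentation workaround the error message suggests would look roughly like this; the allocator setting has to be in place before anything touches the GPU, and the 128 MiB split size is only an example value:

    import os

    # Must be set before the first CUDA allocation, i.e. before
    # main_simple_lib (and therefore the models) are imported.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    from main_simple_lib import *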

I reduced the BLIP2 model size from XXL to XL:

[screenshot of the BLIP2 model size change]

Additionally, I noticed that the models are loaded from a list_models variable, so I printed each one as it was loaded:

    # Loop from main_simple_lib.py, with a print added to see which model
    # is being instantiated when the OOM occurs.
    counter_ = 0
    for model_class_ in list_models:
        print("MODEL: " + str(model_class_))
        for process_name_ in model_class_.list_processes():
            # Only load the processes that are enabled in config.load_models.
            if process_name_ in config.load_models and config.load_models[process_name_]:
                consumers[process_name_] = make_fn(model_class_, process_name_, counter_)
                counter_ += 1

From that output, I can see that the OOM traceback does not occur until the XVLM model is loaded:

[screenshot of the model-loading output ending in the CUDA OOM traceback]

Is there any way I can fix this issue? I'm currently running an RTX 3080 Ti with 12 GB of memory. I have also tried not loading the XVLM model by disabling it in the configuration (see the sketch below):

[screenshot of the configuration with the XVLM model disabled]
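
Roughly, the change turns off the XVLM entry in the load_models section of the config. A runtime equivalent would be something like the following (the key name "xvlm" is an assumption; check the keys actually present in load_models):

    # Hypothetical runtime equivalent of the config change above:
    # skip loading the XVLM-backed process. The exact key name may differ.
    config.load_models["xvlm"] = False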

Disabling XVLM resolves my OOM exception, but I then see new errors when I run execute_code(code, im, show_intermediate_steps=True):

[screenshot of the resulting traceback]

However, I'm not sure whether the TypeError: object of type 'NoneType' has no len() is caused by the XVLM model not being loaded. Any help would be greatly appreciated!

surisdi commented 11 months ago

Hi, the error you're getting (CUBLAS_STATUS_NOT_INITIALIZED) is very probably a GPU out-of-memory error. Can you monitor your GPU's memory during execution? The loaded models (GLIP and BLIP2) probably take up most of the space, and then running inference on any of them fills whatever is left. You can also try reducing the size of BLIP2 even further (we haven't tried smaller models, but they may work decently).
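
For example, a quick way to check, besides watching nvidia-smi, is to print PyTorch's own memory counters at a few points (say, right after the models load and again right before inference):

    import torch

    def print_gpu_memory(tag):
        # Report how much GPU memory PyTorch has allocated vs. reserved so far.
        gib = 1024 ** 3
        print(f"[{tag}] allocated: {torch.cuda.memory_allocated() / gib:.2f} GiB, "
              f"reserved: {torch.cuda.memory_reserved() / gib:.2f} GiB")

    # e.g. print_gpu_memory("after loading models") and print_gpu_memory("before inference")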

SergioMOrozco commented 11 months ago

Thanks for the quick response, @surisdi! Are you aware of any smaller BLIP2 models? I only see the following models on Hugging Face:

[screenshot of the BLIP-2 model listing on Hugging Face]

I'm not sure whether any of them are smaller than the XL model I used.

Nineves commented 11 months ago

Same issue here. I'm using Google Colab's compute resources, and the GPU memory is quite limited and cannot support running the model.

Nineves commented 11 months ago

Update: I changed to an A100 with 16 GB of RAM, and it is able to run the model.