ExplainableML / Vision_by_Language

[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"
MIT License

About the GPU #1

Closed: Pefect96 closed this issue 6 months ago

Pefect96 commented 6 months ago

Good job! How many GPUs are used for inference? Does it have to be an A100 or V100?

sgk98 commented 6 months ago

Hi! You only need 1 GPU for inference. We have only tried V100s and A100s, but if you use a smaller retrieval model (e.g. ViT-B/32), then as long as your GPU can support captioning with a BLIP-2 model (you can check the requirements at https://github.com/salesforce/LAVIS), everything should work smoothly. If loading the BLIP-2 captioning model (with its Flan-T5-XXL language model) is the bottleneck, you could either use a smaller language model or another captioning model such as CoCa or BLIP, which should run on pretty much any GPU. Let me know if you run into any issues with this!
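For reference, here is a minimal sketch of loading a lighter captioner through LAVIS. The model names come from the LAVIS model zoo; the image path is a hypothetical placeholder, and this is not the repo's exact code:

```python
# Minimal sketch: load a lighter captioning model via LAVIS instead of
# BLIP-2 with Flan-T5-XXL. Model names follow the LAVIS model zoo.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Smaller BLIP-2 variant (Flan-T5-XL instead of XXL). Alternatively, use
# name="blip_caption", model_type="base_coco" for plain BLIP, which fits
# on almost any GPU.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_t5", model_type="pretrain_flant5xl", is_eval=True, device=device
)

raw_image = Image.open("reference.jpg").convert("RGB")  # hypothetical path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
captions = model.generate({"image": image})
print(captions)
```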

Pefect96 commented 6 months ago

Thank you for your reply! However, when I run the code, I encounter the following problem:

WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f8874060220>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v1/chat/completions

sgk98 commented 6 months ago

This issue is not caused by the GPU but by the LLM used to reformulate the (caption, modifier) pair into the target caption. It could be that the machine you're using has network issues, or that the OpenAI API key hasn't been set up. If you would like to use the gpt-3.5-turbo model for this, pass your OpenAI API key as a command-line argument: --openai_key <enter_your_key>. If that does not work, another fix is to hardcode your API key here: https://github.com/ExplainableML/Vision_by_Language/blob/master/src/openai_api.py#L2.

If you don't want to use an OpenAI LLM, you can use any other LLM (running locally or via an API service) by replacing the call here: https://github.com/ExplainableML/Vision_by_Language/blob/master/src/utils.py#L180. We found that a Vicuna-13B model did reasonably well, and I would expect the recent Mistral models to also perform quite well on this task. For this specific issue, I would first check whether you can make a request to the OpenAI API and get a response from the LLM, to confirm that part works on its own. Apologies for not clarifying this in the README; we will update it to be more informative!
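To sanity-check API connectivity independently of the repo, something like the following smoke test should work. This sketch assumes the pre-1.0 `openai` Python package; the key value is a placeholder:

```python
# Smoke test for OpenAI API connectivity, assuming the pre-1.0 `openai`
# package (pip install "openai<1.0"). If this fails with a network error,
# the problem is the machine's connection, not the repo.
import openai

openai.api_key = "sk-..."  # placeholder; substitute your own key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response["choices"][0]["message"]["content"])
```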
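And if you want to avoid the OpenAI API entirely, a hypothetical drop-in replacement for the call in src/utils.py could look like the sketch below. The function name, prompt wording, and model choice here are illustrative, not the repo's actual code, and running a 7B model locally requires a suitably large GPU plus the `accelerate` package for device_map:

```python
# Hypothetical local replacement for the OpenAI reformulation call:
# rewrite the (caption, modifier) pair into a target caption with a
# locally hosted Hugging Face model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # illustrative choice
    device_map="auto",
)

def reformulate(caption: str, modifier: str) -> str:
    # Prompt wording is illustrative; tune it for your model.
    prompt = (
        f"Image caption: {caption}\n"
        f"Requested edit: {modifier}\n"
        "Describe the edited image in one sentence:"
    )
    out = generator(
        prompt, max_new_tokens=60, do_sample=False, return_full_text=False
    )
    return out[0]["generated_text"].strip()
```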

Pefect96 commented 6 months ago

Thank you very much! I will try the methods you provided; it is probably a network problem.