ExplainableML / Vision_by_Language

[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"

access to openai #2

Closed Pefect96 closed 6 months ago

Pefect96 commented 6 months ago

I still have some questions. (a) My Linux server may not be able to access OpenAI; what can I do? (b) In line 181 of utils.py, i.e., https://github.com/ExplainableML/Vision_by_Language/blob/525f90411336e9d93c46f04609d62fa431bfe16e/src/utils.py#L181, this function is incomplete and has no concrete implementation. Could you tell me where to get it?

Confusezius commented 6 months ago

Hi there! To answer your questions:

(a) Without access to OpenAI and the respective services, you may need to resort to publicly available LLMs such as Llama 2. Unfortunately, any other workaround depends on your particular Linux server.

(b) This line performs the caption modification using Llama. It is not part of the current codebase version, but it can easily be replicated by querying Llama from Hugging Face using the hyperparameters listed in the commented line. The input is the same as the one used for ChatGPT.
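
As a rough illustration, a minimal sketch of such a replacement using the standard Hugging Face transformers API could look like the following. Note that the model name, prompt wording, and generation hyperparameters below are placeholders, not the exact values from the commented line in utils.py:

```python
# Sketch: replicate the caption-modification step with a local Llama 2 model
# instead of the OpenAI API. Prompt text, generation hyperparameters, and the
# model name are placeholders; take the real values from the commented line at
# utils.py#L181 and from the existing ChatGPT query.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # requires accepting the Llama 2 license
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

def modify_caption(caption: str, instruction: str) -> str:
    # Same input structure as the ChatGPT query: a base image caption plus the
    # relative modification text.
    prompt = (
        f"Image caption: {caption}\n"
        f"Modification: {instruction}\n"
        "Rewrite the caption so that it reflects the modification. "
        "Answer with the new caption only."
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=64,   # placeholder hyperparameter
        do_sample=False,
    )
    # Strip the echoed prompt and return only the generated continuation.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ).strip()
```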

Pefect96 commented 6 months ago

Thanks for your reply! I have obtained the pretrained Llama-2-7b-chat-hf model. However, I notice that the "Modifying captions with LLM..." step is very slow. In your setup, how much time does it take?

Confusezius commented 6 months ago

Great that you got it to work already! Unfortunately, running Llama locally can often take longer simply because local compute setups may not be optimal. The time it takes also depends heavily on your available hardware, so it is hard to say precisely. But it is definitely longer than just querying ChatGPT!
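
If it helps, one common way to reduce the runtime is to batch the caption-modification prompts instead of generating one at a time. This is only a sketch assuming the same tokenizer/model objects as in the snippet above; the batch size and generation settings are illustrative, not values from the codebase:

```python
# Sketch: batch several caption-modification prompts per generate() call to
# speed up local Llama inference. Settings are illustrative placeholders.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # left-padding is needed for decoder-only generation

def modify_captions_batched(prompts, batch_size=8):
    results = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True).to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
        # Decode only the newly generated tokens for each prompt in the batch.
        gen = outputs[:, inputs["input_ids"].shape[1]:]
        results.extend(tokenizer.batch_decode(gen, skip_special_tokens=True))
    return [r.strip() for r in results]
```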