
Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development

https://llama2-accessory.readthedocs.io/


Question: Is there a part of the SPHINX repository that converts images into sub-images? #169

Open: SuwonPabby opened this issue 4 months ago

SuwonPabby commented 4 months ago

Hello, I am currently working with the SPHINX library for various tasks, and I cannot seem to find the functionality within the repository to convert images to sub-images.

I am wondering if this functionality is absent from the repository, requiring an external solution, or if I am simply overlooking it.

Thank you for your assistance.

ChrisLiu6 commented 4 months ago

https://github.com/Alpha-VLLM/LLaMA2-Accessory/blob/2fe5e0b4f8bdc5df4fc0def5939d8838618ffe1e/accessory/model/LLM/llama_ens5.py#L383-L385

The ens5 and ens10 in the model names mean 5 or 10 views in total, including the global view.
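To make "N views including the global view" concrete, here is a minimal sketch. It assumes the local views are non-overlapping square tiles laid out on a `grid × grid` layout and that the global view is the whole image (which would later be resized to the encoder's input size). `split_into_views` is a hypothetical helper, not the code behind the link above, and the 448 size used for ens5 is an assumption; only the 672 size for ens10 is stated in this thread.

```python
from typing import List, Tuple

# Crop box in pixels: (left, top, right, bottom), the same order Pillow's Image.crop uses.
Box = Tuple[int, int, int, int]


def split_into_views(image_size: int, grid: int) -> List[Box]:
    """Hypothetical helper: 1 global view + grid*grid local views of a square image."""
    if image_size % grid != 0:
        raise ValueError("image_size must be divisible by grid")
    tile = image_size // grid
    # Global view: the whole image (downscaling is left to the caller).
    views: List[Box] = [(0, 0, image_size, image_size)]
    # Local views: non-overlapping tiles, scanned row by row.
    for row in range(grid):
        for col in range(grid):
            left, top = col * tile, row * tile
            views.append((left, top, left + tile, top + tile))
    return views


if __name__ == "__main__":
    # Assumed sizes: 448 for ens5 is a guess; 672 for ens10 comes from the thread.
    for name, size, grid in [("ens5", 448, 2), ("ens10", 672, 3)]:
        views = split_into_views(size, grid)
        print(name, len(views), "views; first local tile:", views[1])
```

With Pillow, each box could be passed to `Image.crop`, e.g. `[img.crop(b) for b in split_into_views(672, 3)]`; the first (global) crop would still need resizing to match the local tiles.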

SuwonPabby commented 4 months ago

Thanks for your help!!!

SuwonPabby commented 4 months ago

https://github.com/Alpha-VLLM/LLaMA2-Accessory/blob/40cf02b7a5857e4b635344530450a460176df3aa/accessory/model/LLM/llama_ens10.py#L377-L390

But I am still confused about one thing.

You said that ens10 means 10 views of the image, including the global view.

However, the code above is from ens10, and it still looks like it only produces 5 views.

ChrisLiu6 commented 4 months ago

The following attribute defines the input image size: https://github.com/Alpha-VLLM/LLaMA2-Accessory/blob/40cf02b7a5857e4b635344530450a460176df3aa/accessory/model/LLM/llama_ens10.py#L336. In ens10 it is 672, so there are 9 local views (10 views in total with the global view). Sorry for the incorrect comments.
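The view count as a worked sketch, assuming each local view is a 224 × 224 crop (a common CLIP input size; the sub-image size is not stated in this thread). With input side $S$ and sub-image side $s$:

```latex
N_{\text{views}} = \left(\frac{S}{s}\right)^{2} + 1
\qquad
\text{ens10: } S = 672,\; s = 224 \;\Rightarrow\; N_{\text{views}} = 3^{2} + 1 = 10
```

By the same formula, ens5 would correspond to $S = 448$ ($2^{2} + 1 = 5$), which is an inference rather than a value quoted in this thread.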

SuwonPabby commented 4 months ago

Thank you for your kind information! It really helped me a lot!