Open SuwonPabby opened 4 months ago
The ens5 and ens10 in the model names refer to the total number of views, 5 or 10, including the global view.
Thanks for your help!!!
But I still wonder about:
ens10 means 10 views of the image, including the global view.
But the code above is the code for ens10, and it still looks like it is feeding in only 5 views.
The following attribute defines the size of the image: https://github.com/Alpha-VLLM/LLaMA2-Accessory/blob/40cf02b7a5857e4b635344530450a460176df3aa/accessory/model/LLM/llama_ens10.py#L336. In ens10 it is 672, so there would be 9 local views, plus the global view for 10 in total. Sorry for my earlier incorrect comment.
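To make the arithmetic concrete, here is a rough sketch of how the view count follows from the image size. It is not taken from the repository: the 224-pixel tile size, the 448 input size for ens5, and the helper names `local_view_boxes` and `total_views` are assumptions made for illustration only.

```python
def local_view_boxes(image_size, patch_size=224):
    """Return (left, top, right, bottom) boxes that tile a square image.

    patch_size=224 is an assumed tile size, not a value read from the repo.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    n = image_size // patch_size  # tiles per side
    return [
        (col * patch_size, row * patch_size,
         (col + 1) * patch_size, (row + 1) * patch_size)
        for row in range(n)
        for col in range(n)
    ]


def total_views(image_size, patch_size=224):
    # Local tiles plus one global view (the whole image resized down).
    return len(local_view_boxes(image_size, patch_size)) + 1


print(total_views(672))  # 3 x 3 local views + 1 global view
print(total_views(448))  # 2 x 2 local views + 1 global view (assumed ens5 size)
```

Each box uses the (left, upper, right, lower) order that `PIL.Image.Image.crop` expects, so the same list could be used to cut an image into sub-images by hand if needed.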
Thank you for your kind information! It really helped me a lot!
Hello, I am currently working with the SPHINX library for various tasks, and I cannot seem to find the functionality within the repository to convert images to sub-images.
I am wondering if this functionality is absent from the repository, requiring an external solution, or if I am simply overlooking it.
Thank you for your assistance.