xinli2008 closed this issue 6 months ago
I also tried another method: given an original image and a mask image (mode L), I combine them into an alpha image. The combined image and the rewritten forward are as follows:
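For reference, a minimal sketch of that combination step, assuming PIL. Note that Alpha-CLIP itself consumes the alpha channel as a separate input tensor, so whether a fused RGBA image is the right interface depends on how your rewritten forward unpacks it; the helper name here is hypothetical.

```python
from PIL import Image

def combine_image_and_mask(image: Image.Image, mask: Image.Image) -> Image.Image:
    """Attach a single-channel (mode 'L') mask to an RGB image as its alpha
    channel, producing an RGBA image. White (255) mask pixels mark the region
    to attend to; black (0) pixels mark the region to ignore."""
    rgba = image.convert("RGB").copy()
    rgba.putalpha(mask.convert("L").resize(image.size))  # putalpha converts to RGBA in place
    return rgba

# Tiny demo with synthetic data: left half of the image is the region of interest.
img = Image.new("RGB", (8, 8), (120, 60, 30))
mask = Image.new("L", (8, 8), 0)
for x in range(4):
    for y in range(8):
        mask.putpixel((x, y), 255)

combined = combine_image_and_mask(img, mask)
print(combined.mode)              # RGBA
print(combined.getpixel((0, 0)))  # (120, 60, 30, 255) -- attended region
print(combined.getpixel((7, 0)))  # (120, 60, 30, 0)   -- ignored region
```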
My prompt is: "Describe this image and its style in a very detailed manner." The generated text is: "The image features a large black and white dog with a smile on its face, likely a German Shepherd. The dog is the main focus of the image, taking up a significant portion of the frame. The dog appears to be enjoying a moment of happiness, possibly posing for a picture." I don't think the results are good, because the model still seems to pay attention to the black (masked-out) areas of the image. Can you give me some useful advice?
Your mask transform should involve center_crop, the same as the LLaVA preprocess does.
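A sketch of what that alignment looks like, assuming the standard CLIP preprocess (resize the shorter side to 224, then center-crop 224x224). This version uses plain PIL rather than torchvision; for a binary mask you may prefer nearest-neighbor resampling over the bicubic that CLIP uses for images, and note that some LLaVA configs pad to square instead of cropping, so check your own preprocess.

```python
from PIL import Image

def clip_style_mask_transform(mask: Image.Image, size: int = 224) -> Image.Image:
    """Resize so the shorter side equals `size`, then center-crop to
    size x size, mirroring the CLIP/LLaVA image preprocess so the mask
    stays spatially aligned with the cropped image."""
    w, h = mask.size
    scale = size / min(w, h)
    mask = mask.resize((round(w * scale), round(h * scale)), Image.NEAREST)
    w, h = mask.size
    left, top = (w - size) // 2, (h - size) // 2
    return mask.crop((left, top, left + size, top + size))

m = Image.new("L", (640, 480), 255)
out = clip_style_mask_transform(m)
print(out.size)  # (224, 224)
```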
Thank you for your kind reply! I tried that method; with transforms.CenterCrop the mask area is correctly mapped onto the original image, but unfortunately the generated text is still the same as the result without Alpha-CLIP. The visualized image and mask are as follows:
Has this problem been solved? Maybe LLaVA needs to be fine-tuned to match Alpha-CLIP's output?
No! We suspect that Alpha-CLIP works as an attention mechanism. As in the image above, this is more difficult when the main subject of the image is a banana and we want to use Alpha-CLIP to focus on the background area. Good luck!
I did encounter this problem when using my own models (the LLMs are Vicuna and GLM): "But the generated text doesn't seem to focus on a specific area, and I am wondering why. Can you give me some useful advice?" Since the paper mentions that they fine-tuned LLaVA 1.5 with Alpha-CLIP, I doubt the zero-shot stitching ability of Alpha-CLIP.
It depends on the image: when the subject of the image is very obvious and we try to ignore it completely, that may be difficult to do with Alpha-CLIP plus LLaVA.
Hi, I believe your case works to some degree when using Alpha-CLIP. The official demo is now available! You can check it out on your own.
Thanks for sharing the demo. May I ask whether it is necessary to fine-tune Alpha-CLIP when stitching it into a new MLLM? For me, zero-shot stitching does not work well.
Our demo doesn't involve any fine-tuning of LLaVA; it only involves replacing the original CLIP with Alpha-CLIP. The code is also available in demo/with_llm. I believe a bit of fine-tuning can help you get better results.
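On the input side, the main difference from vanilla CLIP is that the mask is fed to the visual tower as a separate, normalized alpha tensor rather than being baked into the pixels. A sketch of that normalization step, assuming the mean/std values (0.5 / 0.26) used in the Alpha-CLIP README's mask_transform; double-check them against the version you are running.

```python
import numpy as np

def mask_to_alpha(mask: np.ndarray) -> np.ndarray:
    """Turn a binary {0, 255} mask of shape (H, W) into the normalized alpha
    tensor passed to Alpha-CLIP's visual tower alongside the RGB tensor.
    Assumes the Normalize(0.5, 0.26) statistics from the Alpha-CLIP README."""
    alpha = mask.astype(np.float32) / 255.0  # scale to [0, 1]
    alpha = (alpha - 0.5) / 0.26             # normalize with assumed mean/std
    return alpha[None, None]                 # add batch and channel dims: (1, 1, H, W)

mask = np.zeros((224, 224), dtype=np.uint8)
mask[:, :112] = 255                          # left half is the region of interest
alpha = mask_to_alpha(mask)
print(alpha.shape)  # (1, 1, 224, 224)
```

The alpha tensor then rides along with the preprocessed image through the rewritten forward, so any resize/crop applied to the image must be applied to the mask first (see the center_crop discussion above).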
Thanks for replying :)
Sorry to bother you during your busy time; I am in a hurry to get Alpha-CLIP working with LLaVA-7b-clip. I followed the instructions here and changed a few things. The input images and masks are as follows, and the rewritten forward is as follows: But the generated text doesn't seem to focus on a specific area, and I am wondering why. Can you give me some useful advice? Thank you!