Closed 0xdevalias closed 8 months ago
I am trying to implement SoM, since it seems to have the best accuracy.
@Daisuke134 interested to see what you find. I'm going to go learn more about SoM
@0xdevalias read up more on SoM. It looks like a very promising approach, thank you for opening this issue!
@joshbickett No worries :)
I have been testing out SoM and it seems pretty good. Here is the screenshot. I will try adding this today, test it, and make a PR.
I am implementing SoM now, and it seems like the best way is to add another mode, like som-mode, with its own prompt.
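A minimal sketch of what a per-mode prompt could look like, assuming prompts are selected by a mode name. The `PROMPTS` table, the `get_prompt` helper, the mode names, and the prompt wording are all hypothetical placeholders, not taken from the project:

```python
# Hypothetical prompt registry: mode names and prompt text are placeholders.
PROMPTS = {
    "grid": (
        "The screenshot has a coordinate grid overlaid. "
        "Reply with the x and y percentages to click."
    ),
    "som": (
        "Interactive elements in the screenshot are tagged with numeric marks. "
        "Reply with the mark to click."
    ),
}


def get_prompt(mode: str, objective: str) -> str:
    """Return the prompt for the given mode with the user's objective appended."""
    try:
        template = PROMPTS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    return f"{template}\nObjective: {objective}"
```

Keeping the SoM prompt separate means the existing grid prompt stays untouched, so the two modes can be compared side by side.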
@Daisuke134 @0xdevalias Set-of-Mark prompting is now available. Swap in your best.pt from a YOLOv8 model and see how it performs!
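For anyone following along, here is a rough sketch of the bookkeeping behind Set-of-Mark: number the detected elements, let the vision model answer with a mark, then turn that mark back into a click position. It assumes the bounding boxes have already been produced by a detector (for example a YOLOv8 model loaded from best.pt); the `Box` type and both helper functions are hypothetical names for illustration, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Pixel bounds of one detected UI element (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def assign_marks(boxes):
    """Number boxes in reading order: top to bottom, then left to right."""
    ordered = sorted(boxes, key=lambda b: (b.y1, b.x1))
    return {str(i): box for i, box in enumerate(ordered, start=1)}


def mark_to_click(marks, label, width, height):
    """Return the centre of the marked box as fractions of the image size."""
    box = marks[label]
    cx = (box.x1 + box.x2) / 2
    cy = (box.y1 + box.y2) / 2
    return cx / width, cy / height


# Assumed detector output for a 1000x500 screenshot (hard-coded here).
boxes = [Box(100, 200, 300, 260), Box(10, 20, 110, 60)]
marks = assign_marks(boxes)

# Suppose the vision model replies with mark "2".
x, y = mark_to_click(marks, "2", width=1000, height=500)
```

The numbered marks would be drawn onto the screenshot before it is sent to the model, so the model only has to name a mark rather than estimate coordinates.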
I noticed that you currently seem to apply a grid to the images to assist the vision model:
And mention this in the README:
I was wondering, have you looked at using Set-of-Mark Visual Prompting for GPT-4V or similar techniques?
See Also
A bit of a link dump from one of my references: