DavidMChan / caption-by-committee

Using LLMs and pre-trained caption models for super-human performance on image captioning.
Other
40 stars 4 forks source link

Can we perform captioning tasks on pedestrian images in reality to obtain finer grained text descriptions. #12

Closed shams2023 closed 6 months ago

shams2023 commented 6 months ago

Can we perform captioning tasks on pedestrian images in reality to obtain finer grained text descriptions. I have some real-life pedestrian images and I would like to generate corresponding text descriptions for them. I am not sure if your model can be implemented, and I look forward to your reply! Thank you!

DavidMChan commented 6 months ago

It is likely that this will work for your problem, however it is hard to say exactly without additional detail (as our method can be applied to any standard image captioning approach). I encourage reading through our paper, and determining if it is applicable to your particular situation.