NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
https://shikun.io/projects/prismer

Add Gradio Demo #8

Closed bjoernpl closed 1 year ago

bjoernpl commented 1 year ago

Adds a Gradio demo in app.py. It currently supports the VQA and caption tasks and can run either with the expert models (Prismer) or without them (PrismerZ). The temporary image is saved in the helpers/images folder so that the expert label generation scripts work. Run with:

python app.py --task="vqa" --model_name="prismer_base"
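
For readers who have not opened app.py, here is a minimal sketch of how the two flags in the command above could be parsed. It is an illustration only, not the code from this PR: the function names `parse_args` and `uses_experts`, the default values, and the rule that PrismerZ model names start with "prismerz" are all assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in the run command (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Gradio demo launcher (sketch)")
    # The two tasks named in the PR description.
    parser.add_argument("--task", choices=["vqa", "caption"], default="caption")
    # Assumption: the model name is passed through as a plain string.
    parser.add_argument("--model_name", default="prismer_base")
    return parser.parse_args(argv)


def uses_experts(model_name):
    """Return True when the model needs expert labels.

    Assumption: PrismerZ variants (no experts) are identified by a
    "prismerz" prefix in the model name.
    """
    return not model_name.lower().startswith("prismerz")


if __name__ == "__main__":
    args = parse_args()
    print(args.task, args.model_name, uses_experts(args.model_name))
```

Keeping the expert check in a separate function would let the demo skip the expert label generation step entirely for PrismerZ models.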

Sorry for the messy commit history on this; I hope that's not an issue. Open to any feedback.

lorenmt commented 1 year ago

Hi @bjoernpl, thank you very much for your contribution.

Here are some suggestions:

  1. I can run the caption models successfully, but the VQA model returns an error in img_path.
  2. I believe it would be cleaner to have two panels, one for VQA and one for captioning. In each panel, we could also visualise the expert labels. I can post-process the labels to make them pretty, but for testing we can first just use the original generated labels.
  3. For the caption models, it would be cleaner not to show the prefix "A picture of", and to rename the column "question" to "caption".
  4. The "experts" panel does not show anything at all, so I assume it's not working properly?
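
Suggestion 3 could be handled with a small post-processing step before the caption is displayed. The sketch below is one possible way to do it; `clean_caption` is a hypothetical helper, not a function from the Prismer code base, and it assumes the model output is a plain string that may start with the prompt prefix.

```python
# Prompt prefix that the caption models echo back (taken from suggestion 3).
CAPTION_PREFIX = "a picture of"


def clean_caption(text, prefix=CAPTION_PREFIX):
    """Strip the prompt prefix from a generated caption (hypothetical helper).

    The comparison is case-insensitive, so "A picture of ..." and
    "a picture of ..." are both handled. Captions without the prefix
    are returned unchanged apart from surrounding whitespace.
    """
    stripped = text.strip()
    if stripped.lower().startswith(prefix):
        stripped = stripped[len(prefix):].lstrip()
    return stripped


if __name__ == "__main__":
    print(clean_caption("A picture of a dog running on the beach"))
```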

Again, I really appreciate the help.

Update: I have now provided an expert label visualisation script. I hope it is helpful. Thanks!

bjoernpl commented 1 year ago

Thanks for the feedback! I had the experts and the visualisation of their labels working on my machine, so it might be an error with some path. I'll look into fixing it next week.