facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

Script to render Images and captions #2066

Open vedantpuri opened 5 years ago

vedantpuri commented 5 years ago

Is your feature request related to a problem? Please describe.
This is related to a sub-task of #2021. It would be useful to have a script that renders images and their captions. Something similar has already been done for chit-chat in #2035 (with updates in #2059) and can be used as a reference.

Describe the solution you'd like
The high-level idea is the same as for conversation rendering:

Additional context
@klshuster Could you provide a fixed format for the data to be processed, like there was in convo_render? A rough vision of how you would want it to look would also be great. One idea: a polaroid-like box in the center of the screen (grey/white background) with the caption in messenger blue below the image.
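A rough sketch of the polaroid idea, assuming the renderer simply emits an HTML string: a white card centered on a grey background, with the image on top and the caption below it. The function name `render_polaroid`, the inline styles, and the exact shade of "messenger blue" are illustrative assumptions, not part of convo_render.

```python
import html

# Messenger-style blue; the exact shade is an assumption for illustration.
MESSENGER_BLUE = "#0084ff"


def render_polaroid(image_src: str, caption: str) -> str:
    """Return an HTML fragment: a centered, polaroid-like white card with
    the image on top and the caption below it in messenger blue.

    `image_src` may be a normal URL or a base64 data URI.
    """
    return (
        # Outer container: grey background, card centered horizontally.
        '<div style="display:flex;justify-content:center;'
        'background:#f0f0f0;padding:40px;">'
        # The "polaroid": white card with a thicker bottom margin.
        '<div style="background:#fff;padding:12px 12px 32px;'
        'box-shadow:0 2px 6px rgba(0,0,0,0.3);">'
        f'<img src="{html.escape(image_src, quote=True)}" '
        'style="max-width:400px;display:block;"/>'
        # Caption is HTML-escaped so model output cannot inject markup.
        f'<p style="color:{MESSENGER_BLUE};text-align:center;'
        'font-family:sans-serif;">'
        f"{html.escape(caption)}</p>"
        "</div></div>"
    )


if __name__ == "__main__":
    print(render_polaroid("cat.png", "A cat sitting on a windowsill"))
```

Escaping the caption matters here because captions come from model predictions and may contain characters such as `<` or `&`.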

stephenroller commented 5 years ago

Instead of writing a new script, I suggest you simply add it to the rendering you already wrote, and ingest the image part of the Message if it's present. You can use <img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="/> to include the image directly in the HTML, where that final gibberish is the base64 encoding of the bytes of the PNG file.
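A minimal sketch of the data-URI approach using only the Python standard library: base64-encode the raw PNG bytes and wrap them in an `<img>` tag so the HTML file is self-contained. The helper names `png_to_img_tag` and `png_file_to_img_tag` are hypothetical, and the sketch assumes the image is already available as PNG bytes (how the image is stored in the Message is not specified here).

```python
import base64


def png_to_img_tag(png_bytes: bytes) -> str:
    """Embed raw PNG bytes directly in HTML as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}"/>'


def png_file_to_img_tag(path: str) -> str:
    """Read a PNG from disk and return a self-contained <img> tag."""
    with open(path, "rb") as f:
        return png_to_img_tag(f.read())
```

The trade-off is file size: base64 inflates the image by roughly a third, but the rendered HTML needs no accompanying image files.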

I suspect I'm missing some complications that Kurt knows about.

stephenroller commented 5 years ago

I see; I just saw the plans on the other thread to make it a new script. I still disagree, but I yield to Kurt if he stands by that choice. Apologies for the misunderstanding.

klshuster commented 5 years ago

I think there are two ways to view this issue: whether we are rendering images within or outside the context of chit-chat/conversation. As the most immediate use case involves rendering images within chit-chat, I agree with Stephen that we can extend the current script to render an image if one is present in the Message object. Something to keep in mind is that a single unique image can span multiple Messages, and we'd only want to render it once.
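One way to handle an image that spans several Messages is to hash the image bytes and only emit the `<img>` the first time a given hash is seen. The sketch below assumes, hypothetically, that each message is a plain dict with an optional `text` string and an optional `image` field holding raw PNG bytes; the real Message fields for image tasks may differ, and `render_messages` is an illustrative name.

```python
import base64
import hashlib
import html
from typing import Dict, List, Optional


def render_messages(messages: List[Dict]) -> List[str]:
    """Render a conversation to HTML fragments, emitting each unique image
    only once even if it is attached to several messages.

    Assumes (hypothetically) that each message is a dict with an optional
    'text' string and an optional 'image' holding raw PNG bytes.
    """
    seen = set()  # SHA-256 digests of images already rendered
    fragments = []
    for msg in messages:
        image: Optional[bytes] = msg.get("image")
        if image is not None:
            digest = hashlib.sha256(image).hexdigest()
            if digest not in seen:
                seen.add(digest)
                encoded = base64.b64encode(image).decode("ascii")
                fragments.append(f'<img src="data:image/png;base64,{encoded}"/>')
        text = msg.get("text")
        if text:
            fragments.append(f"<p>{html.escape(text)}</p>")
    return fragments
```

Keying on a digest rather than on the bytes themselves keeps the `seen` set small, and it also catches the case where the same image is re-attached to a later, non-adjacent turn.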

vedantpuri commented 5 years ago

My initial view was that this was unrelated to a chit-chat conversation. I was picturing a collection of images for which our model predicts captions. In that case we only need the image and its caption, and hence very different HTML, which is why I was considering a separate script. Could you provide an example of image captioning in the message format you are talking about? Even a rough hand-drawn sketch would do.

Also, what would be the format of the data being processed?

klshuster commented 5 years ago

A good example would be the image_chat dataset, or any model trained on that task. The message format would look something like this; more information about the dataset and models is available here: https://parl.ai/projects/image_chat/.

For visual context, imagine a conversation on, e.g., Facebook Messenger where someone sends an image via chat (and a thumbnail shows up), and the other person responds.

stephenroller commented 5 years ago

It's worth noting that parley frames everything as chats anyway :D

shubhamagarwal92 commented 4 years ago

Hi,

Is there any update on this? I know FB's visdom repo could be a starting point.

However, in my initial experiments with it, I didn't have aligned images and text. See this issue.

I ended up using jupyter notebooks only :D Any suggestions?

Thanks.

stephenroller commented 4 years ago

We'd happily welcome a PR. We currently have other priorities and probably won't come back to this task for some time.

shubhamagarwal92 commented 4 years ago

Sure. If I end up implementing something, I would definitely raise a PR.

For now, I have this PR for downloading image_chat data easily.

vedantpuri commented 4 years ago

Hey @shubhamagarwal92, if you would like to work on this, it might be a good idea to look at #2035 (updates in #2059) as a starting point, since I implemented a similar thing there, but only for text. It might be useful to integrate this into the previous implementation to avoid redundant code. Feel free to improve on the previous implementation if necessary!

github-actions[bot] commented 4 years ago

This issue has not had activity in 30 days. Marking as stale.