🖼️ Attend to You: Personalized Image Captioning with Context Sequence Memory Networks. In CVPR, 2017. Expanded : Towards Personalized Image Captioning via Multimodal Memory Networks. In IEEE TPAMI, 2018.
It seems that for one image to be predicted, the context information is from all tags of one user, not from the previous tags before the image is uploaded. Did I miss something?
It seems that for one image to be predicted, the context information is from all tags of one user, not from the previous tags before the image is uploaded. Did I miss something?