gicheonkang / dan-visdial

✨ Official PyTorch Implementation for EMNLP'19 Paper, "Dual Attention Networks for Visual Reference Resolution in Visual Dialog"
https://www.aclweb.org/anthology/D19-1209
MIT License

Making custom inferences #7

Closed puneet-kr closed 4 years ago

puneet-kr commented 4 years ago

Thank you for sharing your code. I am trying to reproduce and understand it. It would be a great help if you could explain how to run custom inference, i.e., provide an image, a query, and a dialog, and generate the answer. Thank you.

gicheonkang commented 4 years ago

Hi @puneet-kr, thank you for your interest.

The general procedure for inference is as follows:

  1. load the pre-trained model
  2. embed the inputs (image, query, dialog history, answer candidates) as vectors
  3. feed the embeddings to the model
  4. transform the model output into human-readable output
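
As a rough illustration of step 4, here is a minimal sketch in plain Python. It assumes the model yields one score per answer candidate, which is an assumption on my part and should be checked against the repository's evaluation code. `rank_candidates`, the candidate strings, and the scores are all hypothetical and not part of this repository.

```python
def rank_candidates(scores, candidates):
    """Return the candidates sorted from highest to lowest score."""
    if len(scores) != len(candidates):
        raise ValueError("need exactly one score per candidate")
    # Sort candidate positions by their score, best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order]


# Hypothetical answer candidates and hypothetical model scores.
candidates = ["yes", "no", "two"]
scores = [0.1, 2.3, -0.5]

ranked = rank_candidates(scores, candidates)
best_answer = ranked[0]  # the human-readable answer for this round
```

With real model output, the scores would first be converted from a tensor to a list (for example with `tensor.tolist()` in PyTorch) before ranking.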

If you need custom inferences, you must first pre-process the inputs into embedding vectors:

  1. Embedding for image inputs --> features extracted with Faster R-CNN
  2. Embedding for text inputs --> word tokens mapped to pre-defined numbers using a word-to-index dictionary
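
The text side of this pre-processing could look roughly like the sketch below. The toy vocabulary, the `encode` helper, the padding and unknown-word indices, and the simple whitespace tokenizer are all assumptions made for illustration. For real inference you would need the same word-to-index dictionary and tokenization that were used to train the model.

```python
PAD_INDEX = 0  # assumed index used for padding
UNK_INDEX = 1  # assumed index for out-of-vocabulary words


def encode(sentence, word2index, max_length):
    """Map a sentence to a fixed-length list of word indices."""
    # Naive tokenizer: lowercase, split off "?", split on whitespace.
    tokens = sentence.lower().replace("?", " ?").split()
    # Unknown words fall back to UNK_INDEX; long inputs are truncated.
    indices = [word2index.get(t, UNK_INDEX) for t in tokens][:max_length]
    # Pad short inputs so every sequence has the same length.
    return indices + [PAD_INDEX] * (max_length - len(indices))


# Hypothetical toy vocabulary, not the one shipped with the model.
word2index = {"<pad>": 0, "<unk>": 1, "is": 2, "the": 3,
              "dog": 4, "brown": 5, "?": 6}

encoded = encode("Is the dog brown?", word2index, max_length=8)
```

The same helper would be applied to the query, to each round of the dialog history, and to each answer candidate before the resulting index lists are converted to tensors.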