jiawen-zhu closed this issue 1 year ago
Yes, you are correct. When we feed the conversation history into the model, it outputs the multi-turn conversation result. Currently, we do not have multi-turn conversation data containing the <SEG> token. The VQA data (e.g., LLaVA-Instruct-150K) contains multi-turn conversations, but it does not involve the <SEG> token or the segmentation task. As a result, the model can currently handle some simple multi-turn cases, as shown in Fig. 1 of the paper. We are working on improving this ability of LISA.
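For reference, feeding the conversation history back in usually just means concatenating all prior turns with the new query into a single prompt before each forward pass. Below is a minimal sketch of that idea; the `USER:`/`ASSISTANT:` template and the `build_prompt` helper are hypothetical illustrations, not LISA's actual chat format.

```python
def build_prompt(history, new_question):
    """Concatenate prior (question, answer) turns with the new question.

    history: list of (question, answer) string pairs from earlier turns.
    Returns a single prompt string ending with an open assistant slot,
    which is what gets fed to the model on each new turn.
    """
    parts = []
    for question, answer in history:
        parts.append(f"USER: {question}\nASSISTANT: {answer}")
    # The new turn has no answer yet; the model completes it.
    parts.append(f"USER: {new_question}\nASSISTANT:")
    return "\n".join(parts)


# Example: the first answer emitted a <SEG> token; the follow-up question
# is asked with the full history prepended.
history = [("What is in the image?", "A dog on the grass. <SEG>")]
prompt = build_prompt(history, "Can you segment only the dog?")
print(prompt)
```

Because the prompt grows with every turn, the effective context seen by the model always includes the earlier `<SEG>` outputs, which is why simple multi-turn cases work even without dedicated multi-turn segmentation training data.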
Hi~ Nice work! The paper mentions that LISA has multi-turn conversation capability. I would like to know how LISA gets this capability. Does the training instruction data contain multi-turn conversations? Are the previous inputs and outputs fed into the MLLM again in subsequent turns?