Open Hongbin98 opened 1 year ago
I am glad to see that you have already noticed this issue. :)
I just tried to train Otter-7B on the 'LACONV' split, and there may be a minor issue: why do we need so many training steps (126,405 iterations in total) when the model appears to converge within about 200 steps?
Note that I also set `num_epochs=9`, following the default settings.
Hi! I'm also super interested in the LA-interleaved dataset but seem to have missed a lot of details about it. Could someone explain what the abbreviations mean? Is there any information about how exactly each part/version is constructed? I did notice a few places in the paper (https://arxiv.org/pdf/2306.05425.pdf) referring to the appendix for the details, but all the appendix sections seem to be unrelated.
Hi ziyi,
Generally, LA-interleaved is built by retrieving in-context examples for each (Q, A, I) triplet in the LLaVA complex-reasoning data, producing a multi-modal in-context learning format.
The motivation behind building LA-interleaved is that we experimentally found that, without such data, an instruction-tuned Flamingo would lose its in-context learning ability.
There are two ways to retrieve in-context examples given a query (Q(uestion), A(nswer), I(mage)).
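To make the retrieval idea concrete, here is a minimal sketch of similarity-based in-context example selection over precomputed embeddings. This is a hypothetical illustration, not the authors' exact pipeline: the embedding source (question text vs. image features) and the use of cosine similarity are assumptions on my part.

```python
# Hypothetical sketch: pick the k pool items most similar to a query
# triplet, using cosine similarity over precomputed embeddings.
# How the embeddings are produced (text encoder, image encoder, or both)
# is an assumption; the source does not specify the retrieval details.
import numpy as np

def retrieve_in_context(query_emb, pool_embs, k=3):
    """Return indices of the k most similar pool items by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    sims = p @ q                       # cosine similarity to each pool item
    return np.argsort(-sims)[:k].tolist()

# Toy usage: a pool of 4 items with 3-dim embeddings
pool = np.array([[1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(retrieve_in_context(query, pool, k=2))  # -> [0, 1]
```

The selected items would then be concatenated with the query triplet to form one interleaved multi-modal in-context training sequence.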
Thanks for the response @ZhangYuanhan-AI, could you also describe the LAConv and LADD splits?
As stated in the latest paper: 'Trained on the LA task, the model exhibits exceptional scene comprehension, reasoning abilities, and multi-round conversation capabilities.'
I am very interested in this part and want to train Otter on 'LA'. However, LA.zip contains several '_instructions.json' and '_train.json' files, so I am not sure which files to use for training my model.
Could you share the training command with me? Thanks~