This paper presents a language model with an extended context length, which is further extended into a vision-language model. I wonder why it is called a World Model; this is not obvious from the paper. The paper seems to focus mostly on long context and the evaluation of the related retrieval ability, with little discussion of world modelling.
I also wonder whether there is any specific discovery about model abilities that improve along with long-context training. Does it make the model more robust to prompt variations? More robust in reasoning? More semantically rich in its concept representations? Better at ontological/hierarchical learning of meaning?
I would be curious to hear more about the authors' findings.
Thanks in advance for any insights : )