EvolvingLMMs-Lab / LongVA

Long Context Transfer from Language to Vision
Apache License 2.0

Some questions about LongVA #8

Closed: linhaojia13 closed this issue 5 days ago

linhaojia13 commented 6 days ago

This work is fantastic and has been very inspiring to me. I have a few questions:

  1. Pre-training data: The paper seems to indicate that you converted Slimpajama to PDF to construct the data. Is that correct?
  2. Instruction data and codes: After pretraining, what instruction data are used for fine-tuning LongVA? Also, when do you plan to release the instruction tuning code?
jzhang38 commented 6 days ago

The paper seems to indicate that you converted Slimpajama to PDF to construct the data. Is that correct?

No. In the long-context training stage, we train on text from Slimpajama only. But converting Slimpajama to PDF is an interesting idea. We have also thought about it but have not tried it yet.
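
For illustration only, here is a minimal sketch of packing text-only SlimPajama documents into long training sequences for such a stage. The dataset ID, tokenizer, and target context length below are placeholder assumptions, not the released LongVA recipe:

```python
# Minimal sketch: pack text-only SlimPajama documents into long sequences.
# Dataset ID, tokenizer, and CONTEXT_LEN are illustrative assumptions,
# not the released LongVA training code.
from datasets import load_dataset
from transformers import AutoTokenizer

CONTEXT_LEN = 224_000  # assumed target context length for the long-context stage
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")  # assumed base LLM

def pack_long_sequences(stream, context_len):
    """Concatenate tokenized documents until each yielded sequence has context_len tokens."""
    buffer = []
    for example in stream:
        buffer.extend(tokenizer(example["text"])["input_ids"])
        buffer.append(tokenizer.eos_token_id)  # separate documents
        while len(buffer) >= context_len:
            yield buffer[:context_len]
            buffer = buffer[context_len:]

stream = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)
for seq in pack_long_sequences(stream, CONTEXT_LEN):
    # hand `seq` to the long-context continued-training loop
    break
```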

After pretraining, what instruction data are used for fine-tuning LongVA?

Same as LLaVA-1.6.

When do you plan to release the instruction tuning code?

See https://github.com/EvolvingLMMs-Lab/LongVA/issues/3

linhaojia13 commented 5 days ago

Thank you very much!