dvlab-research / LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Apache License 2.0

training loss in stage-1 #88

Open Nastu-Ho opened 7 months ago

Nastu-Ho commented 7 months ago

In the first stage of training, the final loss was around 2. Is this normal?

EchoDreamer commented 3 months ago

> In the first stage of training, the final loss was around 2. Is this normal?

Hi, I'm currently trying to reproduce the results from the LLaMA-VID paper, but I'm having some difficulty because I don't have access to the WebVid dataset. Could you guide me on how to download or access it? I'd really appreciate any help you can offer. Thank you in advance!