anonymous-atom opened 1 month ago
Sure! I’d be happy to help if I can. However, I’m not sure what your specific doubt is—could you clarify?
Thank you very much for that! I have several doubts regarding training and interleaved generation. I will post them in a while.
Would it be possible to collaborate over email, or some other way you prefer to be reached? I just want to keep the work private before it's published.
My email: ksharma370@gatech.edu
Thanks Again!
my email: swu@u.nus.edu
Got it, thanks! Will keep posting updates and doubts here.
Hi @ChocoWu, let me know if I am wrong: do we have to train Stage 3 before Stage 2? If so, your latest code requires `multimodal_projector` checkpoints, but we only get those by training Stage 2 first, right?
I did not understand your question. In the new version's codebase, there is no "multimodal_projector." Please refer to the latest README for details on training.
Hi @ChocoWu, really sorry to reach out again this way, but I had a doubt regarding training:
During decoder training, only one of `compute_image_loss`, `compute_video_loss`, and `compute_audio_loss` is non-`None` at each step. Is this because the dataloader passes data of only one modality through each forward pass? Is this the expected behaviour?
I ask because the loss fluctuates a lot:
```
=== compute_image_loss : (30.875, 30.875, None) ======
=== compute_video_loss : (None, None, None) ======
=== compute_audio_loss : (None, None, None) ======
{'loss': 38.7826, 'grad_norm': 47.2336196899414, 'learning_rate': 6.505283282093679e-06, 'epoch': 0.0}

=== compute_image_loss : (None, None, None) ======
=== compute_video_loss : (None, None, None) ======
=== compute_audio_loss : (2.421875, 2.421875, 0.3259783685207367) ======
{'loss': 6.0065, 'grad_norm': 111.6504898071289, 'learning_rate': 6.1332575503138975e-06, 'epoch': 0.0}
```
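To make the question concrete, here is a minimal sketch of how I currently understand the loss combination. The `compute_*_loss` names come from the logs above; everything else, including `combine_losses`, is my own hypothetical reconstruction, not the actual repo code:

```python
from typing import Optional, Tuple

# Hypothetical reconstruction of the per-modality loss combination,
# based only on the log output above. Each compute_*_loss returns a
# tuple of loss terms, or (None, None, None) when the current batch
# contains no samples of that modality.
LossTuple = Tuple[Optional[float], Optional[float], Optional[float]]

def combine_losses(image_loss: LossTuple,
                   video_loss: LossTuple,
                   audio_loss: LossTuple) -> float:
    """Sum all non-None loss terms across modalities.

    Because each batch carries only one modality, exactly one of the
    three tuples is populated per step, so the total loss magnitude
    swings with whichever modality the dataloader happened to sample.
    """
    total = 0.0
    for loss_tuple in (image_loss, video_loss, audio_loss):
        total += sum(term for term in loss_tuple if term is not None)
    return total

# Step 1 from the logs: image-only batch -> large per-modality loss.
print(combine_losses((30.875, 30.875, None),
                     (None, None, None),
                     (None, None, None)))
# Step 2 from the logs: audio-only batch -> much smaller loss.
print(combine_losses((None, None, None),
                     (None, None, None),
                     (2.421875, 2.421875, 0.3259783685207367)))
```

If that reading is right, the step-to-step fluctuation would just reflect which modality was sampled, not training instability.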
@ChocoWu Also, could you kindly release the latest NExT-GPT weights?
Hi, I'm sorry for the late response. I've been too busy these two weeks chasing a deadline.
Yes, this is the expected behavior. You might also try mixing data from different modalities to see how it performs.
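For instance, one simple way to mix modalities would be to interleave single-modality dataloaders at the batch level, roughly like this (just a hypothetical sketch, not code from the repo; the dataset arguments are placeholders):

```python
import random
from torch.utils.data import DataLoader

def mixed_modality_batches(image_ds, video_ds, audio_ds, batch_size=4, seed=0):
    """Randomly interleave image/video/audio batches.

    Consecutive optimization steps then see different modalities,
    so the total loss fluctuates less systematically than when long
    runs of a single modality are sampled.
    """
    loaders = {
        "image": DataLoader(image_ds, batch_size=batch_size, shuffle=True),
        "video": DataLoader(video_ds, batch_size=batch_size, shuffle=True),
        "audio": DataLoader(audio_ds, batch_size=batch_size, shuffle=True),
    }
    iterators = {name: iter(dl) for name, dl in loaders.items()}
    rng = random.Random(seed)
    while iterators:
        name = rng.choice(list(iterators))
        try:
            yield name, next(iterators[name])
        except StopIteration:
            del iterators[name]  # this modality is exhausted; drop it
```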
Thanks for your response. Wishing you luck with CVPR, if that's the deadline you're chasing!
Hi @ChocoWu,
Sorry for any inconvenience caused by reaching out like this. I am a research student at Georgia Tech working on multimodal models, and I have been working with NExT-GPT for a while now, together with advisors from Reka.ai and NVIDIA.
Would it be possible for you to help us with the multimodal questions here? That would mean a lot!
Your work on NExT-GPT is really nice.
Thanks again!