RupertLuo / Valley

The official repository of "Video assistant towards large language model makes everything easy"

Gradient issue #13

Open TonyXuQAQ opened 12 months ago

TonyXuQAQ commented 12 months ago

Hi, after going through the training code, it seems that the gradient is not properly backpropagated. All calls to the projector layer `mm_projector` appear to happen inside `torch.no_grad` (i.e., call_1, call_2). If so, the projector layer is not trained at all, right? Is this a typo in the released code or an actual error?

RupertLuo commented 12 months ago

Can you share the error output and training configuration file?

TonyXuQAQ commented 12 months ago

There is no error; I just ran the raw code of this repo. What I mean is that the projector `mm_projector` does not seem to be trained properly in valley/model/valley.py. Every call to `mm_projector` is wrapped in `torch.no_grad`, so the gradient is blocked and the projector is never updated.
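For readers following along, here is a minimal, hypothetical sketch (not taken from the repo) of why a module called under `torch.no_grad` can never receive gradients:

```python
import torch
import torch.nn as nn

# Toy projector standing in for mm_projector (illustrative only).
projector = nn.Linear(4, 4)
x = torch.randn(2, 4)

with torch.no_grad():
    feats = projector(x)  # no autograd graph is recorded here

print(feats.requires_grad)    # False
print(projector.weight.grad)  # None
# Any loss built on `feats` cannot backpropagate into the projector,
# so the optimizer would never update its weights.
```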

RupertLuo commented 11 months ago

In train.py, you can set whether the projector should be updated.
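Such a switch is usually wired up by toggling `requires_grad` on the projector's parameters; the sketch below is only an illustration of that pattern, and the actual argument name in train.py may differ:

```python
import torch.nn as nn

def set_projector_trainable(mm_projector: nn.Module, trainable: bool) -> None:
    # requires_grad controls whether the optimizer updates these parameters.
    for param in mm_projector.parameters():
        param.requires_grad = trainable
```

Note that this flag only has an effect if the projector's forward pass runs outside `torch.no_grad`; if the call itself is inside `no_grad`, no graph is built and no gradient reaches the projector regardless of the setting.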

TonyXuQAQ commented 11 months ago

But the projector calls are wrapped inside `torch.no_grad`, so the gradient cannot pass through the projector, i.e., the projector is not trained. And this layer is not used anywhere else. I wonder how you trained this projector.

feymanpriv commented 11 months ago

> But the projector calls are wrapped inside `torch.no_grad`, so the gradient cannot pass through the projector, i.e., the projector is not trained. And this layer is not used anywhere else. I wonder how you trained this projector.

@TonyXuQAQ I find that the projector is not wrapped inside `torch.no_grad` in the original code of this repo; see https://github.com/RupertLuo/Valley/blob/8da73a9551cd9ce520c47f7c3f508fdfc387f4f8/valley/model/valley.py. I guess the "bug" was introduced when the code was reorganized. The projector call should sit outside `torch.no_grad`, since the released models were trained with the projector being tuned.
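In other words, the intended forward structure looks roughly like the following sketch (names are illustrative, not copied from valley.py): only the frozen vision encoder runs under `torch.no_grad`, while the projector runs outside it so it stays trainable.

```python
import torch

def encode_video(vision_tower, mm_projector, frames):
    # The frozen vision encoder needs no gradients, so running it under
    # no_grad saves memory without affecting training.
    with torch.no_grad():
        frame_features = vision_tower(frames)
    # The projector is called outside no_grad, so gradients reach its
    # weights and it can be tuned during pre-training and fine-tuning.
    return mm_projector(frame_features)
```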

TonyXuQAQ commented 11 months ago

Thanks for the information.

During finetuning, I also noticed that the current version of the code cannot load VideoChat-instruct-11K correctly. LLaVA-instruct-150K's labels are organized as `{"human": ..., "gpt": ...}`, while VideoChat-instruct-11K's labels are organized as `{"q": ..., "a": ...}`, so the two datasets use different label formats, but the code does not convert between them. I guess the label pre-processing code is missing.

I'm not sure why, but starting from your llama-2-pretrain weights, I fine-tuned Valley on the two datasets above and the results are very bad. I will refer to the early commits of this repo for debugging.

TonyXuQAQ commented 11 months ago

So may I know which commit was used to train the provided valley-2-7b? I just want to reproduce the performance of the released checkpoints.

RupertLuo commented 11 months ago

> Thanks for the information.
>
> During finetuning, I also noticed that the current version of the code cannot load VideoChat-instruct-11K correctly. LLaVA-instruct-150K's labels are organized as `{"human": ..., "gpt": ...}`, while VideoChat-instruct-11K's labels are organized as `{"q": ..., "a": ...}`, so the two datasets use different label formats, but the code does not convert between them. I guess the label pre-processing code is missing.
>
> I'm not sure why, but starting from your llama-2-pretrain weights, I fine-tuned Valley on the two datasets above and the results are very bad. I will refer to the early commits of this repo for debugging.

LLaVA-instruct-150K should load fine. For VideoChat-instruct-11K, you need to convert its format to the LLaVA-instruct-150K format.
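A conversion along these lines should work; this is only a sketch based on the formats described above, and the exact key names ("QA", "video") and file names in the VideoChat-instruct-11K release may differ.

```python
import json

def videochat_to_llava(sample: dict) -> dict:
    """Map one VideoChat-style record with 'q'/'a' pairs to the
    LLaVA-instruct-150K 'conversations' layout."""
    conversations = []
    for turn in sample.get("QA", []):
        conversations.append({"from": "human", "value": turn["q"]})
        conversations.append({"from": "gpt", "value": turn["a"]})
    return {"video": sample.get("video"), "conversations": conversations}

# Usage: convert the whole annotation file once, then point the trainer at it.
# with open("videochat_instruct_11k.json") as f:
#     data = json.load(f)
# converted = [videochat_to_llava(s) for s in data]
# with open("videochat_instruct_11k_llava_format.json", "w") as f:
#     json.dump(converted, f, indent=2)
```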

RupertLuo commented 11 months ago

> So may I know which commit was used to train the provided valley-2-7b? I just want to reproduce the performance of the released checkpoints.

Thank you for your continued attention to this project. I will sync the repository to a version of the code that trains correctly as soon as possible.