Cecile-hi / Multimodal-Learning-with-Alternating-Unimodal-Adaptation

Multimodal Learning Method MLA for CVPR 2024

How do you set hyperparameters? #3

Open Jinx630 opened 5 months ago

Jinx630 commented 5 months ago

Hello, I wonder how you set hyperparameters such as the learning rate, batch size, and number of epochs. We only obtained an accuracy of 68 on the CREMA-D dataset using the command in the README:

python main.py --train --ckpt_path ckpt --gpu_ids 0 --batch_size 64 --lorb base --modulation Normal --epochs 100 --dataset CREMAD --gs_flag

Cecile-hi commented 5 months ago

Hi, I think you can try --batch_size 16 and specify --av_alpha 0.55. I used these hyperparameters and got an accuracy of 77.69, as shown below: [screenshots of training results]
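For anyone unsure what --av_alpha controls: presumably it weights the audio branch against the visual branch when the per-modality predictions are fused, so --av_alpha 0.55 leans the fusion slightly toward audio. A minimal sketch of such a blend (illustrative only, not the repo's actual code; shapes assume CREMA-D's 6 emotion classes):

```python
import torch

def fuse_av_logits(audio_logits: torch.Tensor,
                   visual_logits: torch.Tensor,
                   av_alpha: float = 0.55) -> torch.Tensor:
    """Convex combination of per-modality logits: av_alpha weights
    the audio branch, (1 - av_alpha) the visual branch."""
    return av_alpha * audio_logits + (1.0 - av_alpha) * visual_logits

# Example: a batch of 16 samples, 6 CREMA-D emotion classes.
audio_logits = torch.randn(16, 6)
visual_logits = torch.randn(16, 6)
pred = fuse_av_logits(audio_logits, visual_logits).argmax(dim=1)
```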

shicaiwei123 commented 4 months ago

Hello, with the following command,

python main.py --train --ckpt_path ckpt --gpu_ids 0 --batch_size 64 --lorb base --modulation Normal --epochs 100 --dataset CREMAD --gs_flag

I also got a result of 68.8.

When I set the batch size to 16 and av_alpha to 0.55, I got 73.6, which is still far from the result in the paper.

Are there any other parameters?

hubaak commented 4 months ago

@Cecile-hi Hi, I set batch_size=16, epochs=200, lr_decay_step=150, and left the other hyperparameters at their defaults. I got a score of 0.800 on the CREMA-D dataset, which is as good as expected. Here is the accuracy curve: [accuracy curve]

However, when I used the same parameters for uni-modal learning (called late fusion in your paper), I got an even higher accuracy of 0.812, much higher than the 0.663 in Table 1: [result screenshot] And the corresponding curve is here: [accuracy curve]

I tried other parameters but found that, for the same settings, uni-modal learning always seemed competitive with your method, so I am a bit confused by the results. I'd appreciate it if you could share the detailed hyperparameter settings behind Table 1 :).
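For anyone reproducing this: only batch_size=16, epochs=200, and lr_decay_step=150 come from the comment above; the optimizer, base learning rate, and decay factor below are assumed stand-ins for the repo's defaults. A minimal sketch of how those values map onto a standard PyTorch schedule:

```python
from torch import nn, optim

BATCH_SIZE = 16      # reported above
EPOCHS = 200         # reported above
LR_DECAY_STEP = 150  # reported above

model = nn.Linear(512, 6)  # stand-in for a ResNet branch (6 CREMA-D classes)
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)  # assumed defaults
# Single step decay: at epoch 150 the learning rate is multiplied by gamma.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=LR_DECAY_STEP, gamma=0.1)

for epoch in range(EPOCHS):
    # ... forward/backward passes over batches of size BATCH_SIZE ...
    scheduler.step()
```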

chenj1031 commented 2 months ago

Hello, how do I obtain the JSONL file for Food101 that extract_token.py expects? The dataset I downloaded does not include a suitable file.
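In case it helps others stuck at the same step: the expected schema is not documented here, so the directory layout and field names below are guesses, but building a JSONL index over the downloaded image folders might look roughly like this:

```python
import json
import os

# Hypothetical layout: <root>/<class_name>/<file>.jpg. The field names
# ("image", "text", "label") are assumptions about what extract_token.py
# expects, not the repo's documented schema.
root = "food101/images/train"
with open("food101_train.jsonl", "w") as out:
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        for fname in sorted(os.listdir(class_dir)):
            record = {
                "image": os.path.join(class_dir, fname),
                "text": fname.rsplit(".", 1)[0].replace("_", " "),
                "label": label,
            }
            out.write(json.dumps(record) + "\n")
```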

yuner2001 commented 1 month ago

@hubaak Hi! I’d like to know how you implemented the uni-modal late fusion algorithm. Did you simply add a single-layer MLP after the ResNet? And did this achieve a score of 81?

hubaak commented 1 month ago

> @hubaak Hi! I’d like to know how you implemented the uni-modal late fusion algorithm. Did you simply add a single-layer MLP after the ResNet? And did this achieve a score of 81?

Hi, I summed the logits of the audio model (ResNet plus a one-layer fc) and the visual model (ResNet plus a one-layer fc) to get the uni-modal prediction. To reach a score of 81, the key is the hyperparameter setting: I trained the uni-modal models on CREMA-D with batch_size=16, epochs=200, lr_decay_step=150 (others at their defaults) and got 81.
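In other words, something like the following sketch (the backbone choice and input shapes are placeholders; the point is just the logit sum at prediction time):

```python
import torch
from torch import nn
from torchvision.models import resnet18

NUM_CLASSES = 6  # CREMA-D emotion classes

def make_branch() -> nn.Module:
    """ResNet backbone with a single fc layer on top, as described above."""
    net = resnet18()
    net.fc = nn.Linear(net.fc.in_features, NUM_CLASSES)
    return net

audio_net, visual_net = make_branch(), make_branch()

# Each branch is trained independently (uni-modal); at test time the
# prediction comes from the sum of the two branches' logits.
audio_in = torch.randn(16, 3, 224, 224)   # placeholder audio-spectrogram input
visual_in = torch.randn(16, 3, 224, 224)  # placeholder video-frame input
with torch.no_grad():
    fused_logits = audio_net(audio_in) + visual_net(visual_in)
pred = fused_logits.argmax(dim=1)
```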

yuner2001 commented 1 month ago

@hubaak thx a lot bro