EPFL-VILAB / MultiMAE

MultiMAE: Multi-modal Multi-task Masked Autoencoders, ECCV 2022
https://multimae.epfl.ch

How do I do multimodal classification? #17

Closed ucalyptus2 closed 1 year ago

ucalyptus2 commented 1 year ago

I want to use RGB plus another input modality and fine-tune for classification.

Please let me know which files and YAML configs I should look at and change accordingly.

cc: @roman-bachmann @dmizr

roman-bachmann commented 1 year ago

Hi @forkbabu ,

We suggest modifying run_finetuning_cls.py, taking inspiration from run_finetuning_depth.py for how to use multiple input modalities in transfers. Their respective config files can be found in cfgs/finetune/cls and cfgs/finetune/depth.

When making the classification finetuning script multi-modal, make sure to also modify the augmentations (cutmix, mixup, cropping, flipping, etc.) so that they support multiple modalities: each one has to be applied identically to every modality of a sample so the inputs stay aligned. These augmentations are usually crucial for good performance on ImageNet. Alternatively, you can skip them to simplify things, at the likely cost of some accuracy.

Best, Roman
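
For readers looking for a starting point, here is a minimal sketch of what "augmentations that support multiple modalities" could look like. It is not code from the MultiMAE repository: the function names `paired_flip_crop` and `multimodal_mixup`, the dict-of-arrays sample layout, and the use of NumPy instead of the repository's PyTorch data pipeline are all assumptions made for illustration. The idea is to draw the random parameters (crop offset, flip decision, mixup coefficient, permutation) once and reuse them for every modality.

```python
import numpy as np


def paired_flip_crop(sample, crop_size, rng):
    """Apply ONE random crop and ONE random horizontal flip to all modalities.

    `sample` maps a (hypothetical) modality name to an array of shape
    (C, H, W); all modalities are assumed to share the same H and W.
    """
    _, h, w = next(iter(sample.values())).shape
    # Draw the random parameters once, then reuse them for every modality.
    top = int(rng.integers(0, h - crop_size + 1))
    left = int(rng.integers(0, w - crop_size + 1))
    flip = bool(rng.random() < 0.5)

    out = {}
    for name, x in sample.items():
        x = x[:, top:top + crop_size, left:left + crop_size]
        if flip:
            x = x[:, :, ::-1]  # flip along the width axis
        out[name] = np.ascontiguousarray(x)
    return out


def multimodal_mixup(batch, labels, num_classes, alpha, rng):
    """Mixup with a single lambda and permutation shared by all modalities.

    `batch` maps modality name -> array of shape (B, C, H, W);
    `labels` is an integer array of shape (B,).
    Returns the mixed batch and soft targets of shape (B, num_classes).
    """
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(len(labels))
    mixed = {name: lam * x + (1.0 - lam) * x[perm] for name, x in batch.items()}
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]
    targets = lam * one_hot + (1.0 - lam) * one_hot[perm]
    return mixed, targets
```

Two caveats on this sketch: some modalities need more than a pixel flip (for surface normals, for example, the horizontal component must also be negated), and linearly mixing a discrete modality such as a semantic segmentation map is not meaningful, so mixup would need to be skipped or replaced for such inputs.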