Open mayone23 opened 6 months ago
It's so we can use a COCO-pretrained network without needing any single-modality fine-tuning.
Okay. So for thermal and RGB, you are directly extracting features from the COCO-pre-trained network and then training with the fusion network, right?
I was going through the code, and it seems the model classifies into 90 classes for the FLIR dataset. Why is that?
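One way to read the answer about the COCO-pretrained network: if the classification head keeps the COCO label space, the pretrained weights load unchanged, and predictions for classes that do not occur in the thermal data can be dropped afterwards. Below is a minimal sketch of that post-filtering step, not the repository's actual code. The function name `filter_detections`, the `KEEP_IDS` mapping, and the detection dictionary layout are all hypothetical, and the assumption that the target dataset reuses COCO category ids should be checked against its annotation files.

```python
# Hypothetical mapping of COCO category ids that are assumed to also be
# annotated in the target (thermal) dataset. Not taken from the repository.
KEEP_IDS = {1: "person", 2: "bicycle", 3: "car"}


def filter_detections(detections, keep_ids=KEEP_IDS):
    """Keep only detections whose predicted class id is in keep_ids.

    Each detection is assumed to be a dict with at least a "class_id" key.
    A "label" key with the readable class name is added to every kept item.
    """
    kept = []
    for det in detections:
        if det["class_id"] in keep_ids:
            kept.append({**det, "label": keep_ids[det["class_id"]]})
    return kept


# Example predictions in the full COCO label space (made-up scores).
detections = [
    {"class_id": 1, "score": 0.91},
    {"class_id": 3, "score": 0.80},
    {"class_id": 62, "score": 0.40},  # id outside KEEP_IDS, so it is dropped
]
filtered = filter_detections(detections)
```

With this approach the network itself is untouched; only the outputs are restricted to the classes of interest, which is why the head can still report the full COCO class count.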