Linfeng-Tang / SwinFusion

This is the official PyTorch implementation of "SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer"

Using SwinFusion #33

Open Adam11072000 opened 3 months ago

Adam11072000 commented 3 months ago

Hey guys,

I tried out your models and I see that the pre-trained models all take a single-channel input. My main goal is to use your model with ImageNet images and IR masks and produce an RGB picture. How can I do that?

Here is a snapshot of what I did: image

Linfeng-Tang commented 3 months ago

Can you provide me with a visible image and its infrared counterpart?

Adam1107 @.***> wrote on Sunday, July 21, 2024 at 00:09:

Here is a snapshot of what I did: image.png (view on web) https://github.com/user-attachments/assets/9cf16962-feda-4f38-bf19-3ba4c43558e4


Adam11072000 commented 3 months ago

Well, my goal is to use the image and an infrared mask, not necessarily the same image with an infrared counterpart. The IR input has shape (3, H, W), and the RGB input has shape (3, H, W) too. What the error above shows is that the weights of the trained model (at least the first layer) expect a single input channel, whereas the code allows for RGB input.

Adam11072000 commented 3 months ago

I accessed the weights when loading the pretrained model, and the shape of the first layer is (30, 1, 3, 3), where 30 is the number of output channels and 1 is the number of input channels (this one should be 3).
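One workaround often used with single-channel fusion networks, sketched here as an assumption and not as this repository's confirmed pipeline: convert the visible image to YCbCr, fuse only the luminance (Y) plane with the infrared input, then reattach the visible image's chroma and convert back to RGB. In the sketch below, `fuse_y` is a hypothetical placeholder for a forward pass through the pretrained single-channel model, `fuse_rgb_with_ir` is an illustrative name, and the BT.601 coefficients and the channel-mean reduction of the IR input are assumed conventions.

```python
import numpy as np

KR, KG, KB = 0.299, 0.587, 0.114  # BT.601 luma coefficients (assumed convention)

def rgb_to_ycbcr(rgb):
    """Split a (3, H, W) RGB array with values in [0, 1] into Y, Cb, Cr planes."""
    r, g, b = rgb
    y = KR * r + KG * g + KB * b
    cb = 0.5 + (b - y) / (2 * (1 - KB))
    cr = 0.5 + (r - y) / (2 * (1 - KR))
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Exact inverse of rgb_to_ycbcr, clipped back to [0, 1]."""
    r = y + 2 * (1 - KR) * (cr - 0.5)
    b = y + 2 * (1 - KB) * (cb - 0.5)
    g = (y - KR * r - KB * b) / KG
    return np.clip(np.stack([r, g, b]), 0.0, 1.0)

def fuse_y(y_vis, ir):
    """Hypothetical placeholder: swap in the single-channel model's forward pass."""
    return np.maximum(y_vis, ir)

def fuse_rgb_with_ir(rgb, ir):
    """Fuse a (3, H, W) RGB image with an IR input; returns a (3, H, W) RGB array."""
    if ir.ndim == 3:
        ir = ir.mean(axis=0)  # collapse a (3, H, W) IR input to one channel
    y, cb, cr = rgb_to_ycbcr(rgb)
    return ycbcr_to_rgb(fuse_y(y, ir), cb, cr)
```

With this layout the network only ever sees (1, H, W) inputs, which matches the (30, 1, 3, 3) first-layer weight, and the color information bypasses the model entirely.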

ccatian commented 3 months ago

> I accessed the weights when loading the pretrained model, and the shape of the first layer is (30, 1, 3, 3), where 30 is the number of output channels and 1 is the number of input channels (this one should be 3).

Hi, have you resolved the issue of fusing two RGB color images to generate a color image at the end?
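For two color inputs, one approach seen in the fusion literature is to fuse the Y planes with the single-channel model and combine the chroma planes with a simple weighting that favors whichever input is further from neutral gray. The sketch below shows only that chroma step. The function name `fuse_chroma`, the neutral value `tau=0.5` (for planes scaled to [0, 1]), and the `eps` guard are assumptions for illustration; whether this matches the repository's own scripts is not confirmed here.

```python
import numpy as np

def fuse_chroma(c1, c2, tau=0.5, eps=1e-8):
    """Weighted fusion of two chroma planes (Cb with Cb, or Cr with Cr).

    Each pixel is weighted by its distance from the neutral value tau, so the
    more saturated input dominates. Where both inputs are neutral, return tau.
    """
    w1 = np.abs(c1 - tau)
    w2 = np.abs(c2 - tau)
    denom = w1 + w2
    fused = (c1 * w1 + c2 * w2) / np.maximum(denom, eps)
    return np.where(denom < eps, tau, fused)
```

Calling it once for the Cb planes and once for the Cr planes, then converting the fused Y, Cb, Cr back to RGB, gives a color output without changing the model's input channels.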