fangyuanmao / PID

Code for PID: Physics-Informed Diffusion Model for Infrared Image Generation
MIT License

test TeVNet_KAIST pretrained model #9

Open mzalaki00 opened 2 weeks ago

mzalaki00 commented 2 weeks ago

I'm using two commands to get results from the model, but both produce images that don't resemble IR images. Note that my RGB images are in the directory passed as --indir or --image-dir.

1. python scripts/rgb2ir_vqf8.py --steps 200 --indir ./image --outdir ./result --config ./configs/latent-diffusion/kaist512-vqf8.yaml --ddim_eta 0.0 --checkpoint ./pretrained/TeVNet_KAIST/epoch_950.pth

2. python ./TeVNet/test.py --image-dir ./image --output-dir ./result --smp_model Unet --smp_encoder resnet18 --vnums 4 --weights-file ./pretrained/TeVNet_KAIST/epoch_950.pth

What might be wrong?

fangyuanmao commented 1 week ago

TeVNet decomposes infrared images into T, e, and V components; it is not used to generate infrared images. Download the PID checkpoint and change the command to something like "python scripts/rgb2ir_vqf8.py xxxxxxx --checkpoint /path/to/PID_checkpoint".
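As a rough sketch of that change, the command line can be assembled in Python so the checkpoint is checked before anything runs. The helper name `build_rgb2ir_command` and the rule that rejects any path containing "tevnet" are assumptions made for illustration and are not part of the PID repository; only the script path and flags come from the commands quoted in this thread.

```python
import shlex
from typing import List


def build_rgb2ir_command(
    checkpoint: str,
    indir: str,
    outdir: str,
    config: str,
    steps: int = 200,
    ddim_eta: float = 0.0,
) -> List[str]:
    """Assemble the rgb2ir_vqf8.py invocation as an argument list.

    Hypothetical helper: the flags mirror the command posted in this thread.
    """
    # Assumption: TeVNet weights are recognisable by their path name.
    # TeVNet decomposes IR images and is not the generator checkpoint.
    if "tevnet" in checkpoint.lower():
        raise ValueError(
            f"{checkpoint!r} looks like a TeVNet checkpoint; "
            "pass the PID checkpoint to --checkpoint instead"
        )
    return [
        "python", "scripts/rgb2ir_vqf8.py",
        "--steps", str(steps),
        "--indir", indir,
        "--outdir", outdir,
        "--config", config,
        "--ddim_eta", str(ddim_eta),
        "--checkpoint", checkpoint,
    ]


if __name__ == "__main__":
    command = build_rgb2ir_command(
        checkpoint="/path/to/PID_checkpoint",  # placeholder path from the reply above
        indir="./image",
        outdir="./result",
        config="./configs/latent-diffusion/kaist512-vqf8.yaml",
    )
    # Print the command for inspection; run it with subprocess.run(command) if desired.
    print(shlex.join(command))
```

The name-based check is only a guard against the mix-up discussed here; it does not verify the contents of the checkpoint file.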

mzalaki00 commented 1 week ago

The output looks better but still deviates from the ground truth. I'm using the YAML config flir512-vqf8.yaml with this content:

model:
  base_learning_rate: 1.0e-06
  target: ldm.models.diffusion.ddpm_tev.LatentDiffusion
  params:
    load_only_unet: True
    tevloss_weight_rec: 50
    tevloss_weight_tev: 50
    pixel_tev: true
    vnums: 4
    linear_start: 0.0015
    linear_end: 0.0205
    log_every_t: 100
    timesteps: 1000
    loss_type: l1
    first_stage_key: image
    cond_stage_key: conditional
    image_size: 64
    channels: 4
    concat_mode: true
    monitor: val/loss_simple_ema
    cond_stage_trainable: true
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 64
        in_channels: 7
        out_channels: 4
        model_channels: 128
        attention_resolutions:

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 12
    num_workers: 4
    wrap: false
    train:
      target: ldm.data.FLIRv1512.FLIRTrain
      params:
        size: 512
    validation:
      target: ldm.data.FLIRv1512.FLIRVal
      params:
        size: 512

Are these settings correct, particularly the two checkpoint paths specified above?

fangyuanmao commented 1 week ago

The FLIR dataset we used is "Flir thermal dataset version 1.3". We recommend loading the KAIST checkpoint and testing it on the RGB images we provided.

mzalaki00 commented 1 week ago

Thanks, but I'm not asking about the FLIR dataset version; I'm asking about the models and the YAML config. With PID_KAIST I used --config ./configs/latent-diffusion/kaist512-vqf8.yaml --ddim_eta 0.0 --checkpoint ./pretrained/PID_KAIST/epoch=000235-step=000059999.ckpt

Both YAML files contain TeVNet checkpoint parameters, not PID ones. I have tried every checkpoint, but the output does not get any better!

fangyuanmao commented 1 week ago
  1. This sentence was meant to check whether you used the right dataset for evaluation:

    The FLIR dataset we used is "Flir thermal dataset version 1.3". We recommend loading the KAIST checkpoint and testing it on the RGB images we provided.

I asked because your YAML file is for FLIR:

    The output looks better but still deviates from the ground truth. I'm using the YAML config flir512-vqf8.yaml with this content:

    model:
      base_learning_rate: 1.0e-06
      target: ldm.models.diffusion.ddpm_tev.LatentDiffusion
      params:
        load_only_unet: True
        tevloss_weight_rec: 50
        tevloss_weight_tev: 50
        pixel_tev: true
        vnums: 4
        linear_start: 0.0015
        linear_end: 0.0205
        log_every_t: 100
        timesteps: 1000
        loss_type: l1
        first_stage_key: image
        cond_stage_key: conditional
        image_size: 64
        channels: 4
        concat_mode: true
        monitor: val/loss_simple_ema
        cond_stage_trainable: true
        unet_config:
          target: ldm.modules.diffusionmodules.openaimodel.UNetModel
          params:
            image_size: 64
            in_channels: 7
            out_channels: 4
            model_channels: 128
            attention_resolutions:
            - 8
            - 4
            - 2
            num_res_blocks: 2
            channel_mult:
            - 1
            - 4
            - 8
            num_head_channels: 8
        first_stage_config:
          target: ldm.models.autoencoder.VQModelInterface
          params:
            ckpt_path: "./pretrained/vqf8_pretrained/model.ckpt"
            embed_dim: 4
            n_embed: 16384
            monitor: val/rec_loss
            ddconfig:
              double_z: false
              z_channels: 4
              resolution: 256
              in_channels: 3
              out_ch: 3
              ch: 128
              ch_mult:
              - 1
              - 2
              - 2
              - 4
              num_res_blocks: 2
              attn_resolutions:
              - 32
              dropout: 0
            lossconfig:
              target: torch.nn.Identity
        cond_stage_config:
          target: ldm.modules.encoders.modules.SpatialRescaler
          params:
            n_stages: 3
            method: bicubic
            in_channels: 3
            out_channels: 3
        tev_net_config:
          target: ldm.modules.HADARNet.modules.HADARNet
          params:
            in_channels: 3
            out_channels: 6
            smp_model: Unet
            smp_encoder: resnet18
            ckpt_path: "./pretrained/TeVNet_FLIR/epoch_1000.pth"

    data:
      target: main.DataModuleFromConfig
      params:
        batch_size: 12
        num_workers: 4
        wrap: false
        train:
          target: ldm.data.FLIRv1512.FLIRTrain
          params:
            size: 512
        validation:
          target: ldm.data.FLIRv1512.FLIRVal
          params:
            size: 512

    Are these settings correct, particularly the two checkpoint paths specified above?

  2. You mention TeVNet many times, but our paper clearly states that TeVNet is not used during inference.

  3. Our paper reports quantitative and qualitative results on the different datasets. Please refer to them.
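To see which checkpoints the config itself references, as opposed to the one passed via --checkpoint, a small sketch like the following lists every `ckpt_path` entry in a nested config. The function name `iter_ckpt_paths` is hypothetical, and the dictionary is a hand-written excerpt of the YAML quoted in this thread whose nesting under `model.params` is assumed from the usual layout of such configs; with a real file the dictionary would come from a YAML loader instead.

```python
from typing import Any, Iterator, Tuple


def iter_ckpt_paths(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, value) for every 'ckpt_path' entry in a nested config."""
    if isinstance(node, dict):
        for key, value in node.items():
            dotted = f"{prefix}.{key}" if prefix else key
            if key == "ckpt_path":
                yield dotted, value
            else:
                yield from iter_ckpt_paths(value, dotted)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_ckpt_paths(item, f"{prefix}[{index}]")


# Excerpt of the config quoted in this thread, reduced to its checkpoint entries.
# Assumption: both sections sit under model.params, as in typical configs of this style.
config_excerpt = {
    "model": {
        "params": {
            "first_stage_config": {
                "params": {"ckpt_path": "./pretrained/vqf8_pretrained/model.ckpt"},
            },
            "tev_net_config": {
                "params": {"ckpt_path": "./pretrained/TeVNet_FLIR/epoch_1000.pth"},
            },
        }
    }
}

found = list(iter_ckpt_paths(config_excerpt))
for dotted_key, path in found:
    print(f"{dotted_key}: {path}")
```

In this excerpt, neither entry is the PID checkpoint: one points at the VQ first-stage model and the other at TeVNet, which matches the observation that the PID weights are supplied separately on the command line.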