astra-vision / CoMoGAN

CoMoGAN: continuous model-guided image-to-image translation. CVPR 2021 oral.
Apache License 2.0

Questions about physical models #11

Closed DZY-cybe closed 2 years ago

DZY-cybe commented 2 years ago

Dear author,

Thanks for your impressive work, I'm very honored to ask you a few questions. First, which physical model can I choose if I want to do RGB image → infrared image translation? Is there a filter like the one described in the paper that would help me do this? Second, I think I should use a linear model, so what should I modify? I am looking forward to your advice.

Thank you!

fabvio commented 2 years ago

Hello, and thanks for your interest :) I'm a bit confused by your setup. The physical model is used to guide a continuous transformation on an unordered target domain. Before identifying the model, you should identify which target characteristic you would like to encode on the continuous manifold. Even if there is no obvious continuous setup, CoMoGAN can always regulate some characteristics like color or brightness (Fig. 12c in the paper). So, what do you want to encode continuously?
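
To give an idea of what I mean: the physical model is just a deterministic, φ-parameterized edit of the source image, and that naive edit is what orders the target manifold. A minimal sketch in PyTorch, assuming a brightness characteristic (the function name and the cosine schedule here are illustrative, not the actual repository code):

```python
import math
import torch

def brightness_model(x: torch.Tensor, phi: float) -> torch.Tensor:
    """Darken an RGB batch x (values in [0, 1]) as phi sweeps from 0 to pi."""
    scale = 0.5 * (1.0 + math.cos(phi))  # 1.0 at phi=0, 0.0 at phi=pi
    return x * scale

# phi is sampled per batch during training; the network then learns to
# produce a realistic image whose position on the manifold matches this
# naive model output.
x = torch.rand(4, 3, 256, 256)
phi = float(torch.rand(1)) * math.pi
x_model = brightness_model(x, phi)
```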

DZY-cybe commented 2 years ago

Hi! Thanks for your reply! I'm sorry if my question bothered you. I think the bridge between the RGB and IR image domains is the wavelength of light that the camera utilizes: it is the wavelength that separates visible from infrared. So I think phi here should be the wavelength. Right now, I'm missing a simple physical guidance model. For example, in the linear iPhone → DSLR experiment, phi parameterizes the Gaussian filter kernel; I think the wavelength may similarly be involved in some kind of grayscale process, something like the sketch below. Do you have any suggestions?
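
To make it concrete, maybe something like this minimal sketch (the blend and the names are just my guess, not from the repository)? Here phi in [0, 1] linearly blends an RGB image toward its grayscale version, loosely mimicking the loss of color information when moving toward IR:

```python
import torch

def grayscale_blend_model(x: torch.Tensor, phi: float) -> torch.Tensor:
    """Blend an RGB batch x of shape (B, 3, H, W) toward grayscale; phi in [0, 1]."""
    w = torch.tensor([0.299, 0.587, 0.114], device=x.device).view(1, 3, 1, 1)
    gray = (x * w).sum(dim=1, keepdim=True).expand_as(x)  # BT.601 luma, replicated per channel
    return (1.0 - phi) * x + phi * gray
```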

fabvio commented 2 years ago

You didn't bother me at all, I'm glad to help! I'm just trying to understand how to provide guidance on which model to use in the most effective way. So, let me recap the problem setting.

  1. Your source domain is RGB, so images acquired in the visible spectrum.
  2. Your target domain is IR, so images acquired in the infrared. You are assumed to have a wide variety of IR images at different wavelengths.
  3. Your objective is to perform a continuous RGB→IR i2i translation, in such a way that you can control the wavelength of the resulting images.

If the above is correct, I have a question. What is the variability of your IR dataset? Can you display some example images which represent the different appearances that you'd like to control? If the dataset is private, you can also write to me in private.

DZY-cybe commented 2 years ago

Thank you very much for your reply! I am very glad to have your help! The dataset I am using is FLIR, from https://www.flir.com/oem/adas/adas-dataset-form/. I will show you a few RGB and IR images from it.

[attached images: FLIR_00004 (RGB) and FLIR_00004 (IR)]

The first file contains the RGB images, the second the IR images. Unfortunately, by consulting the camera parameters, I can only roughly determine that the infrared camera works in the far-infrared band, between 8 µm and 14 µm. I don't have images in other bands.

DZY-cybe commented 2 years ago

[attached images: FLIR_00003, FLIR_00004, FLIR_00006, FLIR_00007, each in RGB and in IR]

Here are more examples; as you can see, they are paired.

fabvio commented 2 years ago

As far as I understand, the wavelength of the IR camera is unknown, but since you work with only one camera, all the images share the same wavelength. Am I correct?

DZY-cybe commented 2 years ago

Yes, you're right. All infrared images were taken with an infrared camera, and all visible images were taken with an optical camera.

fabvio commented 2 years ago

So, the problem is that even if CoMoGAN can reorganize an unordered manifold, it requires the target domain to encompass all possible desired output styles (see the paper). If your target dataset has no variability but only a single style, the network training will collapse to that specific style, in this case the unique IR appearance that you have.

I think your setup is interesting, though. Maybe you can take inspiration from the paper and somehow benefit from synthetic data, since you still have a paired dataset which may help in learning. I would consider learning 1. the continuous appearance of RGB→IR from synthetic data and 2. the realistic appearance of data from the paired images in this dataset. To be clearer, something similar to a fusion between CoMoGAN and AnalogicalGAN [1]. This would require heavy modifications of the code.

Closing the issue for now since the code is not immediately applicable to your problem, feel free to continue discussing and/or reopen it if I got something wrong and you believe it's feasible.

[1] Analogical Image Translation for Fog Generation, R. Gong et al., AAAI 2021

DZY-cybe commented 2 years ago

Thank you very much for your reply, I see what the problem is. What should my physical model look like if my dataset is synthetic and contains IR data at different wavelengths?

fabvio commented 2 years ago

It's difficult to say without seeing any images at different wavelengths. I suppose some model could be designed by looking at different semantic classes, since different materials reflect light in different ways.
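
As a rough, entirely made-up sketch of that per-class idea (every class id, value, and name below is hypothetical): given a semantic map, assign each material class a wavelength-dependent intensity and render a naive IR target that phi can control.

```python
import torch

# class id -> (base intensity, wavelength sensitivity); all values hypothetical
EMISSIVITY = {
    0: (0.20, 0.05),  # road
    1: (0.80, 0.30),  # pedestrian
    2: (0.50, 0.15),  # vehicle
}

def semantic_ir_model(seg: torch.Tensor, phi: float) -> torch.Tensor:
    """seg: (B, H, W) integer class ids; phi: normalized wavelength in [0, 1]."""
    out = torch.zeros(seg.shape, dtype=torch.float32)
    for cls, (base, sens) in EMISSIVITY.items():
        out[seg == cls] = base + sens * phi
    return out.clamp(0.0, 1.0).unsqueeze(1)  # naive (B, 1, H, W) IR target
```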

DZY-cybe commented 2 years ago

Thank you very much! I get it!