Open nessessence opened 3 years ago
Figure 3 illustrates that Ei takes images from both modalities as input, please check it again : )
Also, in practice I found that the grayscale transformation can slightly improve performance.
As for the absence of the VML loss: its role (regularization on zi and zv) can be performed implicitly by the VCD, so it is omitted.
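For concreteness, the grayscale trick mentioned above can be sketched as below. This is a hypothetical illustration, not the repository's actual code: all function names are my own, and the repo presumably uses something like torchvision's `transforms.Grayscale(num_output_channels=3)` instead.

```python
# Hypothetical sketch of the grayscale transformation discussed above: each
# RGB image is reduced to a single luma channel and replicated back to three
# channels, so Ei receives "infrared-like" 3-channel inputs. All names here
# are illustrative, not taken from the repository.

def rgb_to_luma(pixel):
    """ITU-R BT.601 luma for one (r, g, b) pixel with values in [0, 1]."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def grayscale_3ch(image):
    """Turn a list of (r, g, b) pixels into a 3-channel grayscale image."""
    return [(y, y, y) for y in map(rgb_to_luma, image)]

# A pure-red pixel becomes (0.299, 0.299, 0.299) after the transform.
print(grayscale_3ch([(1.0, 0.0, 0.0)]))
```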
In the code, I notice that you use grayscale images as inputs to Ei. Isn't it supposed to be infrared images? Also, why did you mix the infrared and RGB images as inputs to Ev, and why is the VML loss not used? There is a significant difference between the code and the paper.
Could you please clarify the reason behind these differences?
Thank you in advance!
There's one mistake in my last e-mail: it is Es that takes images from both modalities, not Ei or Ev : )
Thank you, I appreciate the quick response.
Ev here is "self.RGB_backbone" and Ei is "self.IR_backbone", right?
I think Ev takes inputs from both modalities because "x" here is a batch of both infrared and RGB images,
and Ei also takes these images (from both modalities), but with the grayscale transformation applied.
Am I correct? If so, then Ev already takes both infrared and RGB images, and the input of Ei can no longer be considered the infrared modality; it's just grayscale images (which might look similar to infrared, though). And you also feed the infrared images to Ev just to increase the number of samples.
Do I understand it correctly? If so, could you please clarify the reason behind this change, or is it just based on empirical results? // There's nothing wrong with the method, and the results are great. I'm just curious how you came up with this.
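To make sure I'm reading the routing correctly, here is a toy sketch of the data flow as I understand it. Only `self.RGB_backbone` and `self.IR_backbone` are names from the repo; the backbones below are stand-in functions and everything else is my own placeholder.

```python
# Toy sketch of the forward routing discussed above, under the assumption
# that a batch "x" mixes RGB and IR images. Ev sees the batch unchanged;
# Ei sees a grayscale-transformed copy of the SAME batch, so its input is
# no longer strictly the infrared modality.

def grayscale(img):
    """Average the three channels of each (r, g, b) pixel; a stand-in transform."""
    return [sum(px) / 3.0 for px in img]

def ev(img):   # stand-in for self.RGB_backbone (Ev)
    return ("Ev", img)

def ei(img):   # stand-in for self.IR_backbone (Ei)
    return ("Ei", img)

batch = [
    [(0.9, 0.1, 0.1)],   # RGB image (one pixel)
    [(0.5, 0.5, 0.5)],   # IR image replicated to 3 channels (one pixel)
]

zv = [ev(img) for img in batch]               # Ev: mixed RGB + IR inputs
zi = [ei(grayscale(img)) for img in batch]    # Ei: grayscale copies of the same batch
```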
Thank you :)