FutabaSakuraXD / Farewell-to-Mutual-Information-Variational-Distiilation-for-Cross-Modal-Person-Re-identification

the implementation is quite different from the paper #2

Open nessessence opened 3 years ago

nessessence commented 3 years ago

In the code, I noticed that you use grayscale images as the input to Ei. Isn't it supposed to take infrared images? Also, why did you mix the infrared and RGB images and feed them as the input to Ev? And you didn't use the VML loss at all. There is a significant difference between the code and the paper.

Could you please clarify the reason behind the difference?

Thank you in advance!

FutabaSakuraXD commented 3 years ago

Figure 3 illustrates that Ei takes images from both modalities as input, please check it again : )

Also, in practice I found that the grayscale transformation slightly improves performance.

As for the absence of the VML loss, its role (regularization on zi and zv) can be performed implicitly by the VCD, so it is omitted.

-------------- Original Message -------------- From: "Ness Essence @.>; Sent: Sunday, September 26, 2021, 11:03 PM; To: "FutabaSakuraXD/Farewell-to-Mutual-Information-Variational-Distiilation-for-Cross-Modal-Person-Re-identification" @.>; Cc: "Subscribed @.***>; Subject: [FutabaSakuraXD/Farewell-to-Mutual-Information-Variational-Distiilation-for-Cross-Modal-Person-Re-identification] the implementation is quite different from the paper (#2)

In the code, I noticed that you use grayscale images as inputs to Ei. Isn't it supposed to take infrared images? Also, why did you mix the infrared and RGB images as inputs to Ev? And you didn't use the VML loss at all. There is a significant difference between the code and the paper.

Could you please clarify the reason behind the difference?

Thank you in advance!

FutabaSakuraXD commented 3 years ago

There's one mistake in my last e-mail: it is Es that takes images from both modalities, not Ei or Ev : )

nessessence commented 3 years ago

Thank you, appreciate the quick response.

https://github.com/FutabaSakuraXD/Farewell-to-Mutual-Information-Variational-Distiilation-for-Cross-Modal-Person-Re-identification/blob/ae7c0187d2ed36e6ed5109c2e0476e7f17bc8ce2/reid/models/newresnet.py#L73

Ev here is "self.RGB_backbone" and Ei is "self.IR_backbone", right?
I think Ev takes inputs from both modalities, because "x" here is a batch of both infrared and RGB images. Ei also takes these images (from both modalities), but with a grayscale transformation applied. Am I correct? If so, Ev already takes both infrared and RGB images, and the input of Ei can no longer be considered the infrared modality; it is just grayscale images (which might look similar to infrared, though). It also seems that you feed the infrared images to Ev just to increase the number of samples.
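
To make this reading concrete, here is a minimal sketch of the routing as I understand it. It is not the repository's code: `to_grayscale` and `route_batch` are hypothetical names, the luma weights are the common ITU-R BT.601 ones rather than whatever the repo actually uses, and storing infrared images as three equal channels is an assumption.

```python
# Illustrative sketch only: the names below are hypothetical stand-ins,
# not the repository's API. An image is a tiny list of (R, G, B) pixels.

def to_grayscale(image):
    """Replace each pixel with its luma (BT.601 weights), repeated on 3 channels."""
    gray = []
    for r, g, b in image:
        y = 0.299 * r + 0.587 * g + 0.114 * b
        gray.append((y, y, y))
    return gray


def route_batch(batch):
    """Return the inputs each encoder would see under the reading above.

    `batch` mixes RGB and infrared images, mirroring "x" in the forward pass.
    """
    ev_inputs = list(batch)                           # Ev: the mixed batch, unchanged
    ei_inputs = [to_grayscale(img) for img in batch]  # Ei: the same batch, grayscaled
    return ev_inputs, ei_inputs


rgb_image = [(255, 0, 0), (0, 255, 0)]      # a 2-pixel "visible" image
ir_image = [(120, 120, 120), (30, 30, 30)]  # assumed: infrared as 3 equal channels
ev_inputs, ei_inputs = route_batch([rgb_image, ir_image])
```

Under these assumptions, an infrared image passes through the grayscale step essentially unchanged, while an RGB image loses its color information, so Ei would see infrared images plus grayscale versions of the RGB images.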

Do I understand it correctly? If so, could you please clarify the reason behind this change? Or is it just based on empirical results? // There's nothing wrong with the method, and the results are great. I'm just curious how you came up with this.

Thank you :)