FuyaLuo / PearlGAN

Image translation from nighttime thermal infrared (NTIR) images to daytime color (DC) images.
BSD 2-Clause "Simplified" License

Question about result reproduction #3

Closed · Lucky0775 closed this issue 2 years ago

Lucky0775 commented 2 years ago

Hi, thanks for your nice work!

I am trying to reproduce the semantic segmentation results (mIoU) on the KAIST dataset with both the pretrained model and a retrained version, but my results do not match those reported in the paper and in the file Complete semantic segmentation results.md.

Could you specify further what needs to be done to get close to the results in the paper?

Also, are there any other steps that need attention during data preprocessing (to build the datasets) or in the mIoU evaluation?

Best Regards,

FuyaLuo commented 2 years ago

Flawlessly reproducing the results is extremely difficult for image-to-image (I2I) translation models. Unlike supervised CNN models, the training of I2I translation models is generally unsupervised, and the encoders of the two domains may be switched randomly at each iteration. In addition, random cropping with a batch size of 1 and the discriminator's sensitivity to individual samples limit the stability of training. Other factors, such as dropout, can also affect GAN training. However, under reasonable initialization conditions, a similar result can usually be obtained over multiple attempts.
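
As a practical aid for such repeated attempts, one generic step is to pin the usual random seeds so that runs differ only in the factors above; the following is a sketch of common PyTorch practice, not code from this repository:

```python
# Generic reproducibility setup for PyTorch training runs (an assumption:
# PearlGAN itself may seed differently or not at all).
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch RNGs for more repeatable runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for deterministic cuDNN convolution kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(42)
```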

In the proposed method, the attentional loss is unsupervised, which may further affect the stability of training, so we suggest checking whether the attention maps show a hierarchical distribution along the image height at around the 30th epoch of training.
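
To make that check concrete, one option is to reduce each attention map to a per-row profile; the tensor shape and names below are assumptions for illustration, not PearlGAN's actual API:

```python
# Collapse an attention map (N, C, H, W) to a per-row mean so a hierarchical
# (banded) distribution along the image height is easy to spot or plot.
import torch


def height_profile(attn: torch.Tensor) -> torch.Tensor:
    """Return the (H,) mean activation of each row of the attention maps."""
    return attn.mean(dim=(0, 1, 3))


# Dummy tensor standing in for a real attention map at around epoch 30.
attn = torch.rand(1, 1, 288, 360)
print(height_profile(attn)[:8])
```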

For the pre-processing of the KAIST dataset, please remember to enhance the DC images as described in the paper.

Lucky0775 commented 2 years ago

Thank you for your reply!

I have enhanced the DC images as described in the paper, then resized and center-cropped all images using transforms.Resize() and transforms.CenterCrop(), respectively, to build my own dataset, and finally retrained the model.
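
(For concreteness, a minimal reconstruction of this pipeline is sketched below; the sizes are placeholders drawn from later in this thread, and this reflects the commenter's description, not the repository's script.)

```python
# Resize-then-center-crop preprocessing as described above. The (500, 400)
# resize and the 360x288 crop are assumptions taken from this thread;
# torchvision's size arguments are (height, width).
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((400, 500)),      # 500x400 in (width x height) terms
    transforms.CenterCrop((288, 360)),  # 360x288 in (width x height) terms
])

img = preprocess(Image.open("set4_1_I00000.png"))  # hypothetical KAIST frame
img.save("set4_1_I00000_pre.png")
```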

The semantic segmentation result (mIoU_9) on the KAIST dataset is only about 34 for both my retrained model and the provided pretrained model, which is far from the 43.1 reported in Complete semantic segmentation results.md.
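
(For reference, mIoU_9 presumably averages IoU over 9 evaluated classes; a generic confusion-matrix computation, not the authors' evaluation script, is sketched below.)

```python
# Standard mean IoU from a (K, K) confusion matrix, where conf[t, p] counts
# pixels of ground-truth class t predicted as class p. K = 9 is an assumption
# based on the metric's name, mIoU_9.
import numpy as np

NUM_CLASSES = 9


def miou(conf: np.ndarray) -> float:
    tp = np.diag(conf).astype(np.float64)
    denom = conf.sum(axis=0) + conf.sum(axis=1) - tp
    iou = np.divide(tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(iou.mean())


conf = np.random.randint(0, 100, size=(NUM_CLASSES, NUM_CLASSES))
print(f"mIoU: {100 * miou(conf):.1f}")
```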

Could this gap be caused by an incorrect data-preprocessing step when building the dataset?

I would love to study this paper and reproduce its results. If it is convenient for you, could you share the datasets used in the paper (i.e., the preprocessed KAIST and FLIR datasets) to help me reproduce the results?

Sincerely,

FuyaLuo commented 2 years ago

Did you compare the sampled and processed NTIR test images with the semantic annotations? Our image pre-processing is performed in MATLAB, and the semantic segmentation results are shown in the attached image. [Image: res_PearlGAN_KAIST]
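
One quick way to run that comparison (a sketch, not part of the repository, whose pre-processing is done in MATLAB) is to blend each processed image with its color-coded annotation so misalignment is immediately visible:

```python
# Blend a processed NTIR (or translated) image with its semantic annotation;
# the file names are placeholders taken from this thread.
from PIL import Image

img = Image.open("set4_1_I00000.png").convert("RGB")
label = Image.open("set4_1_I00000_label.png").convert("RGB")
label = label.resize(img.size, Image.NEAREST)  # nearest keeps label colors crisp

Image.blend(img, label, alpha=0.5).save("set4_1_I00000_overlay.png")
```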

FuyaLuo commented 2 years ago

If your processed segmentation dataset does not match the annotations, we suggest testing the mAP results on the KAIST dataset with the trained weights instead. First, resize the original images to 360x288 for the NTIR2DC translation, then upsample the translated DC images to 640x512 using MATLAB. Finally, test the mAP of pedestrian detection with YOLOv4.
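
A Python sketch of the sizing steps above follows; the authors used MATLAB for the upsampling, so this PIL version is an approximation, and the file names and model call are placeholders:

```python
from PIL import Image

# 1) Downsample the original 640x512 NTIR frame to 360x288 for translation.
ntir = Image.open("ntir_frame.png")  # hypothetical input
ntir.resize((360, 288), Image.BICUBIC).save("ntir_360x288.png")

# 2) Run the NTIR2DC translation on the 360x288 image (model call omitted).

# 3) Upsample the translated DC image back to 640x512 before YOLOv4 testing.
dc = Image.open("dc_360x288.png")    # hypothetical translation output
dc.resize((640, 512), Image.BICUBIC).save("dc_640x512.png")
```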

yupan233 commented 8 months ago

I recently started using the KAIST dataset and ran into the same problem. [image] After processing the original images this way and running predictions with the pretrained weights, the results do not match; here is the visual comparison. [Images: set4_1_I00000, set4_1_I00000_label]

Could it be that the resize dimensions used during preprocessing are not (500, 400) for both the KAIST and FLIR datasets? When I preprocess the FLIR dataset this way, the results do match. Below is the FLIR visualization: the sky region clearly shows that FLIR is aligned while KAIST is not (likewise, the metrics computed with the pretrained weights match for FLIR but not for KAIST). [Images: FLIR_08863, FLIR_08863_label] Could the author please share the preprocessing pipeline? Many thanks!

yupan233 commented 8 months ago

I tried it this morning: the KAIST preprocessing apparently does not include the resize-to-(500, 400) step; a center crop alone is enough, and the results then match the paper. [image]
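
A minimal sketch of that corrected pipeline (the 360x288 crop size is an assumption, carried over from the translation resolution mentioned earlier in this thread):

```python
# KAIST preprocessing per the finding above: no (500, 400) resize, only a
# center crop of the original 640x512 frame. Size argument is (height, width).
from PIL import Image
from torchvision import transforms

kaist_preprocess = transforms.CenterCrop((288, 360))

img = kaist_preprocess(Image.open("set4_1_I00000.png"))  # hypothetical frame
img.save("set4_1_I00000_crop.png")
```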

FuyaLuo commented 7 months ago

Thank you for your careful verification; I'm glad the results can be reproduced. The KAIST dataset was processed quite early on, and it was probably an oversight on my part at the time that left the resize-to-500x400 line of code commented out. I apologize. I will correct the preprocessing steps for the KAIST experiments in the README and acknowledge your correction in the acknowledgments. Thanks for checking!

liting1018 commented 4 months ago

transforms.Resize

Is this line of code you posted your own preprocessing of the original dataset (640x512)? I don't seem to see it in the author's released code.