baidu-research / NCRF

Cancer metastasis detection with neural conditional random field (NCRF)
Apache License 2.0

Some issues about the reproducing results #12

Closed yux94 closed 6 years ago

yux94 commented 6 years ago

When I tried to reproduce your code, I got less-than-perfect results. For example, below is the raw Test_001 tiff: tif_raw_convert_img. After the whole training with ResNet18-CRF, I got this test probability map: probmap_convert_img, while the ground-truth mask looks like this: npy_mask_convert_img. (Since the CAMELYON16 organizers no longer provide the test ground truth in tiff format, I manually converted the raw test tiff and xml files to tiff masks with the ASAP software.)

I followed your test steps and evaluated the average FROC score over the whole test set, and got this: froc_npy

However, the result is not at all satisfying.

Is there any other trick in your preprocessing, postprocessing, or training process?

Here is the prob map of test_026:

probmap_convert_img_026

yil8 commented 6 years ago

@yux94 Thanks for trying to reproduce my results. I actually feel what you got is already roughly the same as mine. First, I did not have any tricks for preprocessing/postprocessing. Everything is within the codebase. There are some important details when you try to sample the coordinates for training patches, but since I've already provided my sampled coordinates, it doesn't matter anyway. As for testing reproducibility, have you tried to use the ckpt I provided within the codebase to generate the probability map before you trained your own? If you use my ckpt, you should be able to get a probability map of Test_001.tif as this:

screen shot 2018-07-23 at 11 12 26 am

I would highly recommend plotting the probability map with cmap=jet rather than the black/white colormap in your case, which does not differentiate values around 0.5 well:

In [1]: import numpy as np

In [2]: from matplotlib import pyplot as plt

In [3]: probs_map = np.load('./Test_001.npy')

In [4]: plt.imshow(probs_map.transpose(), vmin=0, vmax=1, cmap='jet')
Out[4]: <matplotlib.image.AxesImage at 0x7f35e994ca58>

In [5]: plt.show()

And I would subjectively argue this figure matches the ground truth annotation pretty well. If you could reproduce this probability map and the corresponding FROC score of ~0.8 (as already achieved by one user), then at least there should be no problems in the postprocessing steps.

yux94 commented 6 years ago

Thank you so much for your generous help and suggestions! After plotting the prob map with cmap=jet, I got the probability map of Test_001.tif with my ckpt as this: probmap_convert_img_001_plot

Maybe I should train again and check my whole process, thank you so much!

yil8 commented 6 years ago

@yux94 This one does look worse than my result. In addition to trying my ckpt, it would be helpful to also plot your training/validation curves so that I can compare them with mine.

yux94 commented 6 years ago

@yil8 Many thanks!

yux94 commented 6 years ago

When I resample the training patches randomly myself and train the network again, I get a probability map for Test_084 like this: probmap_convert_img_084_plot_resample_jet. Below is the probability map from my previously reproduced ckpt: probmap_convert_img_084_plot_reproduce_jet. And this is your result: probmap_convert_img_084_plot_rawbaidu_jet. That's very confusing, since you said one user has already achieved good performance. I am working on retraining the network again. Besides, it would be very helpful if you could provide the detailed process of your sampling with hard mining. (#14 )

yil8 commented 6 years ago

@yux94 When I said other users achieved good performance, I meant they used my provided ckpt and achieved a 0.8+ FROC score. Your last heatmap plot based on my ckpt also looks good, and I guess if you calculate the FROC score, it will probably be around 0.8 as well. For the training part, due to the non-determinism of GPU convolution, it's almost impossible to get numerically identical results when retraining. But I would still suggest you plot your training curve, so that I can get some rough idea. I'm currently traveling on a business trip, and will try to find some time to implement the hard-negative sampling part once I'm back in the US.

yux94 commented 6 years ago

Thank you so much for your patient and timely reply. This is my training curve after 20 epochs; should I train for more epochs until the curve is stable? getimage

yux94 commented 6 years ago

getimage 1 And this is my resampling training curve after 20 epochs.

yil8 commented 6 years ago

@yux94 Your first curve looks very similar to mine, which converges to ~0.92 validation accuracy. I guess your second curve does not include hard negative examples, which is why it converges to higher accuracy. For the curve with hard negative examples, did you train your model using exactly the same config/command I provided in the README?

yux94 commented 6 years ago

Yes... Pretty sure. I will check again, many thanks!

yil8 commented 6 years ago

@yux94 sorry I couldn't help more on the training side. BTW, what's your FROC score for each case?

yux94 commented 6 years ago

Sorry for bothering you again. We tried using the ckpt you provided within the codebase to generate the probability maps, but the final FROC score is not satisfying either.

| FP | 0.25 | 0.5 | 1 | 2 | 4 | 8 | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NCRF Model | 0.5265 | 0.6106 | 0.6681 | 0.7257 | 0.7743 | 0.8053 | 0.6851 |
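(For reference, the Avg column of a CAMELYON16-style FROC table is just the mean sensitivity over the six false-positive rates, which can be checked quickly:)

```python
# Sanity check: the average FROC score is the mean of the
# sensitivities at 0.25, 0.5, 1, 2, 4 and 8 average false
# positives per slide, using the values from the table above.
sensitivities = [0.5265, 0.6106, 0.6681, 0.7257, 0.7743, 0.8053]
avg_froc = sum(sensitivities) / len(sensitivities)
print(round(avg_froc, 4))  # 0.6851, matching the Avg column
```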

So we checked the probability maps. First, we generated coordinates of the detected tumor regions with nms.py. Next, we picked out the normal cases (48 out of 129), which contribute the false positives, and plotted a histogram of their detection probabilities: getimage 3

According to this histogram, a good FROC score might be achieved only if the threshold is set to ~0.9.
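(As a sketch of that check: the helper below counts how many candidate detections on a normal slide survive a given probability threshold. The function name and the CSV layout, one `probability,x,y` row per detection, are assumptions for illustration, not the exact nms.py output format.)

```python
# Hypothetical sketch: count candidate detections on a normal
# slide whose probability exceeds a threshold. Any surviving
# detection on a normal slide is a false positive for FROC.
import csv

def count_false_positives(csv_path, threshold):
    count = 0
    with open(csv_path) as f:
        for prob, x, y in csv.reader(f):  # assumed row layout
            if float(prob) >= threshold:
                count += 1
    return count
```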

yux94 commented 6 years ago

Sorry, Test_049 and Test_114 were not excluded, which is why I got the bad result.

yil8 commented 6 years ago

@yux94 Did you obtain a 0.80+ FROC score after excluding Test_049 and Test_114?

Hukongtao commented 6 years ago

@yux94 "I transferred the raw tiff test file and xml file to the tiff mask with the ASAP software manually." How did you do that? I need the test GT, too. I just used ASAP to look at the tif, but I don't know how to produce the GT.

yux94 commented 6 years ago

@yil8 Yes, I got the 0.80+ FROC score with your provided ckpt after excluding Test_049 and Test_114, but my own reproduced result is still not satisfying.

yux94 commented 6 years ago

@Hukongtao First, open the .tif file with ASAP. Next, load the .xml file and save it as a .araw file. Then open the .tif and .araw files together and save the result (if I remember correctly). Here is another solution of mine using cv2.fillPoly: https://github.com/yux94/Pathology/blob/master/bin/xml2mask_2.py Maybe the second method is more convenient.

Hukongtao commented 6 years ago

@yux94 Yes, the code works. But when I convert the generated result into a black-and-white image, the image is very large while the foreground is only a tiny patch, which is different from what you showed above.

Hukongtao commented 6 years ago

@yux94 Could you share a QQ or WeChat contact? I still have some questions I'd like to ask you. Or you can add my QQ: 1821141394.

yux94 commented 6 years ago

Excuse me, but I have one more question. Did you first train the ResNet18 alone and then finetune the model with the CRF module?

yil8 commented 6 years ago

@yux94 Not quite sure what you mean by your "reproduced result is not satisfying either". An FROC of 0.8+ is pretty good as far as I know. Do you have some specific examples? I trained ResNet18 together with the CRF from scratch, without finetuning.