yux94 closed this issue 6 years ago
@yux94 Thanks for trying to reproduce my results. I actually feel what you got is already roughly the same as mine. First, I did not use any tricks for preprocessing/postprocessing; everything is within the codebase. There are some important details in how the coordinates for training patches are sampled, but since I've already provided my sampled coordinates, it doesn't matter anyway. As for testing reproducibility, have you tried using the ckpt I provided within the codebase to generate the probability map before training your own? If you use my ckpt, you should be able to get a probability map of Test_001.tif like this:
I would highly recommend using `cmap='jet'` to plot the probability map, rather than the black/white colormap in your case, which does not clearly differentiate values around 0.5:
```python
In [1]: import numpy as np
In [2]: from matplotlib import pyplot as plt
In [3]: probs_map = np.load('./Test_001.npy')
In [4]: plt.imshow(probs_map.transpose(), vmin=0, vmax=1, cmap='jet')
Out[4]: <matplotlib.image.AxesImage at 0x7f35e994ca58>
In [5]: plt.show()
```
And I would subjectively argue this figure matches the ground truth annotation pretty well. If you could reproduce this probability map and the corresponding FROC score of ~0.8 (as already achieved by one user), then at least there should be no problems in the postprocessing steps.
Thank you so much for your generous help and suggestions!
After plotting the prob map with `cmap='jet'`, I got the probability map of Test_001.tif with my ckpt like this:
Maybe I should train again and check my whole process, thank you so much!
@yux94 This one does look worse than my result. In addition to trying my ckpt, it would be helpful to also plot your training/validation curve so that I can compare it with mine.
@yil8 Many thanks!
When I resample the training patches randomly by myself and train the network again, I get a prob map for Test_084 like this:

Below is the prob map from my previously reproduced ckpt:

And this is your result:

That's very confusing, since you said that one user has already achieved good performance. I am now working on retraining the network. Besides, it would be very nice if you could provide the detailed process of your sampling with hard mining. (#14 )
@yux94 When I said other users achieved good performance, I meant they used my provided ckpt and achieved a 0.8+ FROC score. Your last heatmap plot based on my ckpt also looks good, and I guess if you calculate the FROC score, it will probably be around 0.8 as well. For the training part, again, due to the non-determinism of GPU convolution, it's almost impossible to achieve numerically identical results when retraining. But I would still suggest you plot your training curve, so that I can get some rough ideas. I'm currently traveling on a business trip, and will try to find some time to implement the hard-negative sampling part once I'm back in the US.
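Since the hard-negative sampling part is not in the repo yet, here is a minimal sketch of one common mining scheme, purely my assumption rather than the author's actual method (the function name, `threshold`, and `top_k` are all hypothetical): run the current model over patches sampled from normal slides, then keep the coordinates it wrongly scores as tumor, hardest first.

```python
import numpy as np

def mine_hard_negatives(probs, coords, threshold=0.5, top_k=1000):
    """Select coordinates of normal-tissue patches that the current
    model wrongly scores as tumor (probability above `threshold`).

    probs  : 1-D array of predicted tumor probabilities for patches
             sampled ONLY from normal (tumor-free) regions.
    coords : (N, 2) array of the matching patch coordinates.
    Returns up to `top_k` coordinates, hardest (highest prob) first.
    """
    probs = np.asarray(probs)
    coords = np.asarray(coords)
    hard = np.where(probs > threshold)[0]       # false positives
    hard = hard[np.argsort(probs[hard])[::-1]]  # sort hardest first
    return coords[hard[:top_k]]

# Toy example: 3 of 5 normal patches score above 0.5.
probs = [0.1, 0.9, 0.6, 0.4, 0.8]
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
hard_coords = mine_hard_negatives(probs, coords)
print(hard_coords)  # order: (1, 0), (4, 0), (2, 0), hardest first
```

The mined coordinates would then be appended to the normal-patch list used for the next round of training; how NCRF actually weights or schedules them is not something this sketch covers.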
Thank you so much for your patient and timely reply. This is my training curve after 20 epochs; should I train for more epochs until the curve is stable?
And this is my resampling training curve after 20 epochs.
@yux94 Your first curve looks very similar to mine, which converges to ~0.92 validation accuracy. I guess your second curve does not include hard negative examples, and thus converges to higher accuracy. For the curve with hard negative examples, did you train your model using exactly the same config/command I provided in the README?
Yes... pretty sure. I will check again, many thanks!
@yux94 sorry I couldn't help more on the training side. BTW, what's your FROC score for each case?
Sorry for bothering you again. We have tried to use the ckpt you provided within the codebase to generate the probability map, and the final FROC score is not satisfying either.
| FP | 0.25 | 0.5 | 1 | 2 | 4 | 8 | Avg |
|---|---|---|---|---|---|---|---|
| NCRF Model | 0.5265 | 0.6106 | 0.6681 | 0.7257 | 0.7743 | 0.8053 | 0.6851 |
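As a sanity check on the table, the average FROC is just the mean of the six sensitivities at 0.25, 0.5, 1, 2, 4, and 8 average false positives per image (the standard CAMELYON16 evaluation points):

```python
# Sensitivities from the table above, at 0.25, 0.5, 1, 2, 4 and 8
# average false positives per image.
sens = [0.5265, 0.6106, 0.6681, 0.7257, 0.7743, 0.8053]
avg_froc = sum(sens) / len(sens)
print(round(avg_froc, 4))  # 0.6851
```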
So we checked the probability maps. First, we generated coordinates of the detected tumor regions with nms.py. Next, we picked out the normal cases (48 out of 129), whose detections are all false positives, and drew the histogram.
According to this histogram, a good FROC score might be achieved only if the threshold is set to ~0.9.
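The histogram check described above can be sketched roughly as follows. The per-line `"probability,x,y"` format of the nms.py output is an assumption on my part, and `fp_histogram` is a hypothetical helper; on a tumor-free slide, every detection counts as a false positive.

```python
import numpy as np

def fp_histogram(prob_lines, bins=10):
    """Count false-positive detections per probability bin.

    prob_lines: lines from an nms.py-style csv, "probability,x,y"
    (this line format is an assumption). Returns (counts, bin_edges).
    """
    probs = [float(line.strip().split(',')[0])
             for line in prob_lines if line.strip()]
    return np.histogram(probs, bins=bins, range=(0.0, 1.0))

# Toy detections from a single tumor-free slide: all of them are FPs.
lines = ["0.95,100,200", "0.91,40,70", "0.55,10,10", "0.2,5,5"]
counts, edges = fp_histogram(lines)
print(counts)
```

Feeding the collected probabilities into `plt.hist(probs, bins=20, range=(0, 1))` would produce a figure like the one referred to above; a heavy mass near 1.0 is what suggests the ~0.9 threshold.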
Sorry, Test_049 and Test_114 were not excluded, which is why I got the bad result.
@yux94 did you obtain 0.80+ FROC score after excluding Test_049 and Test_114?
@yux94 "I transferred the raw tiff test file and xml file to the tiff mask with the ASAP software manually". How did you do that? I need the test GT too. I just used ASAP to look at the tif, but I don't know how to produce the GT.
@yil8 Yes, I got the 0.80+ FROC score with your provided ckpt after excluding Test_049 and Test_114, but my reproduced result is not satisfying either.
@Hukongtao First, you open the .tif file with ASAP. Next, load the .xml file and save it as an .araw file. Then open the .tif and .araw files together and save the result (if I remember correctly). Here is another solution of mine using cv2.fillPoly: https://github.com/yux94/Pathology/blob/master/bin/xml2mask_2.py
Maybe the second method is more convenient.
@yux94 OK, the code works. But when I convert the generated result to a black-and-white image, the image is very large while the foreground is only a tiny region, which is different from what you showed above.
@yux94 Could you share your QQ or WeChat? I still have some questions I'd like to ask you. Or you can add my QQ: 1821141394.
Excuse me, but I have one more question. Did you first train the resnet18 and then fine-tune it with the CRF model?
@yux94 I'm not quite sure what exactly you mean by your "reproduced result is not satisfying either". An FROC of 0.8+ is pretty good as far as I know. Do you have some specific examples? I trained resnet18 together with the CRF from scratch, without fine-tuning.
When I tried to reproduce your code, I got less-than-perfect results. For example, below is the raw Test_001 tiff:

After the whole training with ResNet18-CRF, I got this test prob map:

while the ground-truth mask is something like this (since the CAMELYON16 organizers don't provide the test GT in tiff format anymore, I transferred the raw test tiff file and xml file to the tiff mask with the ASAP software manually):
And I have just followed your test steps and evaluated the average FROC score for the whole test set, and got this:
However, the result is not at all satisfying.
And are there any other tricks in your preprocessing, postprocessing, or training process?
Here is the prob map of test_026: