PeterJackNaylor / DRFNS

This repository contains the code needed to reproduce the work in the submitted paper: "Segmentation of Nuclei in Histopathology Images by deep regression of the distance map".
MIT License

Problems when training with the real data #7

Closed: GekFreeman closed this issue 4 years ago.

GekFreeman commented 4 years ago

Hi, thanks for your excellent work! I am currently reproducing your work and have run into some problems:
a. The experimental results are lower than those reported in the paper [image]. How can I modify the code to reproduce the results?
b. The training code in DRFNS/src_RealData/UNet.py looks odd:

if SPLIT == "train":
    model.train(DG_TEST)

Why isn't it DG_TRAIN?

PeterJackNaylor commented 4 years ago

Hi,

Great! I'm happy to help; the results you report are indeed low. I believed the code was working, but something has evidently broken, since it should reproduce roughly the same results. Which branch did you clone? I would recommend working with the branch named 'review'. It contains a more up-to-date version of the code, including the RCNN code (although that has not been completely migrated into the repository yet).

Could you send me some of the probability maps produced by the nextflow pipeline? They should be in './out_RDS/Test/', for example those from the bladder sections.

To improve the results, you can tweak some of the available hyperparameters in the nextflow file, for example:

LEARNING_RATE = [0.001, 0.0001, 0.00001]
FEATURES = [16, 32, 64]
WEIGHT_DECAY = [0, 0.00005, 0.005, 0.5]
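If the pipeline trains one model per combination of these values (an assumption; the comment does not say how the lists are combined), the number of runs grows multiplicatively. A minimal Python sketch that enumerates such a grid, with the hypothetical helper name hyperparameter_grid:

```python
from itertools import product

# Values copied from the nextflow hyperparameters quoted above.
LEARNING_RATE = [0.001, 0.0001, 0.00001]
FEATURES = [16, 32, 64]
WEIGHT_DECAY = [0, 0.00005, 0.005, 0.5]


def hyperparameter_grid():
    """Return every (learning_rate, features, weight_decay) combination.

    Hypothetical helper: assumes one training run per combination,
    i.e. a full Cartesian product of the three lists.
    """
    return list(product(LEARNING_RATE, FEATURES, WEIGHT_DECAY))


if __name__ == "__main__":
    grid = hyperparameter_grid()
    print(len(grid))  # 3 * 3 * 4 = 36 runs
    for lr, feat, wd in grid[:3]:
        print(f"lr={lr} features={feat} weight_decay={wd}")
```

Under that assumption, fixing one of the lists (for example WEIGHT_DECAY) is the cheapest way to reduce the number of runs.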

As for your final question: the model object has a train method that takes the test data generator as input. In the earlier lines, you will see that the model takes a TFRecord as input; it is (probably) better to use TensorFlow records because they speed up training. The DG_TEST on that line is essentially useless and should be set to None, but the code might break if you do that.
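The arrangement described here, where training data come from records supplied when the model is built and the generator passed to train is secondary, can be sketched as follows. SketchModel, its constructor, and the eval_generator parameter are hypothetical stand-ins, not the actual DRFNS classes, and a plain list replaces the TFRecord reader so the example runs without TensorFlow.

```python
from typing import Iterable, List, Optional, Sequence


class SketchModel:
    """Hypothetical stand-in for the UNet model object (not the DRFNS API)."""

    def __init__(self, train_records: Sequence[str]):
        # In DRFNS the training data come from a TFRecord file; here a plain
        # sequence of sample names stands in for that record reader.
        self.train_records = list(train_records)
        self.seen: List[str] = []

    def train(self, eval_generator: Optional[Iterable[str]] = None) -> int:
        # Training samples are always read from the records given at
        # construction time, never from the argument of train().
        for record in self.train_records:
            self.seen.append(record)
        # The generator is optional: when it is None, evaluation is skipped.
        # Returns how many evaluation samples were consumed.
        if eval_generator is None:
            return 0
        return sum(1 for _ in eval_generator)


if __name__ == "__main__":
    model = SketchModel(["sample_01", "sample_02"])
    print(model.train(None))  # no evaluation samples
```

In this sketch, passing None is safe only because the argument is explicitly guarded; in the real code every use of that argument would need the same guard, which is why setting it to None might break things.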

GekFreeman commented 4 years ago

Thank you so much for your quick help! As you said, the model takes a TFRecord as input, but I find that the data split in TFRecords.py is not the same as the one described in the paper. In addition, there is no folder named "validation" in the dataset I downloaded with download_data.sh:

    if options.split == "train":
        TEST_PATIENT = ["testbreast", "testliver", "testkidney", "testprostate",
                        "bladder", "colorectal", "stomach", "validation"]
        TRANSFORM_LIST = transform_list_test
    elif options.split == "validation":
        options.split = "test"
        TEST_PATIENT = ["validation"]
        TRANSFORM_LIST = transform_list_test
        SIZE = options.size_test

    elif options.split == "test":
        TEST_PATIENT = ["testbreast", "testliver", "testkidney", "testprostate",
                        "bladder", "colorectal", "stomach"]
        TRANSFORM_LIST = transform_list_test
        SIZE = options.size_test
    elif options.split == "fulltrain":
        TEST_PATIENT = ["testbreast", "testliver", "testkidney", "testprostate",
                        "bladder", "colorectal", "stomach"]
        TRANSFORM_LIST = transform_list_test

PeterJackNaylor commented 4 years ago

Hi, you're welcome, and thanks for taking the time.

Could you explain why you think this differs from the paper?

The data generators work as follows: you give them a path and a list of test patients. If the generator is set up for training, it yields all the samples it can find in the given path, excluding the test patients. If it is set up for testing, it yields only the samples in the given path that match the test patients.
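A minimal sketch of that selection rule, assuming each sample's folder name identifies its patient or organ (for example "bladder" or "validation"). The function select_samples and the training folder names below are hypothetical; this is not the repository's generator API, and a list of folder names stands in for scanning a path.

```python
from typing import Iterable, List


def select_samples(folders: Iterable[str], test_patients: Iterable[str],
                   split: str) -> List[str]:
    """Hypothetical sketch of the train/test selection rule described above.

    split == "train": keep every folder NOT listed in test_patients.
    split == "test":  keep only the folders that ARE listed in test_patients.
    """
    excluded = set(test_patients)
    if split == "train":
        return [f for f in folders if f not in excluded]
    if split == "test":
        return [f for f in folders if f in excluded]
    raise ValueError(f"unknown split: {split!r}")


if __name__ == "__main__":
    folders = ["breast", "liver", "bladder", "validation"]
    test = ["bladder", "validation"]
    print(select_samples(folders, test, "train"))  # ['breast', 'liver']
    print(select_samples(folders, test, "test"))   # ['bladder', 'validation']
```

The same list of test patients thus defines both splits: train is its complement and test is its match, so the two can never overlap.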

Thanks for your comment about the validation folder; I will update the code to take it into account. For validation, we took one sample from each training organ, so as not to modify the test set.
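Drawing a validation set as one sample per training organ could look like the sketch below. The function pick_validation, the organ-to-samples mapping, the sample names, and the seeded random choice are all illustrative assumptions; the comment does not say how each validation sample was chosen.

```python
import random
from typing import Dict, List, Tuple


def pick_validation(samples_by_organ: Dict[str, List[str]],
                    seed: int = 0) -> Tuple[Dict[str, List[str]], List[str]]:
    """Hypothetical helper: move one sample per training organ to validation.

    Returns (remaining_training_samples, validation_samples). Only training
    organs are passed in, so the test set is never touched.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    remaining: Dict[str, List[str]] = {}
    validation: List[str] = []
    for organ in sorted(samples_by_organ):
        samples = list(samples_by_organ[organ])  # copy; do not mutate input
        chosen = rng.choice(samples)
        validation.append(chosen)
        samples.remove(chosen)
        remaining[organ] = samples
    return remaining, validation


if __name__ == "__main__":
    train = {"breast": ["b1", "b2", "b3"], "liver": ["l1", "l2"]}
    rest, val = pick_validation(train)
    print(len(val))  # one validation sample per organ
```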

So:
For train, we exclude "testbreast", "testliver", "testkidney", "testprostate", "bladder", "colorectal", "stomach" and "validation".
For validation, we only want "validation".
For test, we generate "testbreast", "testliver", "testkidney", "testprostate", "bladder", "colorectal" and "stomach".
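Written out as sets, these lists make it easy to check that no folder is shared between the splits. The folder names are copied from the TFRecords.py excerpt quoted in this issue; the helper splits_are_disjoint and the training folder names in the usage lines are hypothetical.

```python
# Folder names taken from the TFRecords.py excerpt quoted in this issue.
TEST = {"testbreast", "testliver", "testkidney", "testprostate",
        "bladder", "colorectal", "stomach"}
VALIDATION = {"validation"}
# Training uses every other folder; only its exclusion list is given.
TRAIN_EXCLUDED = TEST | VALIDATION


def splits_are_disjoint(train_folders):
    """Hypothetical check: True if training shares no folder with
    validation or test, and validation shares none with test."""
    train = set(train_folders)
    return train.isdisjoint(TRAIN_EXCLUDED) and VALIDATION.isdisjoint(TEST)


if __name__ == "__main__":
    print(splits_are_disjoint(["breast", "liver"]))    # True
    print(splits_are_disjoint(["breast", "bladder"]))  # False: bladder is a test folder
```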

And in the paper we wrote: 'For this, we split DS2 into three sets: training, validation and test set. The test set is the same as the one used in [32].'

GekFreeman commented 4 years ago

Thanks! I thought you had used the same data for training and testing. This solves my problem, thanks again!