CODAIT / deep-histopath

A deep learning approach to predicting breast tumor proliferation scores for the TUPAC16 challenge
Apache License 2.0
206 stars 88 forks

TypeError: map() got an unexpected keyword argument 'num_parallel_calls' && No training configuration found in save file #5

Closed sukruburakcetin closed 7 years ago

sukruburakcetin commented 7 years ago
Traceback (most recent call last):
  File "train_mitoses.py", line 869, in <module>
    args.threads, args.prefetch_batches, args.log_interval, args.checkpoint, args.resume)
  File "train_mitoses.py", line 516, in train
    augmentation, False, threads, prefetch_batches)
  File "train_mitoses.py", line 230, in create_dataset
    num_parallel_calls=threads)
TypeError: map() got an unexpected keyword argument 'num_parallel_calls'

Hi @dusenberrymw, I tried to build and run the freshly updated "train_mitoses.py", but it produces the error above. The previous version (Fix regularization bug - 1bbb9649b8f56235b41ac65572f783a2d4c59635) was working fine.

In addition, when I tried to run "predict_mitoses.py" with a model created by the old version of "train_mitoses.py", I got this warning:

/anaconda3/envs/tensorflow-cpu/lib/python3.6/site-packages/keras/models.py:251: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '

My model file includes:

0.020121_f1_0.94866_loss_1_epoch_model.hdf5
args.txt
checkpoint
events.out.tfevents.1507540126.burak-pc
global_step_epoch.pickle
model.ckpt.data-00000-of-00001
model.ckpt.index
model.ckpt.meta
train(md)
val(md)

As far as I know, the warning is about the training configuration not being in the saved file. I rechecked train_mitoses.py and saw that the model configuration should be saved there. I really want to execute the whole project once without any complications. By the way, I am sorry if it is uncomfortable that I ask so many questions while you are already busy with this highly sensitive and amazing project.

dusenberrymw commented 7 years ago

Hi @Narthyard. Thanks for continuing to try out the code! Luckily, I have answers for both issues. For the first issue, the Dataset API in TensorFlow is still quite new (it will finally be a top-level API in 1.4), and the old num_threads parameter of map() was deprecated and replaced with num_parallel_calls. I made that change recently, so the code now requires a recent nightly build. Can you update to a nightly build of TensorFlow on your Linux box?
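If upgrading right away is not convenient, one option is a small compatibility shim that passes whichever keyword the installed TensorFlow accepts. This is only a sketch, not part of the repo; `map_compat` is a hypothetical helper, demonstrated below against stand-in classes rather than a real `tf.data.Dataset`:

```python
def map_compat(dataset, fn, threads):
    """Call dataset.map with whichever parallelism kwarg this build accepts.

    Recent TF nightlies renamed `num_threads` to `num_parallel_calls`.
    """
    try:
        return dataset.map(fn, num_parallel_calls=threads)
    except TypeError:
        # Older builds only understand the old `num_threads` kwarg.
        return dataset.map(fn, num_threads=threads)

# Stand-ins that mimic the old and new map() signatures:
class OldAPI:
    def map(self, fn, num_threads=1):
        return [fn(x) for x in range(3)]

class NewAPI:
    def map(self, fn, num_parallel_calls=1):
        return [fn(x) for x in range(3)]

print(map_compat(OldAPI(), lambda x: x * 2, 4))  # falls back to num_threads
print(map_compat(NewAPI(), lambda x: x * 2, 4))  # uses num_parallel_calls
```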

As for the second issue, this is an expected warning and should not cause any adverse effects (please let me know if it does). The reason is that I am using a hybrid Keras + TensorFlow setup: I use the Keras API only to create a Model object, and the core TensorFlow API for everything else. Because of that, I never actually "compile" the Keras model as would be done in a pure Keras setup, so the saved model does not contain the optimizer, loss, etc. that would usually be found in a Model object. Therefore, loading it emits a warning in case you were hoping to continue training with it; but in our case, predict_mitoses.py only uses the Model object for prediction, which does not require the model to have been compiled. Please let me know if you find that it does indeed cause an issue.
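Since the warning is expected and harmless for inference-only use, anyone who wants quieter logs can filter it out with the standard library. This is just an optional sketch; the `warnings.warn` call below is a stand-in for the warning Keras emits inside `load_model`, using the message text quoted above:

```python
import warnings

# Suppress the benign "not compiled" warning for inference-only loading.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.filterwarnings(
        "ignore",
        message="No training configuration found in save file",
        category=UserWarning,
    )
    # Stand-in for the warning Keras emits while loading the model:
    warnings.warn(
        "No training configuration found in save file: the model was *not* compiled.",
        UserWarning,
    )

print(len(caught))  # the matching warning was filtered out, so nothing was recorded
```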

sukruburakcetin commented 7 years ago

Hi @dusenberrymw, I trained a new model with the current "train_mitoses.py" and tried to run it again with the same approach, executing on CPU. I'm still getting the same warnings and no output at all. There are no errors, just warnings, but I have no clue what is going on.

This is what I get:

/home/burak/anaconda3/envs/tensorflow-cpu/bin/python3.6 /media/burak/HDD/mitosis-detection/deep-histopath/predict_mitoses.py --appName PredictMitosisBurak
Using TensorFlow backend.
2017-10-12 13:19:48.858941: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-12 13:19:48.858957: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-12 13:19:48.858960: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-12 13:19:48.858962: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-12 13:19:48.858964: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/10/12 13:19:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/10/12 13:19:51 WARN Utils: Your hostname, burak-pc resolves to a loopback address: 127.0.1.1; using 192.168.1.125 instead (on interface enp0s31f6)
17/10/12 13:19:51 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
[Stage 0:>                                                          (0 + 0) / 8]Using TensorFlow backend.
[Stage 0:>                                                          (0 + 4) / 8]Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
[... the same cpu_feature_guard warnings, repeated by each of the four Spark worker processes ...]
/home/burak/anaconda3/envs/tensorflow-cpu/lib/python3.6/site-packages/keras/models.py:251: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
[... the same UserWarning, repeated by the remaining worker processes ...]
[]

Process finished with exit code 0

I changed the argument parser's default paths to get the right configuration by looking at your IPython notebook:

  parser = argparse.ArgumentParser()
  parser.add_argument("--appName", default="Breast Cancer -- Predict", help="application name")
  parser.add_argument("--slide_path",
                      default=os.path.join("data", "training_image_data", "TUPAC-TR-500.svs"),  # added
                      help="path to the mitosis data for prediction")
  parser.add_argument("--model_path",
                      default=os.path.join("model", "checkpoints",  # added
                                           "0.091732_f1_0.83411_loss_1_epoch_model.hdf5"),
                      help="path to the model file")
  parser.add_argument("--model_name", default="vgg", help="input model type, e.g. vgg, resnet")
  parser.add_argument("--file_suffix", default=".svs", help="file suffix for the input data set, e.g. *.svs")
  parser.add_argument("--node_number", type=int, default=2,
                      help="number of available computing node in the spark cluster")
  parser.add_argument("--gpu_per_node", type=int, default=4,
                      help="number of GPUs on each computing node")
  parser.add_argument("--cpu_per_node", type=int, default=4,
                      help="number of CPUs on each computing node")
  parser.add_argument("--ROI_size", type=int, default=6000, help="size of ROI")
  parser.add_argument("--ROI_overlap", type=int, default=0, help="overlap between ROIs")
  parser.add_argument("--ROI_channel", type=int, default=3, help="number of ROI channel")
  parser.add_argument("--skipROI", default=False, dest='skipROI', action='store_true', help="skip the ROI layer")
  parser.add_argument("--tile_size", type=int, default=64, help="size of tile")
  parser.add_argument("--tile_overlap", type=int, default=0, help="overlap between tiles")
  parser.add_argument("--tile_channel", type=int, default=3, help="channel of tile")
  parser.add_argument("--mitosis_threshold", type=float, default=0.5,
                      help="the threshold for the identification of mitosis")
  parser.add_argument("--batch_size", type=int, default=16, help="batch size for the mitosis prediction")
  parser.add_argument("--onGPU", dest='isGPU', action='store_true',
                      help="run the script on GPU")
  parser.add_argument("--onCPU", dest='isGPU', action='store_false', help="run the script on CPU")
  parser.set_defaults(isGPU=False)
  parser.add_argument("--save_mitosis_locations", default=False, dest="save_mitosis_locations", action='store_true',
                      help="save the locations of the detected mitoses to csv")
  parser.add_argument("--save_mask", default=False, dest="save_mask", action='store_true',
                      help="save the locations of the detected mitoses as a mask image ")
  parser.add_argument("--debug", default=False, dest='isDebug', action='store_true',
                      help="print the debug information")

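As a side note on the flag handling above: the paired `--onGPU`/`--onCPU` arguments both write to the same `isGPU` destination, and `set_defaults` supplies the value when neither flag is passed. A minimal self-contained sketch of that argparse pattern:

```python
import argparse

parser = argparse.ArgumentParser()
# Two opposite flags sharing one destination:
parser.add_argument("--onGPU", dest="isGPU", action="store_true", help="run on GPU")
parser.add_argument("--onCPU", dest="isGPU", action="store_false", help="run on CPU")
parser.set_defaults(isGPU=False)  # CPU unless --onGPU is given

print(parser.parse_args([]).isGPU)           # False (the default)
print(parser.parse_args(["--onGPU"]).isGPU)  # True
print(parser.parse_args(["--onCPU"]).isGPU)  # False
```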
Now, I'll try executing the IPython notebook file in Jupyter.

sukruburakcetin commented 7 years ago

Hi @dusenberrymw, good news for us! :tada: I dug in to figure out why I was not getting any output despite having no errors. My run was already producing "[]" as output, but I didn't realize it until I looked at the IPython notebook: "predict_mitoses.py" only produced [], which matches the first part of the notebook (Predict the mitosis number for each ROI).

dirname = "breastcancer"
zipname = dirname + ".zip"
shutil.make_archive(dirname, 'zip', dirname + "/..", dirname)
sc.addPyFile(zipname)
sc.addPyFile("train_mitoses.py")
sc.addPyFile("preprocess_mitoses.py")
sc.addPyFile("resnet50.py")
In [ ]:
dir = "/home/fei/deep-histopath/deep-histopath/data/training_image_data/"
model_file = '/home/fei/deep-histopath/deep-histopath/model/0.95114_acc_0.58515_loss_530_epoch_model.hdf5'
model_name = 'vgg'
suffix = '*-49*.svs'
node_num = 1
gpu_per_node = 4
partition_num = gpu_per_node * node_num
ROI_size=6000
ROI_overlap=0
ROI_channel = 3
skipROI=False
tile_size=64
tile_overlap=0
tile_channel = 3
batch_size = 128
threshold=0.5
isGPU = True
isDebug = True
save_mitosis_locations=True
save_mask=True

predict_result_rdd = predict_mitoses(sc, model_path=model_file, model_name = model_name, input_dir=dir, 
                                     file_suffix=suffix, partition_num=partition_num,
                                     ROI_size=ROI_size, ROI_overlap=ROI_overlap, ROI_channel=ROI_channel,
                                     skipROI=skipROI,
                                     tile_size=tile_size, tile_overlap=tile_overlap, tile_channel=tile_channel,
                                     threshold=threshold, isGPU=isGPU, 
                                     save_mitosis_locations=save_mitosis_locations,
                                     save_mask=save_mask,
                                     batch_size=batch_size, isDebug=isDebug)
predict_result_rdd.cache()
Counter({'rr-ram3.softlayer.com': 0})
[(0, 'rr-ram3.softlayer.com'), (1, 'rr-ram3.softlayer.com'), (2, 'rr-ram3.softlayer.com'), (3, 'rr-ram3.softlayer.com')]
{0: 3, 1: 2, 2: 1, 3: 0}

After that, I copied the Experiment part into my "predict_mitoses.py", and now I'm getting a result. However, the output is not as expected (it produces 120 for the training file TUPAC-TR-500.svs), likely because I trained for only one epoch. As far as I can see, you trained for 530 epochs, so would you mind sharing your trained hdf5 file from the 530-epoch run so I can check whether I could reach higher accuracy?

By the way, I did not understand this part:

Counter({'rr-ram3.softlayer.com': 0}) [(0, 'rr-ram3.softlayer.com'), (1, 'rr-ram3.softlayer.com'), (2, 'rr-ram3.softlayer.com'), (3, 'rr-ram3.softlayer.com')] {0: 3, 1: 2, 2: 1, 3: 0} Out[ ]: PythonRDD[2] at RDD at PythonRDD.scala:48

My last question: how can I get the mask image that marks the detected mitosis points, and the CSV file with the coordinates of the detected mitoses?

dusenberrymw commented 7 years ago

@Narthyard Excellent, I'm glad you've been able to get it working! I don't currently have a good way to share the trained models, but we'll aim to do that in the future. As for the notebook, some of that output was just noise from running it internally; we'll clean that up. Finally, we're working on the code to output the detected mitosis locations!