aetherAI / whole-slide-cnn

This repository provides scripts to reproduce the results in the paper "An annotation-free whole-slide training approach to pathological classification of lung cancer types by deep learning".

Training problem when using HMS #4

Open TargosLi opened 2 years ago

TargosLi commented 2 years ago

I'm using TensorFlow 2.4.1. Training config:

```yaml
RESULT_DIR: "result_wholeslide_1x"
MODEL_PATH: "${RESULT_DIR}/model.h5"
LOAD_MODEL_BEFORE_TRAIN: False
CONFIG_RECORD_PATH: "${RESULT_DIR}/config.yaml"

USE_MIXED_PRECISION: True
USE_HMS: True
USE_MIL: False

TRAIN_CSV_PATH: "/home/de1119151/PycharmProjects/whole-slide-cnn-main/slide_data_targos/Train_SKIN_TCGA.csv"
VAL_CSV_PATH: "/home/de1119151/PycharmProjects/whole-slide-cnn-main/slide_data_targos/Val_SKIN_TCGA.csv"
TEST_CSV_PATH: "/home/de1119151/PycharmProjects/whole-slide-cnn-main/slide_data_targos/Test_SKIN_TCGA.csv"
SLIDE_DIR: "/mnt/data/RawImages/HE_SKIN_WSI_TCGA/"
SLIDE_FILE_EXTENSION: ".svs"
SLIDE_READER: "openslide"
RESIZE_RATIO: 0.05 # 1x magnification for 20x WSIs
INPUT_SIZE: [21500, 21500, 3]

MODEL: "fixup_resnet50"
NUM_CLASSES: 3
BATCH_SIZE: 1
EPOCHS: 200
NUM_UPDATES_PER_EPOCH: 100
INIT_LEARNING_RATE: 0.00002
POOL_USE: "gmp"
REDUCE_LR_FACTOR: 0.1
REDUCE_LR_PATIENCE: 24
TIME_RECORD_PATH: "${RESULT_DIR}/time_record.csv"
TEST_TIME_RECORD_PATH: "${RESULT_DIR}/test_time_record.csv"

MIL_PATCH_SIZE: NULL
MIL_INFER_BATCH_SIZE: NULL
MIL_USE_EM: False
MIL_K: NULL
MIL_SKIP_WHITE: NULL

TEST_RESULT_PATH: "${RESULT_DIR}/test_result.json"
ENABLE_VIZ: False
VIZ_SIZE: [2150, 2150]
VIZ_FOLDER: "${RESULT_DIR}/viz"

DEBUG_PATH: NULL
```
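
As a quick sanity check on the numbers above (this is not part of the repo's scripts, and the slide path below is made up): a RESIZE_RATIO of 0.05 brings a 20x scan down to roughly 1x, so INPUT_SIZE has to be at least as large as the slide after that downscaling. A minimal sketch using OpenSlide, the reader configured in SLIDE_READER:

```python
# Sanity-check sketch (not from the repository): verify that a slide still
# fits inside INPUT_SIZE after downscaling by RESIZE_RATIO.
# The .svs filename is hypothetical.
import math
import openslide

RESIZE_RATIO = 0.05            # 0.05 * 20x scanner magnification ~= 1x
INPUT_SIZE = (21500, 21500)    # spatial part of INPUT_SIZE from the config

slide = openslide.OpenSlide("/mnt/data/RawImages/HE_SKIN_WSI_TCGA/example.svs")
w0, h0 = slide.dimensions      # level-0 (full-resolution) width and height

# Size of the slide once resized by RESIZE_RATIO.
w_small = math.ceil(w0 * RESIZE_RATIO)
h_small = math.ceil(h0 * RESIZE_RATIO)

fits = w_small <= INPUT_SIZE[0] and h_small <= INPUT_SIZE[1]
print(f"downscaled to {w_small}x{h_small}, fits in INPUT_SIZE: {fits}")
```

If any slide comes out larger than 21500 px on a side at this ratio, INPUT_SIZE would need to be raised (or RESIZE_RATIO lowered); that is probably unrelated to the traceback below, but it is easy to rule out.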

I tried this config and got the following error:

```
Traceback (most recent call last):
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/train.py", line 128, in <module>
    model = build_model(
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/model.py", line 129, in build_model
    conv_block = get_conv_block(input_shape)
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/model.py", line 85, in get_conv_block
    conv_block = model_fn(
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/model.py", line 26, in <lambda>
    "fixup_resnet50": lambda *args, **kwargs: ResNet50(
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/resnet.py", line 557, in ResNet50
    return ResNet(stack_fn, False, True, 'resnet50',
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/resnet.py", line 436, in ResNet
    x = _ZeroPadding2D(padding=((3, 3), (3, 3)), name='conv1_pad')(x)
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/huge_layer_wrapper.py", line 206, in __call__
    res = super(HugeLayerWrapper, self).__call__(inputs, **kwargs)
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1012, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/huge_layer_wrapper.py", line 267, in call
    output_tensor_list = self._do_padding(inputs, **kwargs)
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/huge_layer_wrapper.py", line 517, in _do_padding
    self.layer.compute_output_shape(self._get_shape(inputs)),
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/venv/lib/python3.8/site-packages/tensorflow/python/keras/layers/convolutional.py", line 2868, in compute_output_shape
    if input_shape[1] is not None:
IndexError: list index out of range
```

Process finished with exit code 1
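
For what it's worth, the last frame of that traceback fails whenever `compute_output_shape` is handed a shape with fewer than two dimensions, with or without HMS. Here is a minimal sketch that reproduces only that final frame with plain Keras; the rank-1 shape below is just a stand-in for whatever `_get_shape(inputs)` actually returned inside `HugeLayerWrapper`, which I haven't inspected:

```python
# Illustrative only: trigger the same IndexError directly on ZeroPadding2D,
# without the repo's HugeLayerWrapper. The rank-1 shape is a stand-in.
import tensorflow as tf

pad = tf.keras.layers.ZeroPadding2D(padding=((3, 3), (3, 3)))

# A full 4-D (batch, height, width, channels) shape works as expected.
print(pad.compute_output_shape(tf.TensorShape([1, 21500, 21500, 3])))
# -> (1, 21506, 21506, 3)

# A shape with fewer than two dimensions hits `input_shape[1]` and raises,
# matching the last frame of the traceback above.
try:
    pad.compute_output_shape(tf.TensorShape([1]))
except IndexError as err:
    print("IndexError:", err)  # list index out of range
```

So the question seems to be why the wrapper's `_do_padding` path ends up with a truncated shape when `USE_HMS: True` is combined with this config.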

TargosLi commented 2 years ago

```
/home/de1119151/PycharmProjects/whole-slide-cnn-main/venv/bin/python /home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/train.py
2021-09-30 08:33:59.142871: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
No protocol specified
```

Config

{ "RESULT_DIR": "result_wholeslide_1x", "MODEL_PATH": "result_wholeslide_1x/model.h5", "LOAD_MODEL_BEFORE_TRAIN": false, "CONFIG_RECORD_PATH": "result_wholeslide_1x/config.yaml", "USE_MIXED_PRECISION": true, "USE_HMS": true, "USE_MIL": false, "TRAIN_CSV_PATH": "/home/de1119151/PycharmProjects/whole-slide-cnn-main/slide_data_targos/Train_SKIN_TCGA.csv", "VAL_CSV_PATH": "/home/de1119151/PycharmProjects/whole-slide-cnn-main/slide_data_targos/Val_SKIN_TCGA.csv", "TEST_CSV_PATH": "/home/de1119151/PycharmProjects/whole-slide-cnn-main/slide_data_targos/Test_SKIN_TCGA.csv", "SLIDE_DIR": "/mnt/data/RawImages/HE_SKIN_WSI_TCGA/", "SLIDE_FILE_EXTENSION": ".svs", "SLIDE_READER": "openslide", "RESIZE_RATIO": 0.05, "INPUT_SIZE": [ 21500, 21500, 3 ], "MODEL": "fixup_resnet50", "NUM_CLASSES": 3, "BATCH_SIZE": 1, "EPOCHS": 200, "NUM_UPDATES_PER_EPOCH": 100, "INIT_LEARNING_RATE": 2e-05, "POOL_USE": "gmp", "REDUCE_LR_FACTOR": 0.1, "REDUCE_LR_PATIENCE": 24, "TIME_RECORD_PATH": "result_wholeslide_1x/time_record.csv", "TEST_TIME_RECORD_PATH": "result_wholeslide_1x/test_time_record.csv", "TEST_RESULT_PATH": "result_wholeslide_1x/test_result.json", "ENABLE_VIZ": false, "VIZ_SIZE": [ 2150, 2150 ], "VIZ_FOLDER": "result_wholeslide_1x/viz", "DEBUG_PATH": null } 2021-09-30 08:34:00.865431: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2021-09-30 08:34:00.867343: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set 2021-09-30 08:34:00.868142: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1 2021-09-30 08:34:00.890023: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce RTX 3090 computeCapability: 8.6 coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s 2021-09-30 08:34:00.890048: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2021-09-30 08:34:00.892048: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 2021-09-30 08:34:00.892087: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11 2021-09-30 08:34:00.892943: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 2021-09-30 08:34:00.893123: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 2021-09-30 08:34:00.895095: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10 2021-09-30 08:34:00.895550: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 2021-09-30 08:34:00.895660: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 2021-09-30 08:34:00.898047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0 2021-09-30 08:34:00.898070: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library 
libcudart.so.11.0 2021-09-30 08:34:01.310261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix: 2021-09-30 08:34:01.310293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0 2021-09-30 08:34:01.310299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N 2021-09-30 08:34:01.313915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 128748 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6) WARNING:tensorflow:From /home/de1119151/PycharmProjects/whole-slide-cnn-main/tensorflow_huge_model_support/tf_keras.py:29: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

2021-09-30 08:34:01.324147: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set 2021-09-30 08:34:01.325364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: pciBusID: 0000:01:00.0 name: GeForce RTX 3090 computeCapability: 8.6 coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s 2021-09-30 08:34:01.325427: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-09-30 08:34:01.326527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties: pciBusID: 0000:21:00.0 name: GeForce RTX 3090 computeCapability: 8.6 coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s 2021-09-30 08:34:01.326587: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-09-30 08:34:01.327667: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 2 with properties: pciBusID: 0000:4b:00.0 name: GeForce RTX 3090 computeCapability: 8.6 coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s 2021-09-30 08:34:01.327722: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-09-30 08:34:01.328798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 3 with properties: pciBusID: 0000:4c:00.0 name: GeForce RTX 3090 computeCapability: 8.6 coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s 2021-09-30 08:34:01.328815: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 2021-09-30 08:34:01.328875: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 2021-09-30 08:34:01.328889: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11 2021-09-30 08:34:01.328901: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 2021-09-30 08:34:01.328913: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 2021-09-30 08:34:01.328926: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10 2021-09-30 08:34:01.328938: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 2021-09-30 08:34:01.328950: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 2021-09-30 08:34:01.330106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-09-30 08:34:01.331218: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-09-30 08:34:01.332327: I 
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

Initializing datasets

Training dataset contains 660 slides. Validation dataset contains 252 slides.

Initializing the model

```
2021-09-30 08:34:01.334535: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 08:34:01.335647: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 08:34:01.336760: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 08:34:01.337833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1, 2, 3
2021-09-30 08:34:01.337875: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-09-30 08:34:01.339004: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-09-30 08:34:01.339059: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 08:34:01.340133: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-09-30 08:34:01.340185: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 08:34:01.341256: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-09-30 08:34:01.341309: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Traceback (most recent call last):
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/train.py", line 128, in <module>
    model = build_model(
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/model.py", line 129, in build_model
    conv_block = get_conv_block(input_shape)
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/model.py", line 85, in get_conv_block
    conv_block = model_fn(
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/model.py", line 26, in <lambda>
    "fixup_resnet50": lambda *args, **kwargs: ResNet50(
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/resnet.py", line 557, in ResNet50
    return ResNet(stack_fn, False, True, 'resnet50',
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/resnet.py", line 436, in ResNet
    x = _ZeroPadding2D(padding=((3, 3), (3, 3)), name='conv1_pad')(x)
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/huge_layer_wrapper.py", line 206, in __call__
    res = super(HugeLayerWrapper, self).__call__(inputs, **kwargs)
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1012, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/huge_layer_wrapper.py", line 267, in call
    output_tensor_list = self._do_padding(inputs, **kwargs)
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/whole_slide_cnn/huge_layer_wrapper.py", line 517, in _do_padding
    self.layer.compute_output_shape(self._get_shape(inputs)),
  File "/home/de1119151/PycharmProjects/whole-slide-cnn-main/venv/lib/python3.8/site-packages/tensorflow/python/keras/layers/convolutional.py", line 2868, in compute_output_shape
    if input_shape[1] is not None:
IndexError: list index out of range
```

Process finished with exit code 1