ebgoldstein closed this 2 years ago
Note that this fix will break previous Gym code because functions were renamed.
I am attempting to test the new loss functions with 1-band imagery. I fetched the latest updates from main and updated to `doodleverse_utils==0.0.5`. After running `train_model.py`, I receive the following error:
(gym) cbodine@filfy-Thelio-Massive:~/PythonRepos/segmentation_gym$ python train_model.py
2022-10-04 08:27:48.844598: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
/mnt/md0/SynologyDrive/Modeling/Substrate_BOU-LEA_5_remapHard/03_forModelTraining/dataset
/mnt/md0/SynologyDrive/Modeling/Substrate_BOU-LEA_5_remapHard/03_forModelTraining/config/substrate_20221004_v7.json
Using GPU
Using single GPU device
Version: 2.10.0
Eager mode: True
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Making new directory for example model outputs: /mnt/md0/SynologyDrive/Modeling/Substrate_BOU-LEA_5_remapHard/03_forModelTraining/modelOut
MODE "all": using all augmented and non-augmented files
2022-10-04 08:28:46.128074: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-04 08:28:46.862641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14597 MB memory: -> device: 0, name: Quadro RTX 5000, pci bus id: 0000:65:00.0, compute capability: 7.5
369
221
.....................................
Creating and compiling model ...
.....................................
Training model ...
Epoch 1: LearningRateScheduler setting learning rate to 1e-07.
Epoch 1/100
Traceback (most recent call last):
File "/home/cbodine/PythonRepos/segmentation_gym/train_model.py", line 721, in <module>
history = model.fit(train_ds, steps_per_epoch=steps_per_epoch, epochs=MAX_EPOCHS,
File "/home/cbodine/anaconda3/envs/gym/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/tmp/__autograph_generated_filekmiiz2fb.py", line 15, in tf__train_function
retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
File "/tmp/__autograph_generated_file0km3dfxo.py", line 27, in tf__mean_iou
ag__.for_stmt(ag__.converted_call(ag__.ld(range), (ag__.ld(nclasses),), None, fscope), None, loop_body, get_state, set_state, ('iousum',), {'iterate_names': 'index'})
File "/tmp/__autograph_generated_file0km3dfxo.py", line 25, in loop_body
iousum += ag__.converted_call(basic_iou, (y_true[:, :, :, index], y_pred[:, :, :, index]), None, fscope)
ValueError: in user code:
File "/home/cbodine/anaconda3/envs/gym/lib/python3.10/site-packages/keras/engine/training.py", line 1160, in train_function *
return step_function(self, iterator)
File "/home/cbodine/anaconda3/envs/gym/lib/python3.10/site-packages/doodleverse_utils/model_imports.py", line 989, in mean_iou *
iousum += basic_iou(y_true[:,:,:,index], y_pred[:,:,:,index])
ValueError: slice index 4 of dimension 3 out of bounds. for '{{node strided_slice_11}} = StridedSlice[Index=DT_INT32, T=DT_FLOAT, begin_mask=7, ellipsis_mask=0, end_mask=7, new_axis_mask=0, shrink_axis_mask=8](one_hot, strided_slice_11/stack, strided_slice_11/stack_1, strided_slice_11/stack_2)' with input shapes: [?,512,512,4], [4], [4], [4] and with computed input tensors: input[1] = <0 0 0 4>, input[2] = <0 0 0 5>, input[3] = <1 1 1 1>.
Here are the new Dice and IoU functions. In `gym`, the `model.compile()` calls need to be adjusted:
for IoU as metric: `iou_multi(NCLASSES)`
for Dice as metric: `dice_multi(NCLASSES)`
for Dice as loss: `dice_coef_loss(NCLASSES)`
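To illustrate why these are now called with `NCLASSES`, here is a minimal sketch of the factory pattern: each call returns a `(y_true, y_pred)` function with the class count baked in. This is not the `doodleverse_utils` implementation. The function bodies, the `one_hot` and `_mean_over_classes` helpers, the list-of-channels data layout, and the "1 - Dice" loss are all assumptions made for illustration, and the code is plain Python so it runs without TensorFlow.

```python
# Hypothetical pure-Python stand-ins, NOT the doodleverse_utils code.
# An image is a list of channels (one per class); each channel is a flat
# list of per-pixel values in [0, 1].

def basic_iou(true_ch, pred_ch, smooth=10e-6):
    """IoU for a single class channel."""
    inter = sum(t * p for t, p in zip(true_ch, pred_ch))
    union = sum(true_ch) + sum(pred_ch) - inter
    return (inter + smooth) / (union + smooth)


def basic_dice(true_ch, pred_ch, smooth=10e-6):
    """Dice coefficient for a single class channel."""
    inter = sum(t * p for t, p in zip(true_ch, pred_ch))
    return (2.0 * inter + smooth) / (sum(true_ch) + sum(pred_ch) + smooth)


def one_hot(labels, nclasses):
    """Illustrative helper: integer labels -> one flat channel per class."""
    return [[1.0 if lab == c else 0.0 for lab in labels] for c in range(nclasses)]


def _mean_over_classes(score_fn, nclasses):
    """Build a (y_true, y_pred) function averaging score_fn over classes."""
    def metric(y_true, y_pred):
        # Check for the kind of mismatch the traceback above reports
        # (index 4 requested from a 4-channel one-hot tensor).
        if len(y_true) != nclasses or len(y_pred) != nclasses:
            raise ValueError(
                f"expected {nclasses} channels, got {len(y_true)} and {len(y_pred)}"
            )
        total = sum(score_fn(y_true[c], y_pred[c]) for c in range(nclasses))
        return total / nclasses
    return metric


def iou_multi(nclasses):
    return _mean_over_classes(basic_iou, nclasses)


def dice_multi(nclasses):
    return _mean_over_classes(basic_dice, nclasses)


def dice_coef_loss(nclasses):
    dice = dice_multi(nclasses)

    def loss(y_true, y_pred):
        # Assumed definition: loss falls as the mean Dice rises.
        return 1.0 - dice(y_true, y_pred)
    return loss


NCLASSES = 4
y_true = one_hot([0, 1, 2, 3, 0, 1], NCLASSES)
print(iou_multi(NCLASSES)(y_true, y_true))
print(dice_coef_loss(NCLASSES)(y_true, y_true))
```

In Gym the returned callables would be handed to `model.compile()`, for example as `loss=dice_coef_loss(NCLASSES)` and `metrics=[iou_multi(NCLASSES), dice_multi(NCLASSES)]`. The channel-count check is only there to show how a class-count mismatch could be reported early; whether the real functions do this is not stated in this thread.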
All smooth / epsilon values are set to `10e-6`.
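For anyone wondering what the smoothing term does, here is a small sketch using the common smoothed Dice formula. That formula is an assumption, since the library's exact expression is not shown in this thread, and the `dice` helper and sample channels are illustrative only. Note that `10e-6` is the same number as `1e-5`, not `1e-6`.

```python
SMOOTH = 10e-6  # value quoted above; numerically equal to 1e-5


def dice(true_ch, pred_ch, smooth=SMOOTH):
    """Smoothed Dice for one class channel (assumed formula, flat lists)."""
    inter = sum(t * p for t, p in zip(true_ch, pred_ch))
    return (2.0 * inter + smooth) / (sum(true_ch) + sum(pred_ch) + smooth)


absent = [0.0, 0.0, 0.0, 0.0]   # class missing from the tile
present = [1.0, 1.0, 0.0, 0.0]  # class covers half the tile

# Class absent in both truth and prediction: smooth / smooth instead of 0 / 0.
empty_score = dice(absent, absent)

# Class present in truth but never predicted: score stays close to zero.
missed_score = dice(present, absent)

print(empty_score, missed_score)
```

With `smooth=0.0` the first case would divide zero by zero, so the term mainly protects tiles where a class does not occur at all.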
This does not have loss weighting in the Dice loss yet. We can add it in this PR or in the next one.