apple / tensorflow_macos

TensorFlow for macOS 11.0+ accelerated using Apple's ML Compute framework.

MTLTextureDescriptor has width (9223372036854776029) greater than the maximum allowed size of 16384. #141

Open · antoinedray opened this issue 3 years ago

antoinedray commented 3 years ago

Training the following TensorFlow network in Jupyter:

Model: "WMH"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
block1_conv1 (Conv2D)           (None, 224, 224, 64) 1792        input_1[0][0]                    
__________________________________________________________________________________________________
block1_conv2 (Conv2D)           (None, 224, 224, 64) 36928       block1_conv1[0][0]               
__________________________________________________________________________________________________
block1_pool (MaxPooling2D)      (None, 112, 112, 64) 0           block1_conv2[0][0]               
__________________________________________________________________________________________________
block2_conv1 (Conv2D)           (None, 112, 112, 128 73856       block1_pool[0][0]                
__________________________________________________________________________________________________
block2_conv2 (Conv2D)           (None, 112, 112, 128 147584      block2_conv1[0][0]               
__________________________________________________________________________________________________
block2_pool (MaxPooling2D)      (None, 56, 56, 128)  0           block2_conv2[0][0]               
__________________________________________________________________________________________________
block3_conv1 (Conv2D)           (None, 56, 56, 256)  295168      block2_pool[0][0]                
__________________________________________________________________________________________________
block3_conv2 (Conv2D)           (None, 56, 56, 256)  590080      block3_conv1[0][0]               
__________________________________________________________________________________________________
block3_conv3 (Conv2D)           (None, 56, 56, 256)  590080      block3_conv2[0][0]               
__________________________________________________________________________________________________
block3_pool (MaxPooling2D)      (None, 28, 28, 256)  0           block3_conv3[0][0]               
__________________________________________________________________________________________________
block4_conv1 (Conv2D)           (None, 28, 28, 512)  1180160     block3_pool[0][0]                
__________________________________________________________________________________________________
block4_conv2 (Conv2D)           (None, 28, 28, 512)  2359808     block4_conv1[0][0]               
__________________________________________________________________________________________________
block4_conv3 (Conv2D)           (None, 28, 28, 512)  2359808     block4_conv2[0][0]               
__________________________________________________________________________________________________
sub_block2_conv1 (Conv2D)       (None, 112, 112, 16) 18448       block2_conv2[0][0]               
__________________________________________________________________________________________________
sub_block3_conv1 (Conv2D)       (None, 56, 56, 16)   36880       block3_conv3[0][0]               
__________________________________________________________________________________________________
sub_block4_conv1 (Conv2D)       (None, 28, 28, 16)   73744       block4_conv3[0][0]               
__________________________________________________________________________________________________
sub_block2_up-conv1 (Conv2DTran (None, 224, 224, 16) 2320        sub_block2_conv1[0][0]           
__________________________________________________________________________________________________
sub_block3_up-conv1 (Conv2DTran (None, 224, 224, 16) 2320        sub_block3_conv1[0][0]           
__________________________________________________________________________________________________
sub_block4_up-conv1 (Conv2DTran (None, 224, 224, 16) 2320        sub_block4_conv1[0][0]           
__________________________________________________________________________________________________
sub_block1_conv1 (Conv2D)       (None, 224, 224, 16) 9232        block1_conv2[0][0]               
__________________________________________________________________________________________________
dropout (Dropout)               (None, 224, 224, 16) 0           sub_block2_up-conv1[0][0]        
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 224, 224, 16) 0           sub_block3_up-conv1[0][0]        
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, 224, 224, 16) 0           sub_block4_up-conv1[0][0]        
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 224, 224, 64) 0           sub_block1_conv1[0][0]           
                                                                 dropout[0][0]                    
                                                                 dropout_1[0][0]                  
                                                                 dropout_2[0][0]                  
__________________________________________________________________________________________________
conv1x1_softmax (Conv2D)        (None, 224, 224, 1)  65          concatenate[0][0]                
==================================================================================================
Total params: 7,780,593
Trainable params: 145,329
Non-trainable params: 7,635,264
__________________________________________________________________________________________________

This crashes the Jupyter kernel with the following error:

2021-01-28 12:46:55.964822: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-28 12:46:55.981590: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
-[MTLTextureDescriptorInternal validateWithDevice:]:1248: failed assertion `Texture Descriptor Validation
MTLTextureDescriptor has width (9223372036854776029) greater than the maximum allowed size of 16384.
MTLTextureDescriptor has height (9223372036854776029) greater than the maximum allowed size of 16384.
'
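The reported width and height (9223372036854776029) are just above 2^63, which points to an overflowed or uninitialized 64-bit dimension rather than a real tensor shape. A sketch of a possible workaround is to pin the ML Compute backend to the CPU so that no Metal textures are allocated at all; this uses the mlcompute API shipped with this fork, and whether it sidesteps this particular assertion is an assumption, not something verified in this thread:

from tensorflow.python.compiler.mlcompute import mlcompute

# Run all ML Compute ops on the CPU; 'gpu' or 'any' would re-enable the
# Metal texture path that trips the assertion above.
mlcompute.set_mlc_device(device_name='cpu')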
anna-tikhonova commented 3 years ago

Thank you very much for reporting this issue. Could you please provide a reproducible test case, so we can investigate this issue locally?

antoinedray commented 3 years ago

I am running the network below with the following data shapes:

X train shape: (2864, 224, 224, 3)
X test shape: (716, 224, 224, 3)
Y train shape: (2864, 224, 224, 1)
Y test shape: (716, 224, 224, 1)

The network is defined as follows:

import warnings
import datetime
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa
import tensorboard
from tensorflow import keras
from sklearn.model_selection import train_test_split
import matplotlib as mpl
import matplotlib.pyplot as plt
from IPython.display import set_matplotlib_formats

input_shape = (224, 224, 3)
VGG = tf.keras.applications.vgg16.VGG16(
    include_top=True,
    input_shape=input_shape,
    weights='imagenet'
)
VGG.trainable = False

block1 = VGG.get_layer("block1_conv2").output
block2 = VGG.get_layer("block2_conv2").output
block3 = VGG.get_layer("block3_conv3").output
block4 = VGG.get_layer("block4_conv3").output
sub_block1 = keras.layers.Conv2D(filters=16, kernel_size=3, padding='same', kernel_initializer='he_normal',
                                 activation='relu', name='sub_block1_conv1')(block1)
sub_block2 = keras.layers.Conv2D(filters=16, kernel_size=3, padding='same', kernel_initializer='he_normal',
                                 activation='relu', name='sub_block2_conv1')(block2)
sub_block2 = keras.layers.Conv2DTranspose(filters=16, kernel_size=3, strides=(2, 2), padding='same',
                                          name='sub_block2_up-conv1')(sub_block2)
sub_block2 = keras.layers.Dropout(0.1)(sub_block2)
sub_block3 = keras.layers.Conv2D(filters=16, kernel_size=3, padding='same', kernel_initializer='he_normal',
                                 activation='relu', name='sub_block3_conv1')(block3)
sub_block3 = keras.layers.Conv2DTranspose(filters=16, kernel_size=3, strides=(4, 4), padding='same',
                                          name='sub_block3_up-conv1')(sub_block3)
sub_block3 = keras.layers.Dropout(0.1)(sub_block3)
sub_block4 = keras.layers.Conv2D(filters=16, kernel_size=3, padding='same', kernel_initializer='he_normal',
                                 activation='relu', name='sub_block4_conv1')(block4)
sub_block4 = keras.layers.Conv2DTranspose(filters=16, kernel_size=3, strides=(8, 8), padding='same',
                                          name='sub_block4_up-conv1')(sub_block4)
sub_block4 = keras.layers.Dropout(0.1)(sub_block4)
concatenate = keras.layers.Concatenate(axis=3, name='concatenate')([sub_block1, sub_block2, sub_block3, sub_block4])
# Note: softmax over a single output channel is constant 1.0; sigmoid is the
# usual activation for binary masks.
output = keras.layers.Conv2D(filters=1, kernel_size=1, padding='same', kernel_initializer='he_normal',
                                 activation='softmax', name='conv1x1_softmax')(concatenate)
WMH = keras.models.Model(inputs=VGG.input, outputs=output, name='WMH')

step = tf.Variable(0, trainable=True, name='Opt_Var')
lr_schedule = keras.optimizers.schedules.PiecewiseConstantDecay([50000], [1e-8, 1e-10])
mm_schedule = keras.optimizers.schedules.PiecewiseConstantDecay([50000], [0.99, 0.999])
learning_rate = lr_schedule(step)
momentum = mm_schedule(step)
optimizer_sgdw = tfa.optimizers.SGDW(
    learning_rate=learning_rate,
    momentum=momentum,
    weight_decay=0.0005
)

WMH.compile(
    optimizer=optimizer_sgdw,
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[
        keras.metrics.Recall(name='recall'),
        keras.metrics.Accuracy(name='accuracy'),
        keras.metrics.MeanIoU(num_classes=3, name='dice')
    ],
)
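
# NOTE: X_train / y_train are not defined in the snippet above. As a stand-in,
# random arrays with the reported shapes make the script self-contained; this
# placeholder data is an assumption, not the original dataset.
X_train = np.random.rand(2864, 224, 224, 3).astype('float32')
X_test = np.random.rand(716, 224, 224, 3).astype('float32')
y_train = np.random.randint(0, 2, size=(2864, 224, 224, 1)).astype('float32')
y_test = np.random.randint(0, 2, size=(716, 224, 224, 1)).astype('float32')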

epochs = 10
batch_size = 8
log_dir = "logs/simple/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
# Note: Keras models under TF2 train eagerly; the v1 Session/initializer
# wrapper below is not required for model.fit.
init = tf.compat.v1.global_variables_initializer()
with tf.compat.v1.Session() as sess:
    init.run()
    WMH.fit(X_train, y_train,
            batch_size=batch_size,
            validation_data=(X_test, y_test),
            epochs=epochs,
            callbacks=[tensorboard_callback]) # lr_callback

I encountered this issue on the following machine:

macOS Big Sur Version 11.1
MacBook Pro (15-inch, 2016)
Processor 2.9 GHz Quad-Core Intel Core i7
Memory 16 GB 2133 MHz LPDDR3
Graphics Radeon Pro 460 4 GB, Intel HD Graphics 530 1536 MB
anna-tikhonova commented 3 years ago

Thank you very much for providing a reproducible test case. We will investigate and report back.