Bihaqo / t3f

Tensor Train decomposition on TensorFlow
https://t3f.readthedocs.io/en/latest/index.html
MIT License

Input to reshape is a tensor with 524288 values, but the requested shape has 4096 #174

Closed Silk760 closed 5 years ago

Silk760 commented 5 years ago

I am using VGG-16 on CIFAR-10, and I am trying to use your library to reduce the number of parameters in the fully connected layers.

I am building the model in Keras:

model = vgg16.VGG16(include_top=True, weights=None, input_shape=(32, 32, 3), input_tensor=None, pooling='max', classes=10)
model.load_weights("cifar10vgg_final_weight.h5")

Layer (type)                 Output Shape              Param #   
=================================================================
input_4 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 8, 8, 256)         295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 4, 4, 256)         0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 4, 4, 512)         1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 1, 1, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 512)               0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              2101248   
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 10)                40970     
=================================================================
Total params: 33,638,218
Trainable params: 33,638,218
Non-trainable params: 0

I am using this code to remove the last layers:

def pop_layer(model):
    # Remove the last layer of a Keras model in place (a common workaround,
    # since the functional API has no official pop()).
    if not model.outputs:
        raise Exception('Sequential model cannot be popped: model is empty.')

    model.layers.pop()
    if not model.layers:
        model.outputs = []
        model.inbound_nodes = []
        model.outbound_nodes = []
    else:
        model.layers[-1].outbound_nodes = []
        model.outputs = [model.layers[-1].output]
    model.built = True

# Drop flatten, fc1, fc2 and predictions.
for i in range(4):
    pop_layer(model)

After removing them, I add the TT layers back:

x = model.outputs
x = Flatten()(x)
x = t3f.nn.KerasDense(input_dims=[4, 4, 4, 8], output_dims=[8, 8, 8, 8], tt_rank=16, activation='relu', bias_initializer=1e-7)(x)
x = t3f.nn.KerasDense(input_dims=[8, 8, 8, 8], output_dims=[8, 8, 8, 8], tt_rank=16, activation='relu', bias_initializer=1e-7)(x)
predictions = Dense(10, activation="softmax")(x)

model_final = Model(input=model.input, output=predictions)
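For reference, the input_dims/output_dims arguments factorize the dense layer's input and output sizes, and the numbers do check out (a quick sanity check, assuming that is how t3f interprets them):

import numpy as np
assert np.prod([4, 4, 4, 8]) == 512    # size of the flatten output
assert np.prod([8, 8, 8, 8]) == 4096   # width of the replaced fc layers

The summary of the new model: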
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 8, 8, 256)         295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 4, 4, 256)         0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 4, 4, 512)         1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 1, 1, 512)         0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 512)               0         
_________________________________________________________________
keras_dense_7 (KerasDense)   (None, 4096)              22016     
_________________________________________________________________
keras_dense_8 (KerasDense)   (None, 4096)              38912     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                40970     
=================================================================
Total params: 14,816,586
Trainable params: 14,816,586
Non-trainable params: 0
_________________________________________________________________

When I try to train this new model, it shows me the following error, which seems to come from multiplying 4096 by the batch size of 128 (4096 * 128 = 524288):

Input to reshape is a tensor with 524288 values, but the requested shape has 4096
     [[{{node keras_dense_7/t3f_matmul/Reshape_4}} = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _class=["loc:@training_2/SGD/gradients/keras_dense_7/t3f_matmul/Reshape_4_grad/Reshape"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](keras_dense_7/t3f_matmul/einsum_3/transpose_2, keras_dense_7/t3f_matmul/Reshape_4/shape)]]

I am not really sure what causes this problem; the same method works when I build the model from scratch rather than on top of the Keras VGG model. I am not really sure why this is happening.

Bihaqo commented 5 years ago

Hi!

It seems that the flatten node you removed and then added back misbehaves: model.outputs is a list of tensors rather than a single tensor, so Flatten()(model.outputs) ends up with shape (B * 512) instead of (B, 512).

Try removing the last 3 layers instead of 4, so that you keep the original flatten layer instead of adding your own; that seems to work for me.
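Roughly like this (an untested sketch, reusing your pop_layer and your dims):

# Drop only fc1, fc2 and predictions, keeping the original flatten layer.
for i in range(3):
    pop_layer(model)

x = model.layers[-1].output  # a single tensor of shape (B, 512)
x = t3f.nn.KerasDense(input_dims=[4, 4, 4, 8], output_dims=[8, 8, 8, 8], tt_rank=16, activation='relu', bias_initializer=1e-7)(x)
x = t3f.nn.KerasDense(input_dims=[8, 8, 8, 8], output_dims=[8, 8, 8, 8], tt_rank=16, activation='relu', bias_initializer=1e-7)(x)
predictions = Dense(10, activation='softmax')(x)
model_final = Model(inputs=model.input, outputs=predictions)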

Bihaqo commented 5 years ago

BTW, when you figure it out, I would be very happy to see your VGG example as a pull request to the tutorials section :)

Silk760 commented 5 years ago

Hi, I think I figured out the problem: model.outputs for some reason does not behave as it is supposed to. So I use model.get_layer('block5_pool').output instead:

x = model.get_layer('block5_pool').output
x = Flatten(name='flatten')(x)
x = t3f.nn.KerasDense(input_dims=[4, 8, 4, 4], output_dims=[8, 8, 8, 8], tt_rank=16, activation='relu', kernel_initializer='glorot', bias_initializer=1e-7)(x)
x = t3f.nn.KerasDense(input_dims=[8, 8, 8, 8], output_dims=[8, 8, 8, 8], tt_rank=16, activation='relu', kernel_initializer='glorot', bias_initializer=1e-7)(x)
predictions = Dense(10, activation="softmax")(x)

model_final = Model(inputs=model.input, outputs=predictions)

However, after trying to train this model, I think it is stuck and not able to train at all: the loss is NaN and the accuracy stays at a random 10%. I also suspect the training runs on the CPU instead of the GPU, even though I assigned my model to be trained on the GPU.

Epoch 1/100
 - 547s - loss: nan - acc: 0.1049 - val_loss: nan - val_acc: 0.1000
Epoch 2/100
 - 548s - loss: nan - acc: 0.0998 - val_loss: nan - val_acc: 0.1000
Epoch 3/100
 - 548s - loss: nan - acc: 0.1002 - val_loss: nan - val_acc: 0.1000
Epoch 4/100
 - 569s - loss: nan - acc: 0.1003 - val_loss: nan - val_acc: 0.1000
Epoch 5/100
 - 577s - loss: nan - acc: 0.0999 - val_loss: nan - val_acc: 0.1000

I have read your tutorial, and I am not really sure whether I should do this step; if I should, can you please tell me why?

W = model.trainable_weights[0]
print(W)
Wtt = t3f.to_tt_matrix(W, shape=[[7, 4, 7, 4], [5, 5, 5, 5]], max_tt_rank=16)
print(Wtt)

cores = sess.run(Wtt.tt_cores)
other_params = model.get_weights()[1:]
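Looking at it again, the shape [[7, 4, 7, 4], [5, 5, 5, 5]] is from the tutorial's 784x625 layer, so for my fc1 it would presumably need to factorize 512x4096 instead, e.g. [[4, 4, 4, 8], [8, 8, 8, 8]]. If I understand the tutorial correctly, the next step would be loading the converted cores into a rebuilt model where the dense layer is replaced by a TT layer (model_tt below stands for that hypothetical rebuilt model):

# Load the TT cores of the converted weight matrix, together with all the
# other (unchanged) parameters, into the compressed model.
model_tt.set_weights(list(cores) + other_params)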

Finally, I think I am not able to train the model on the GPU, even though I wrote this line of code:

with K.tf.device('/gpu:0'):
    model = ...  # build the model inside the device scope

So, I read in the docs that the library supports the GPU, but when I train my model and assign the training to the GPU, it does not do it and runs the graph on the CPU instead.
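For what it's worth, one way to verify where the ops actually run is TensorFlow's device placement logging (a sketch assuming the TF1-style sessions this code uses):

import tensorflow as tf
from keras import backend as K

# Print the device each op is placed on when the graph runs, to check
# whether the t3f ops end up on the GPU or fall back to the CPU.
config = tf.ConfigProto(log_device_placement=True)
K.set_session(tf.Session(config=config))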

And yes, I will submit all the models after I finish them.

KhrulkovV commented 5 years ago

Hey @MohammedAlnemari, I've written an example similar to yours from scratch (https://gist.github.com/KhrulkovV/c34442ff7fcdf45010a16371bd87a65b) and it seems to be working (I haven't waited for long, but in the first epoch the accuracy went to about 40%). I think that since the VGG16 architecture is designed for 224x224 images, you need to chop off more layers: I removed everything after block3_pool and then augmented the network with TT layers. Hope this is useful!
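The rough idea as a sketch (see the gist for the actual code and hyperparameters; the dims below just assume block3_pool's (4, 4, 256) output, which flattens to 4096 values):

x = model.get_layer('block3_pool').output   # (None, 4, 4, 256)
x = Flatten()(x)                            # 4 * 4 * 256 = 4096
x = t3f.nn.KerasDense(input_dims=[8, 8, 8, 8], output_dims=[8, 8, 8, 8], tt_rank=16, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
model_final = Model(inputs=model.input, outputs=predictions)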

Bihaqo commented 5 years ago

Thanks @KhrulkovV!

It seems that the original issue is resolved, so I'll close it. @MohammedAlnemari, feel free to open a new one if you have trouble getting good results on VGG.