wiseodd opened this issue 8 years ago
I'm trying to use ResNet50 from `keras.applications` as a feature extractor for my model. I successfully accomplished that with this:
```python
from keras.applications import ResNet50
from keras.layers import Input, Flatten, Dense, Dropout
from keras.models import Model
from keras.regularizers import l2

inputs = Input(shape=(224, 224, 3))
base_model = ResNet50(include_top=False, weights='imagenet', input_tensor=inputs)

# Freeze the ResNet50 base so only the new head is trained
for layer in base_model.layers:
    layer.trainable = False

conv_feature = Flatten()(base_model.output)
x = Dense(512, activation='relu', W_regularizer=l2(l=0.01))(conv_feature)
x = Dropout(p=0.5)(x)
cls = Dense(20, activation='softmax', name='cls')(x)

self.model = Model(input=base_model.input, output=cls)
```
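For context, this online model would then be compiled and trained on images directly. A minimal sketch; the optimizer choice and the generator's `(X, y)` output are assumptions, while the 4800 samples per epoch and 20 epochs come from the log below:

```python
# Sketch only: the optimizer is assumed, not confirmed from the original post.
self.model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# gen.pascal_datagen_singleobj(64) is assumed here to yield (X, y) batches.
self.model.fit_generator(gen.pascal_datagen_singleobj(64),
                         samples_per_epoch=4800, nb_epoch=20)
```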
However, to speed up the experiment, I'd like to cache the extracted features:
```python
import numpy as np

inputs = Input(shape=(224, 224, 3))
model = ResNet50(include_top=False, weights='imagenet', input_tensor=inputs)

# my custom generator
generator = gen.pascal_datagen_singleobj(64, include_label=False, random=False)
features = model.predict_generator(generator, gen.pascal.train_set.size)

np.save('cnn_features_train.npy', features)
```
It should basically be the same as the first code above up to `conv_feature`. I then use the cached features for my FC layers:
```python
X_train, y_train = load_train_data()  # X_train = np.load('cnn_features_train.npy')

inputs = Input(shape=X_train.shape[1:])
conv_features = Flatten()(inputs)
x = Dense(512, activation='relu', W_regularizer=l2(l=0.01))(conv_features)
x = Dropout(p=0.5)(x)
cls = Dense(20, activation='softmax', name='cls')(x)
```
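The head is then trained on the cached arrays. Again a minimal sketch: the optimizer and batch size are assumptions, while the 50 epochs match the log below:

```python
# Sketch: optimizer and batch size are assumed, not from the original post.
head = Model(input=inputs, output=cls)
head.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
head.fit(X_train, y_train, nb_epoch=50, batch_size=64)
```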
But the optimization results for those two (supposedly equivalent) models are very different. The cached features perform far worse, to the point that the model won't even converge:
```
# Using ResNet as feature extractor in online manner
Epoch 1/20
4800/4800 [==============================] - 62s - loss: 6.5075 - acc: 0.7392
Epoch 2/20
4800/4800 [==============================] - 58s - loss: 2.8369 - acc: 0.8435
Epoch 3/20
4800/4800 [==============================] - 61s - loss: 1.6589 - acc: 0.8608

# Using offline extracted features
Epoch 1/50
4956/4956 [==============================] - 0s - loss: 10.0733 - acc: 0.1354
Epoch 2/50
4956/4956 [==============================] - 0s - loss: 8.0192 - acc: 0.1336
Epoch 3/50
4956/4956 [==============================] - 0s - loss: 6.5425 - acc: 0.1499
...
Epoch 48/50
4956/4956 [==============================] - 0s - loss: 2.6887 - acc: 0.1461
Epoch 49/50
4956/4956 [==============================] - 0s - loss: 2.6886 - acc: 0.1461
Epoch 50/50
4956/4956 [==============================] - 0s - loss: 2.6887 - acc: 0.1461
```
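One sanity check worth running here: verify that the cached features actually match what the network produces online for the same inputs. A sketch, assuming the custom generator yields plain image batches in a fixed, repeatable order:

```python
# Sanity-check sketch: assumes the generator yields image batches deterministically.
check_gen = gen.pascal_datagen_singleobj(64, include_label=False, random=False)
X_batch = next(check_gen)

online_features = model.predict(X_batch)                             # computed on the fly
cached_features = np.load('cnn_features_train.npy')[:len(X_batch)]   # first cached batch

print(np.allclose(online_features, cached_features, atol=1e-5))
```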
On a side note, I've tried the same offline pipeline with VGG16 features, and it works.
Any thoughts?
@wiseodd I have the same issue. Did you find out why that was happening in the end?
Unfortunately no. I used VGG instead.