OverLordGoldDragon / see-rnn

RNN and general weights, gradients, & activations visualization in Keras & TensorFlow
MIT License

CNN with embedding #46

Closed korosig closed 3 years ago

korosig commented 3 years ago

Hi there,

I like your repo and have started using it in my project, but I've hit a problem with the Embedding layer: if the first layer is an Embedding layer, see-rnn's get_gradients breaks down. Any suggestions?

I tried adding an embedding to your example, like this (I don't care about the results, only about exercising the Embedding layer):

import numpy as np
from tensorflow.keras.layers import Input, Lambda, Embedding, LSTM, concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from see_rnn import get_gradients

def make_model(rnn_layer, batch_shape, units):
    ipt = Input(batch_shape=batch_shape)
    input0 = Lambda(lambda x: x[:, :, 0])(ipt)   # integer ids for the embedding
    input1 = Lambda(lambda x: x[:, :, 1:])(ipt)  # remaining float features
    embed_layer1 = Embedding(100, 30, input_length=100, mask_zero=True)(input0)
    merged = concatenate([input1, embed_layer1])

    x   = rnn_layer(units, activation='tanh', return_sequences=True)(merged)
    out = rnn_layer(units, activation='tanh', return_sequences=False)(x)
    model = Model(ipt, out)
    model.compile(Adam(4e-3), 'mse')
    return model

def make_data(batch_shape):
    # small integer "ids" centered near 10, plus random integer labels
    return np.random.randn(*batch_shape).astype('int32') + 10, \
           np.random.randint(1, 99, (batch_shape[0], units))

def train_model(model, iterations, batch_shape):
    x, y = make_data(batch_shape)
    for i in range(iterations):
        model.train_on_batch(x, y)
        print(end='.')  # progbar
        if i % 40 == 0:
            x, y = make_data(batch_shape)

units = 6
batch_shape = (16, 100, 2*units)

model = make_model(LSTM, batch_shape, units)
train_model(model, 300, batch_shape)

x, y  = make_data(batch_shape)
grads_all  = get_gradients(model, 1, x, y)  # return_sequences=True,  layer index 1
grads_last = get_gradients(model, 2, x, y)  # return_sequences=False, layer index 2

I got this error:

AttributeError: Tensor.name is meaningless when eager execution is enabled.

My actual model is the following:

from tensorflow.keras.layers import (Input, Lambda, Embedding, Conv1D,
                                     Activation, Flatten, Dense, add, concatenate)
from tensorflow.keras.models import Model

# Residual block
def ResBlock(x, filters, kernel_size, dilation_rate):
    r = Conv1D(filters, kernel_size, padding='same',
               dilation_rate=dilation_rate, activation='relu')(x)  # first convolution
    r = Conv1D(filters, kernel_size, padding='same',
               dilation_rate=dilation_rate)(r)                     # second convolution
    if x.shape[-1] == filters:
        shortcut = x                                               # identity shortcut
    else:
        shortcut = Conv1D(filters, kernel_size, padding='same')(x) # projection shortcut
    o = add([r, shortcut])
    o = Activation('relu')(o)                                      # activation function
    return o

def build_model_withEmbedding(num_of_event, num_of_link, num_of_resource, maxlen, numOfFeat):
    dropout = 0.2
    n_timesteps = maxlen
    input_ = Input(shape=(n_timesteps, numOfFeat))
    input0 = Lambda(lambda x: x[:, :, 0])(input_)
    input1 = Lambda(lambda x: x[:, :, 1])(input_)
    input2 = Lambda(lambda x: x[:, :, 2:3])(input_)
    input3 = Lambda(lambda x: x[:, :, 3:])(input_)

    # embedding head: Event
    embed_layer1 = Embedding(num_of_event + 1, 30, input_length=n_timesteps,
                             mask_zero=True)(input0)

    # embedding head: Resource display name
    embed_layer2 = Embedding(num_of_resource + 1, 30, input_length=n_timesteps,
                             mask_zero=True)(input1)

    # lstm head: Success
    lstm_layer = input2

    # CountVectorizer head
    CountVectorizer_layer = input3

    # merged head
    merged = concatenate([embed_layer1, embed_layer2, lstm_layer, CountVectorizer_layer])

    x = ResBlock(merged, filters=32, kernel_size=3, dilation_rate=1)
    x = ResBlock(x, filters=32, kernel_size=3, dilation_rate=2)
    x = ResBlock(x, filters=16, kernel_size=3, dilation_rate=4)
    merged = Flatten()(x)
    merged = Dense(10)(merged)
    outputs = Dense(1)(merged)

    model = Model(inputs=input_, outputs=outputs)
    model.compile(loss='mae', optimizer='adam')

    return model

model = build_model_withEmbedding(num_of_event, num_of_link, num_of_resource,
                                  maxlen, train_X.shape[2])
OverLordGoldDragon commented 3 years ago

Is help still needed? I'll try to respond within a week.

OverLordGoldDragon commented 3 years ago

The problem appears to be the Lambda layer, not Embedding: lambda x: x[:, :, 0] seems non-differentiable, per the more informative error obtained with tf.compat.v1.disable_eager_execution():

ValueError: Variable Tensor("lambda/strided_slice:0", shape=(16, 100), dtype=float32) has `None` for gradient. 
Please make sure that all of your ops have a gradient defined (i.e. are differentiable). 
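
To reproduce that, a minimal sketch (assuming TF2, where eager execution is the default); the call has to run before the model is built:

import tensorflow as tf
tf.compat.v1.disable_eager_execution()  # switch to graph mode before constructing the model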

Try doing the slicing differently (e.g. tf.slice), or avoid it altogether, or simply don't call get_gradients on layers 1 and 2 (the Lambdas); the below works:

def make_model(rnn_layer, batch_shape, units):
    ipt = Input(batch_shape=batch_shape)
    emb = Embedding(100, 30, input_length=100, mask_zero=True)(ipt)

    x   = rnn_layer(units, activation='tanh', return_sequences=True)(emb)
    out = rnn_layer(units, activation='tanh', return_sequences=False)(x)
    model = Model(ipt, out)
    model.compile(Adam(4e-3), 'mse')
    return model

# make_data and train_model unchanged

units = 6
batch_shape = (16, 100)  # was (16, 100, 2*units)

model = make_model(LSTM, batch_shape, units)
train_model(model, 30, batch_shape)

x, y  = make_data(batch_shape)
grads_all  = get_gradients(model, 2, x, y)  # return_sequences=True LSTM (index 2; Embedding now occupies index 1)
grads_last = get_gradients(model, 3, x, y)  # return_sequences=False LSTM (index 3)
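
For completeness, a sketch of keeping the extra float features without the Lambda slice: pass the integer ids and the float features as two separate Inputs, so nothing non-differentiable sits between the model inputs and the probed layers. This isn't from the thread; the two-Input design, the layer indices, and passing a list of arrays to get_gradients are assumptions (verify the indices with model.summary()):

import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from see_rnn import get_gradients

def make_model_two_inputs(rnn_layer, batch_shape, units):
    # ids enter as a 2D integer Input, floats as a separate 3D Input,
    # so no Lambda slice sits in the gradient path
    ids_ipt  = Input(batch_shape=batch_shape[:2])                  # (16, 100)
    feat_ipt = Input(batch_shape=(*batch_shape[:2], 2*units - 1))  # (16, 100, 11)
    emb    = Embedding(100, 30, input_length=100, mask_zero=True)(ids_ipt)
    merged = concatenate([feat_ipt, emb])

    x   = rnn_layer(units, activation='tanh', return_sequences=True)(merged)
    out = rnn_layer(units, activation='tanh', return_sequences=False)(x)
    model = Model([ids_ipt, feat_ipt], out)
    model.compile(Adam(4e-3), 'mse')
    return model

units = 6
batch_shape = (16, 100, 2*units)
model = make_model_two_inputs(LSTM, batch_shape, units)

ids   = np.random.randint(1, 99, batch_shape[:2])
feats = np.random.randn(*batch_shape[:2], 2*units - 1).astype('float32')
y     = np.random.randint(1, 99, (batch_shape[0], units))
model.train_on_batch([ids, feats], y)

# the LSTMs now sit after the two InputLayers, Embedding, and Concatenate,
# i.e. at indices 4 and 5 here (hypothetical; check model.summary())
grads_all  = get_gradients(model, 4, [ids, feats], y)  # return_sequences=True
grads_last = get_gradients(model, 5, [ids, feats], y)  # return_sequences=False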