Tixierae / deep_learning_NLP

Keras, PyTorch, and NumPy Implementations of Deep Learning Architectures for NLP
http://arxiv.org/abs/1808.09772

Getting Attention coefficients with saved models #2

Closed fcivardi closed 5 years ago

fcivardi commented 5 years ago

Dear Antoine,

I want to thank you for your great NLP GitHub repository, which is a constant fount of inspiration. I'm working on text classification; in the past I used the 1D CNN, and now the HAN, which you explain so clearly in your notebooks.

I have a question about getting the attention coefficients in order to highlight "important" words and sentences. I've been able to get and display them by putting the code in the same script that does the training (as you did in your notebook, where you show them in the same notebook that creates and fits the model).

But I wanted to write a class "Predictor" that loads the saved model, does the prediction, and shows the attention. For the attention coefficients I need:

    get_sent_att_coeffs = Model(sent_ints, sent_att_coeffs)        # coefficients over the words in a sentence
    get_doc_attention_coeffs = Model(doc_ints, doc_att_coeffs)     # coefficients over the sentences in a document

and Python of course complains that sent_ints, sent_att_coeffs, etc. are not defined unless I put the whole definition of the two models (sentence encoder and document encoder) inside the class. But I didn't want to rewrite the whole definition there (that worked, but it is a quick and dirty solution); I want to load the models from files instead.

I tried then this:

    from keras.models import Model, model_from_json

    # rebuild the sentence encoder from its saved architecture and weights
    json_file = open('sentencoder_newsgroups_model.json', 'r')
    loaded_model_json = json_file.read()
    json_file.close()
    sent_encoder = model_from_json(loaded_model_json,
                                   custom_objects={'AttentionWithContext': AttentionWithContext})
    sent_encoder.load_weights('sentencoder_newsgroups_weights.h5')

    # rebuild the document-level HAN the same way
    json_file = open('han_newsgroups_model.json', 'r')
    loaded_model_json = json_file.read()
    json_file.close()
    han = model_from_json(loaded_model_json,
                          custom_objects={'AttentionWithContext': AttentionWithContext})
    han.load_weights('han_newsgroups_weights.h5')

    reshaped_sentences = self._get_sequence()
    reshaped_sentences_tensor = _to_tensor(reshaped_sentences, dtype='float32')

    get_sent_att_coeffs = Model(sent_encoder.input,
                                sent_encoder.get_layer('attention_with_context_1').output[1])

    get_doc_attention_coeffs = Model(han.input,
                                     han.get_layer('attention_with_context_2').output[1])

but at get_sent_att_coeffs I get an error:

ValueError: Output tensors to a Model must be the output of a Keras Layer (thus holding past layer metadata). Found: Tensor("strided_slice_1:0", shape=(200,), dtype=float32)

Then I found that if I print(sent_encoder.summary()) and print(han.summary()) right after the model definition, the attention layers (correctly) have 2 outputs, while if I print the same after loading the models from file, the second output, which gives the coefficients, is gone. By the way, I think there is an error in compute_output_shape in the class AttentionWithContext: the dimensions of the coefficients are wrong. They should be:

def compute_output_shape(self, input_shape):
    if self.return_coefficients:
        return [(input_shape[0], input_shape[-1]), (input_shape[0], input_shape[1], 1)]

instead of:

def compute_output_shape(self, input_shape):
    if self.return_coefficients:
        return [(input_shape[0], input_shape[-1]), (input_shape[0], input_shape[-1], 1)]

i.e. the second dimension of the coefficients should be the number of steps, input_shape[1] (the number of words or sentences: there is one coefficient per word or sentence), and not input_shape[-1] (the number of features).
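To sanity-check the shapes, here is a minimal NumPy sketch of the attention-with-context computation (hand-rolled with random weights; the variable names are illustrative, not the repository's actual code). The coefficients come out with shape (batch, steps, 1), one scalar per word or sentence, which matches the corrected compute_output_shape:

    import numpy as np

    batch, steps, features = 2, 7, 5      # e.g. 7 words per sentence, 5 hidden features

    h = np.random.rand(batch, steps, features)   # RNN hidden states, one per step
    W = np.random.rand(features, features)       # attention weight matrix
    b = np.random.rand(features)                 # attention bias
    u = np.random.rand(features)                 # learned context vector

    uit = np.tanh(h @ W + b)                     # (batch, steps, features)
    scores = uit @ u                             # (batch, steps): one score per step
    a = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax over steps
    coeffs = a[..., None]                        # (batch, steps, 1)
    weighted = (h * coeffs).sum(axis=1)          # (batch, features): attended representation

    assert coeffs.shape == (batch, steps, 1)     # (input_shape[0], input_shape[1], 1)
    assert weighted.shape == (batch, features)   # (input_shape[0], input_shape[-1])

So the first output (the attended vector) has the feature dimension, while the second output (the coefficients) has the step dimension, as you say.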

Anyway, I'm still looking into how to print the attentions for a document when the model is simply loaded from a file (either as a full model, or as json + weights), as it would be in a "production" environment. If the second output of a layer is not saved with the model, maybe we need to call the attention layer twice with two different parameters (return_coefficients=False and return_coefficients=True), modifying AttentionWithContext so that it has two different outputs depending on return_coefficients.

Thank you Francesco

fcivardi commented 5 years ago

... I think calling the attention layer twice cannot work: the weights of the second instance wouldn't be learned. I think the problem is that Keras does not expect layers with more than one output, so the extra output is probably not saved. I'll probably have to accept that I need to repeat the model definition in the prediction class, and cannot load it from json.

Tixierae commented 5 years ago

Dear Francesco, this is a good question. When exposing the model in production (web app), I also experienced issues like yours when reloading the saved model from disk (weights + architecture, or just architecture). So I had to re-define the model architecture on the fly (you can keep the model definition in a separate script and just call that function from the main script) and then use the .load_weights method. Since this option worked for me and was very fast, I didn't spend time trying to make more elegant solutions work.
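The workaround can be sketched as follows. This is a toy example, not the repository's actual HAN code: build_model, the Dense architecture, and the file name are all assumptions; the point is only the pattern (architecture re-created by a shared function, only the weights loaded from disk):

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    def build_model():
        """Re-define the architecture on the fly. In practice this function
        lives in a separate script imported by both the training and the
        prediction code, so the definition is never duplicated."""
        inp = keras.Input(shape=(10,))
        hidden = layers.Dense(8, activation='tanh')(inp)
        out = layers.Dense(2, activation='softmax')(hidden)
        return keras.Model(inp, out)

    # training side: build, (fit,) then save only the weights
    model = build_model()
    model.save_weights('model.weights.h5')   # hypothetical file name

    # prediction side: rebuild the same architecture, then load the weights
    predictor = build_model()
    predictor.load_weights('model.weights.h5')

    x = np.random.rand(3, 10).astype('float32')
    assert np.allclose(model.predict(x, verbose=0),
                       predictor.predict(x, verbose=0))

Since the architecture is rebuilt in code rather than deserialized, multi-output custom layers such as AttentionWithContext keep both outputs, and the coefficient sub-models can be created as usual.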

Sorry for not being able to help much :)

Antoine

fcivardi commented 5 years ago

Dear Antoine,

thank you for your quick answer. Indeed, keeping the model definition in a separate script is a good solution, and it isn't worth spending more time looking for a different one :).

Best Francesco