aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0

Getting 'AssertionError: Could not compute output Tensor("dense/truediv:0", shape=(?, ?, 114), dtype=float32)' error with custom keras model #378

Closed · Harathi123 closed this issue 6 years ago

Harathi123 commented 6 years ago

Hi,

I am trying to deploy a custom Keras model on SageMaker. I am following the https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/tensorflow_keras_cifar10 example to implement that.

I am getting an 'AssertionError: Could not compute output Tensor("dense/truediv:0", shape=(?, ?, 114), dtype=float32)' error while training. The full traceback is as follows:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 36, in start
    fw.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/train_entry_point.py", line 164, in train
    train_wrapper.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/trainer.py", line 69, in train
    estimator = self._build_estimator(run_config=run_config)
  File "/usr/local/lib/python2.7/dist-packages/tf_container/trainer.py", line 95, in _build_estimator
    model = self.customer_script.keras_model_fn(hyperparameters)
  File "/opt/ml/code/spell_tensorflow.py", line 265, in keras_model_fn
    model.compile(optimizer=tf.keras.optimizers.Adam(lr=lr), loss='categorical_crossentropy', metrics=['categorical_accuracy'])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/_impl/keras/engine/training.py", line 682, in compile
    masks = self.compute_mask(self.inputs, mask=None)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/_impl/keras/engine/topology.py", line 792, in compute_mask
    _, output_masks = self._run_internal_graph(inputs, masks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/network.py", line 939, in _run_internal_graph
    assert str(id(x)) in tensor_map, 'Could not compute output ' + str(x)
AssertionError: Could not compute output Tensor("dense/truediv:0", shape=(?, ?, 114), dtype=float32)

I am implementing an LSTM seq2seq model. For that, I have two inputs to the model: 'encoder inputs' and 'decoder inputs'.
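
For reference, the two-input wiring looks roughly like this (a simplified sketch, not my actual script):

from tensorflow.python.keras.layers import Input, LSTM, Dense
from tensorflow.python.keras.models import Model

num_tokens = 114   # vocabulary size, matching the shape in the error
latent_dim = 256

encoder_inputs = Input(shape=(None, num_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

decoder_inputs = Input(shape=(None, num_tokens))
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(decoder_inputs,
                                                initial_state=[state_h, state_c])
decoder_outputs = Dense(num_tokens, activation='softmax')(decoder_outputs)

# Both Input tensors must be listed here; if an input that the output
# depends on is missing, compile() cannot trace the outputs back to the
# model inputs and raises an AssertionError like the one above.
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)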

Any suggestions will be helpful...

Thanks, Harathi

laurenyu commented 6 years ago

hi @Harathi123, can you provide your code and data so that we can reproduce the error on our end?

Harathi123 commented 6 years ago

Hi @laurenyu,

Can I share the code privately?

Thanks, Harathi

Harathi123 commented 6 years ago

Hi @laurenyu,

I am able to train the model on my machine. It seems the issue is with the versions I am using versus the versions on SageMaker. I am using TensorFlow 1.9 and Keras 2.2.2. But when I tried to set framework_version='1.9' in the TensorFlow estimator, I got the following error:

ValueError: Error training sagemaker-tensorflow-2018-08-22-16-38-13-997: Failed Reason: ClientError: Cannot pull algorithm container. Either the image does not exist or its permissions are incorrect.

Does SageMaker support TensorFlow 1.9?

Thanks, Harathi

laurenyu commented 6 years ago

hi @Harathi123, there's not a good way to share the code privately, but as long as you can provide something similar (e.g. with fake data, etc.) that would allow someone else to run it and get the same error, that would be sufficient.

Unfortunately, we do not currently support TensorFlow 1.9 in SageMaker. We do expect to in the future, as we started with 1.4 and have launched a new container for all versions of TensorFlow since.

Can you use TensorFlow 1.8 in the meantime? If so, don't specify the framework_version arg when constructing the estimator.
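
For example, a minimal sketch (role and inputs defined as usual in your notebook):

from sagemaker.tensorflow import TensorFlow

# Omitting framework_version makes the SDK fall back to its default
# supported TensorFlow version instead of requesting a 1.9 container.
estimator = TensorFlow(entry_point='spell_tensorflow.py',
                       role=role,
                       train_instance_count=1,
                       train_instance_type='ml.c4.xlarge')
estimator.fit(inputs)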

If you need TensorFlow 1.9, then you'll want to build your own image (or just wait). You can find an example explaining how to create your own TensorFlow container here: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/tensorflow_bring_your_own/tensorflow_bring_your_own.ipynb
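
Once your image is pushed to ECR, you can point the generic Estimator class at it; a rough sketch (the image URI below is a placeholder for your own account, region, and repository):

from sagemaker.estimator import Estimator

# Placeholder ECR URI for a container you built with TensorFlow 1.9.
byo_image = '123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tf-1.9:latest'

byo_estimator = Estimator(image_name=byo_image,
                          role=role,
                          train_instance_count=1,
                          train_instance_type='ml.c4.xlarge')
byo_estimator.fit(inputs)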

Harathi123 commented 6 years ago

Ok @laurenyu,

Please find the sample training files below:

train_targets.txt train_inputs.txt

You can use the same for testing as well

Please find the code below:

 from __future__ import print_function

 import tensorflow as tf
 #from tensorflow.python.keras._impl.keras.backend.tensorflow_backend import set_session
 from tensorflow.python.keras.layers import Input, LSTM, Dense, Bidirectional, Concatenate, GRU
 from tensorflow.python.keras.models import Model
 from tensorflow.python.keras import optimizers
 #from tensorflow.python.keras.callbacks import ModelCheckpoint, TensorBoard, LearningRateScheduler
 from tensorflow.python.saved_model.signature_constants import PREDICT_INPUTS
 import numpy as np
 import os
 import logging
 import json

 logging.basicConfig(level=logging.DEBUG)

 batch_size = 64  # Batch size for training.
 epochs = 2  
 lr = 0.01

 SAGEMAKER_DATA_PATH = '/opt/ml/input/data/training'

 #allows script to be executed inside and outside the container
 base_dir = SAGEMAKER_DATA_PATH if os.path.exists(SAGEMAKER_DATA_PATH) else 'data'

 vocab_to_int = {'\t': 0, 'C': 1, 'l': 2, 'a': 3, 'i': 4, 'm': 5, ' ': 6, 'T': 7, 'y': 8, 'p': 9, 'e': 10, ':': 11, 'V': 12, 'B': 13, 'A': 14, 'c': 15, 'd': 16, 'n': 17, 't': 18, '-': 19, 'I': 20, 'j': 21, 'u': 22, 'r': 23, '\n': 24, 'P': 25, 'o': 26, 'h': 27, '/': 28, 'O': 29, 'w': 30, 'f': 31, 'F': 32, 's': 33, 'N': 34, 'M': 35, 'L': 36, 'S': 37, 'b': 38, 'D': 39, 'G': 40, 'g': 41, '1': 42, 'v': 43, 'E': 44, 'R': 45, 'Y': 46, '.': 47, 'U': 48, 'K': 49, 'W': 50, 'H': 51, '2': 52, '0': 53, '6': 54, 'q': 55, '3': 56, 'k': 57, '?': 58, '8': 59, 'x': 60, 'z': 61, '(': 62, ')': 63, '’': 64, '4': 65, '#': 66, 'J': 67, ',': 68, '7': 69, 'Z': 70, '9': 71, '&': 72, '5': 73, ';': 74, '+': 75, '*': 76, 'Q': 77, 'X': 78, '$': 79, '@': 80, '|': 81}

 int_to_vocab = {value:key for key, value in vocab_to_int.items()}
 max_sent_len = 50
 min_sent_len = 4

 input_characters = list(vocab_to_int.keys())
 num_encoder_tokens = 114
 num_decoder_tokens = 114
 max_encoder_seq_length = 49
 max_decoder_seq_length = 49

 latent_dim = 256 #Latent dimensionality of the encoding space.

 def vectorize_data(input_texts, target_texts, max_encoder_seq_length, num_encoder_tokens, vocab_to_int):
    '''Prepares the input text and targets into the proper seq2seq numpy arrays'''
    encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens),
    dtype='float32')
    decoder_input_data = np.zeros(
        (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
        dtype='float32')
    decoder_target_data = np.zeros(
        (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
        dtype='float32')

    for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
        for t, char in enumerate(input_text):
            # c0..cn
            encoder_input_data[i, t, vocab_to_int[char]] = 1.
        for t, char in enumerate(target_text):
            # c0'..cm'
            # decoder_target_data is ahead of decoder_input_data by one timestep
            decoder_input_data[i, t, vocab_to_int[char]] = 1.
            if t > 0:
                # decoder_target_data will be ahead by one timestep
                # and will not include the start character.
                decoder_target_data[i, t - 1, vocab_to_int[char]] = 1.

    return encoder_input_data, decoder_input_data, decoder_target_data

 def decode_sequence(input_seq, encoder_model, decoder_model, num_decoder_tokens, int_to_vocab):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, vocab_to_int['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = int_to_vocab[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence

 def build_model(num_decoder_tokens, num_encoder_tokens, latent_dim):
    # Define an input sequence and process it.
    encoder_inputs = Input(shape=(None, num_encoder_tokens))
    encoder = Bidirectional(LSTM(latent_dim, return_state=True)) # Bi LSTM
    encoder_outputs, state_f_h, state_f_c, state_b_h, state_b_c = encoder(encoder_inputs)# Bi LSTM
    state_h = Concatenate()([state_f_h, state_b_h])# Bi LSTM
    state_c = Concatenate()([state_f_c, state_b_c])# Bi LSTM

    # We discard `encoder_outputs` and only keep the states.
    encoder_states = [state_h, state_c]# Bi GRU, LSTM, BHi LSTM

    decoder_inputs = Input(shape=(None, num_decoder_tokens))
    # We set up our decoder to return full output sequences,
    # and to return internal states as well. We don't use the
    # return states in the training model, but we will use them in inference.
    decoder_lstm = LSTM(latent_dim*2, return_sequences=True, return_state=True)# Bi LSTM

    decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)

    decoder_dense = Dense(num_decoder_tokens, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)

    # Define the model that will turn
    # `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    print('encoder-decoder  model:')
    #print(model.summary()) 

    encoder_model = Model(encoder_inputs, encoder_states)

    decoder_state_input_h = Input(shape=(latent_dim*2,))# Bi LSTM
    decoder_state_input_c = Input(shape=(latent_dim*2,)) # Bi LSTM

    decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

    decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)

    decoder_states = [state_h, state_c]
    decoder_outputs = decoder_dense(decoder_outputs)
    decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)

    return model, encoder_model, decoder_model

 def keras_model_fn(hyperparameters):
    logging.info('Training model.')
    model, encoder_model, decoder_model = build_model(num_decoder_tokens, num_encoder_tokens, latent_dim)
    model.compile(optimizer=tf.keras.optimizers.Adam(lr=lr), loss='categorical_crossentropy', metrics=['categorical_accuracy'])
    return model

 def serving_input_fn(hyperparameters):
    tensor = tf.placeholder(tf.float32, shape=[None, max_sent_len])
    inputs = {PREDICT_INPUTS: tensor}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)

 def train_input_fn(training_dir, hyperparameters):

    logging.info("----------------------------------------train-------------------------------------------")

    return generate_input_fn(training_dir, 'train_inputs.txt', 'train_targets.txt')

 def eval_input_fn(training_dir, hyperparameters):

    logging.info("----------------------------------------test-------------------------------------------")

    # evaluation reuses the shared training files (see note above)
    return generate_input_fn(training_dir, 'train_inputs.txt', 'train_targets.txt')

 def _generate_input_fn(training_dir, input_filename, target_filename):
    logging.info('generator function')
    with open (os.path.join(training_dir, input_filename), 'r') as f:
        i_t = f.read()

    with open (os.path.join(training_dir, target_filename), 'r') as f:
        t_t = f.read()

    input_texts = i_t.split('\n')
    target_texts = t_t.split('\n')
    encoder_input_data, decoder_input_data, decoder_target_data = vectorize_data(input_texts=input_texts,
                                                                             target_texts=target_texts, 
                                                                             max_encoder_seq_length=max_encoder_seq_length, 
                                                                             num_encoder_tokens=num_encoder_tokens, 
                                                                             vocab_to_int=vocab_to_int)

    return {PREDICT_INPUTS: [encoder_input_data, decoder_input_data]}, decoder_target_data

Harathi123 commented 6 years ago

Hi @laurenyu , any update regarding the error...

Thanks, Harathi

laurenyu commented 6 years ago

hi @Harathi123, sorry for the delay. can you also include the code you're using to invoke SageMaker?

Harathi123 commented 6 years ago

Hi @laurenyu , no problem! This is the code I am using to invoke SageMaker:

import os
import sagemaker
from sagemaker import get_execution_role
sagemaker_session = sagemaker.Session()

role = get_execution_role()

inputs = sagemaker_session.upload_data(path='data', key_prefix='data/spell_tensor')

from sagemaker.tensorflow import TensorFlow

spell_estimator = TensorFlow(entry_point='spell_tensorflow.py',
                             role=role,
                             training_steps=100,
                             evaluation_steps=100,
                             hyperparameters={'learning_rate': 0.01},
                             train_instance_count=1,
                             train_instance_type='ml.c4.xlarge')

spell_estimator.fit(inputs)

Thanks, Harathi

Harathi123 commented 6 years ago

Hi @laurenyu , any update on the issue...

Thanks, Harathi

laurenyu commented 6 years ago

hi @Harathi123, sorry for the delayed response. I was able to reproduce your issue, but haven't yet found a noticeable cause.

I did notice that in your script, both train_input_fn and eval_input_fn call generate_input_fn instead of _generate_input_fn - was that intentional?

Also, are you able to run the Keras code locally without SageMaker?

Harathi123 commented 6 years ago

Hi @laurenyu ,

Sorry, I forgot to update. I am able to train and deploy the model with a custom container created by following the steps in this blog post: https://medium.com/@richardchen_81235/custom-keras-model-in-sagemaker-277a2831ac67

And I am able to run the code locally without SageMaker as well.

And regarding _generate_input_fn, it's not intentional; it's a typo. But it's not the cause of this issue, since the error is raised in keras_model_fn while loading the model itself.

Thanks, Harathi