Closed by Harathi123 6 years ago
hi @Harathi123, can you provide your code and data so that we can reproduce the error on our end?
Hi @laurenyu,
Can I share the code personally?
Thanks, Harathi
Hi @laurenyu,
I am able to train the model on my machine. It seems the issue is a mismatch between the versions I am using and the versions on SageMaker. I am using TensorFlow 1.9 and Keras 2.2.2. But when I tried to pass framework_version='1.9' to the TensorFlow estimator, I got the following error.
ValueError: Error training sagemaker-tensorflow-2018-08-22-16-38-13-997: Failed Reason: ClientError: Cannot pull algorithm container. Either the image does not exist or its permissions are incorrect.
Does TensorFlow on SageMaker support version 1.9?
Thanks, Harathi
hi @Harathi123, there's not a good way to share the code privately, but as long as you can provide something similar (e.g. with fake data, etc.) that would allow someone else to run it and get the same error, that would be sufficient.
Unfortunately, we do not currently support TensorFlow 1.9 in SageMaker. We do expect to in the future, as we started with 1.4 and have launched a new container for all versions of TensorFlow since.
Can you use TensorFlow 1.8 in the meantime? If so, then don't specify the framework_version arg when constructing an estimator.
If you need TensorFlow 1.9, then you'll want to build your own image (or just wait). You can find an example explaining how to create your own TensorFlow container here: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/tensorflow_bring_your_own/tensorflow_bring_your_own.ipynb
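To make the "don't specify framework_version" suggestion concrete, here is a minimal sketch. The helper build_estimator_kwargs is hypothetical (it is not part of the SageMaker SDK), the role string is a placeholder, and the other argument values are copied from the invocation posted later in this thread. It only assembles keyword arguments, so it runs without the SDK installed.

```python
def build_estimator_kwargs(role, framework_version=None):
    """Collect keyword arguments for the TensorFlow estimator.

    Hypothetical helper: framework_version is only included when the caller
    sets it, so leaving it as None lets the SDK choose its default version.
    """
    kwargs = {
        'entry_point': 'spell_tensorflow.py',
        'role': role,
        'training_steps': 100,
        'evaluation_steps': 100,
        'hyperparameters': {'learning_rate': 0.01},
        'train_instance_count': 1,
        'train_instance_type': 'ml.c4.xlarge',
    }
    if framework_version is not None:
        kwargs['framework_version'] = framework_version
    return kwargs


# Placeholder role; in a notebook this would come from get_execution_role().
default_kwargs = build_estimator_kwargs(role='<your-execution-role>')
pinned_kwargs = build_estimator_kwargs(role='<your-execution-role>',
                                       framework_version='1.8')

# In a SageMaker notebook the dict would then be unpacked as:
#   from sagemaker.tensorflow import TensorFlow
#   estimator = TensorFlow(**default_kwargs)
```

With default_kwargs the estimator receives no framework_version at all, which is the case described in the reply above; pinned_kwargs shows the explicit alternative.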
Ok @laurenyu,
Please find the sample training files below:
train_targets.txt train_inputs.txt
You can use the same files for testing as well.
Please find the code below:
from __future__ import print_function
import tensorflow as tf
#from tensorflow.python.keras._impl.keras.backend.tensorflow_backend import set_session
from tensorflow.python.keras.layers import Input, LSTM, Dense, Bidirectional, Concatenate, GRU
from tensorflow.python.keras.models import Model
from tensorflow.python.keras import optimizers
#from tensorflow.python.keras.callbacks import ModelCheckpoint, TensorBoard, LearningRateScheduler
from tensorflow.python.saved_model.signature_constants import PREDICT_INPUTS
import numpy as np
import os
import logging
import json
logging.basicConfig(level=logging.DEBUG)
batch_size = 64 # Batch size for training.
epochs = 2
lr = 0.01
SAGEMAKER_DATA_PATH = '/opt/ml/input/data/training'
#allows script to be executed inside and outside the container
base_dir = SAGEMAKER_DATA_PATH if os.path.exists(SAGEMAKER_DATA_PATH) else 'data'
vocab_to_int = {'\t': 0, 'C': 1, 'l': 2, 'a': 3, 'i': 4, 'm': 5, ' ': 6, 'T': 7, 'y': 8, 'p': 9, 'e': 10, ':': 11, 'V': 12, 'B': 13, 'A': 14, 'c': 15, 'd': 16, 'n': 17, 't': 18, '-': 19, 'I': 20, 'j': 21, 'u': 22, 'r': 23, '\n': 24, 'P': 25, 'o': 26, 'h': 27, '/': 28, 'O': 29, 'w': 30, 'f': 31, 'F': 32, 's': 33, 'N': 34, 'M': 35, 'L': 36, 'S': 37, 'b': 38, 'D': 39, 'G': 40, 'g': 41, '1': 42, 'v': 43, 'E': 44, 'R': 45, 'Y': 46, '.': 47, 'U': 48, 'K': 49, 'W': 50, 'H': 51, '2': 52, '0': 53, '6': 54, 'q': 55, '3': 56, 'k': 57, '?': 58, '8': 59, 'x': 60, 'z': 61, '(': 62, ')': 63, '’': 64, '4': 65, '#': 66, 'J': 67, ',': 68, '7': 69, 'Z': 70, '9': 71, '&': 72, '5': 73, ';': 74, '+': 75, '*': 76, 'Q': 77, 'X': 78, '$': 79, '@': 80, '|': 81}
int_to_vocab = {value:key for key, value in vocab_to_int.items()}
max_sent_len = 50
min_sent_len = 4
input_characters = list(vocab_to_int.keys())
num_encoder_tokens = 114
num_decoder_tokens = 114
max_encoder_seq_length = 49
max_decoder_seq_length = 49
latent_dim = 256 #Latent dimensionality of the encoding space.
def vectorize_data(input_texts, target_texts, max_encoder_seq_length, num_encoder_tokens, vocab_to_int):
    '''Prepares the input text and targets into the proper seq2seq numpy arrays'''
    encoder_input_data = np.zeros(
        (len(input_texts), max_encoder_seq_length, num_encoder_tokens),
        dtype='float32')
    decoder_input_data = np.zeros(
        (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
        dtype='float32')
    decoder_target_data = np.zeros(
        (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
        dtype='float32')
    for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
        for t, char in enumerate(input_text):
            # c0..cn
            encoder_input_data[i, t, vocab_to_int[char]] = 1.
        for t, char in enumerate(target_text):
            # c0'..cm'
            # decoder_target_data is ahead of decoder_input_data by one timestep
            decoder_input_data[i, t, vocab_to_int[char]] = 1.
            if t > 0:
                # decoder_target_data will be ahead by one timestep
                # and will not include the start character.
                decoder_target_data[i, t - 1, vocab_to_int[char]] = 1.
    return encoder_input_data, decoder_input_data, decoder_target_data
def decode_sequence(input_seq, encoder_model, decoder_model, num_decoder_tokens, int_to_vocab):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, vocab_to_int['\t']] = 1.
    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)
        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = int_to_vocab[sampled_token_index]
        decoded_sentence += sampled_char
        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
                len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True
        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.
        # Update states
        states_value = [h, c]
    return decoded_sentence
def build_model(num_decoder_tokens, num_encoder_tokens, latent_dim):
    # Define an input sequence and process it.
    encoder_inputs = Input(shape=(None, num_encoder_tokens))
    encoder = Bidirectional(LSTM(latent_dim, return_state=True))  # Bi LSTM
    encoder_outputs, state_f_h, state_f_c, state_b_h, state_b_c = encoder(encoder_inputs)  # Bi LSTM
    state_h = Concatenate()([state_f_h, state_b_h])  # Bi LSTM
    state_c = Concatenate()([state_f_c, state_b_c])  # Bi LSTM
    # We discard `encoder_outputs` and only keep the states.
    encoder_states = [state_h, state_c]  # Bi GRU, LSTM, BHi LSTM
    decoder_inputs = Input(shape=(None, num_decoder_tokens))
    # We set up our decoder to return full output sequences,
    # and to return internal states as well. We don't use the
    # return states in the training model, but we will use them in inference.
    decoder_lstm = LSTM(latent_dim*2, return_sequences=True, return_state=True)  # Bi LSTM
    decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
    decoder_dense = Dense(num_decoder_tokens, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)
    # Define the model that will turn
    # `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    print('encoder-decoder model:')
    #print(model.summary())
    encoder_model = Model(encoder_inputs, encoder_states)
    decoder_state_input_h = Input(shape=(latent_dim*2,))  # Bi LSTM
    decoder_state_input_c = Input(shape=(latent_dim*2,))  # Bi LSTM
    decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
    decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
    decoder_states = [state_h, state_c]
    decoder_outputs = decoder_dense(decoder_outputs)
    decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
    return model, encoder_model, decoder_model
def keras_model_fn(hyperparameters):
    logging.info('Training model.')
    model, encoder_model, decoder_model = build_model(num_decoder_tokens, num_encoder_tokens, latent_dim)
    model.compile(optimizer=tf.keras.optimizers.Adam(lr=lr), loss='categorical_crossentropy', metrics=['categorical_accuracy'])
    return model
def serving_input_fn(hyperpameters):
    tensor = tf.placeholder(tf.float32, shape=[None, max_sent_len])
    inputs = {PREDICT_INPUTS: tensor}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
def train_input_fn(training_dir, hyperpameters):
    logging.info("----------------------------------------train-------------------------------------------")
    return generate_input_fn(training_dir, 'test_inputs.txt', 'test_targets.txt')
def eval_input_fn(training_dir, hyperpameters):
    logging.info("----------------------------------------test-------------------------------------------")
    return generate_input_fn(training_dir, 'train_inputs.txt', 'train_targets.txt')
def _generate_input_fn(training_dir, input_filename, target_filename):
    logging.info('generator function')
    with open(os.path.join(training_dir, input_filename), 'r') as f:
        i_t = f.read()
    with open(os.path.join(training_dir, target_filename), 'r') as f:
        t_t = f.read()
    input_texts = i_t.split('\n')
    # Targets come from the target file (the posted script split i_t here).
    target_texts = t_t.split('\n')
    encoder_input_data, decoder_input_data, decoder_target_data = vectorize_data(input_texts=input_texts,
                                                                                 target_texts=target_texts,
                                                                                 max_encoder_seq_length=max_encoder_seq_length,
                                                                                 num_encoder_tokens=num_encoder_tokens,
                                                                                 vocab_to_int=vocab_to_int)
    return {PREDICT_INPUTS: [encoder_input_data, decoder_input_data]}, decoder_target_data
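The comments in vectorize_data say that decoder_target_data is ahead of decoder_input_data by one timestep and drops the start character. The sketch below illustrates that offset on a single string, using plain Python lists instead of one-hot arrays. The helper name shift_targets and the sample string are made up for illustration; '\t' and '\n' are the start and stop characters used by the script above.

```python
def shift_targets(target_text):
    """Return (decoder_input, decoder_target) character lists for one sample.

    Mirrors the indexing in vectorize_data: position t of the decoder input
    holds character t, while the same character is written to position t - 1
    of the decoder target (for t > 0), so the start character is dropped.
    """
    decoder_input = []
    decoder_target = []
    for t, char in enumerate(target_text):
        decoder_input.append(char)
        if t > 0:
            decoder_target.append(char)
    return decoder_input, decoder_target


# '\t' marks the start of a sequence and '\n' marks the end.
dec_in, dec_tgt = shift_targets('\tcat\n')

# At every step the target is the character the decoder should emit next.
for t, expected_next in enumerate(dec_tgt):
    assert dec_in[t + 1] == expected_next
```

Here dec_in is ['\t', 'c', 'a', 't', '\n'] and dec_tgt is ['c', 'a', 't', '\n'], so at each step the decoder sees the current character and is trained to predict the following one.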
Hi @laurenyu , any update regarding the error...
Thanks, Harathi
hi @Harathi123, sorry for the delay. can you also include the code you're using to invoke SageMaker?
Hi @laurenyu , no problem! This is the code I am using to invoke sagemaker
import os
import sagemaker
from sagemaker import get_execution_role
sagemaker_session = sagemaker.Session()
role = get_execution_role()
inputs = sagemaker_session.upload_data(path='data', key_prefix='data/spell_tensor')
from sagemaker.tensorflow import TensorFlow
spell_estimator = TensorFlow(entry_point='spell_tensorflow.py',
                             role=role,
                             training_steps=100,
                             evaluation_steps=100,
                             hyperparameters={'learning_rate': 0.01},
                             train_instance_count=1,
                             train_instance_type='ml.c4.xlarge')
spell_estimator.fit(inputs)
Thanks, Harathi
Hi @laurenyu , any update on the issue...
Thanks, Harathi
hi @Harathi123, sorry for the delayed response. I was able to reproduce your issue, but haven't yet found an obvious cause.
I did notice that in your script, both train_input_fn and eval_input_fn call generate_input_fn instead of _generate_input_fn - was that intentional?
Also, are you able to run the Keras code locally without SageMaker?
Hi @laurenyu ,
Sorry, I forgot to update. I am able to train and deploy the model with a custom container created by following the steps in the blog https://medium.com/@richardchen_81235/custom-keras-model-in-sagemaker-277a2831ac67
And I am able to run the code locally without sagemaker as well.
And regarding _generate_input_fn, it's not intentional. It is a typo. But it is not the reason for the issue, as the error is raised in keras_model_fn while building the model itself.
Thanks, Harathi
Hi,
I am trying to deploy custom Keras model on Sagemaker. I am following the https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/tensorflow_keras_cifar10 example to implement that.
I am getting 'AssertionError: Could not compute output Tensor("dense/truediv:0", shape=(?, ?, 114), dtype=float32)' error while training. The full trace back is as follows:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 36, in start
    fw.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/train_entry_point.py", line 164, in train
    train_wrapper.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/trainer.py", line 69, in train
    estimator = self._build_estimator(run_config=run_config)
  File "/usr/local/lib/python2.7/dist-packages/tf_container/trainer.py", line 95, in _build_estimator
    model = self.customer_script.keras_model_fn(hyperparameters)
  File "/opt/ml/code/spell_tensorflow.py", line 265, in keras_model_fn
    model.compile(optimizer=tf.keras.optimizers.Adam(lr=lr), loss='categorical_crossentropy', metrics=['categorical_accuracy'])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/_impl/keras/engine/training.py", line 682, in compile
    masks = self.compute_mask(self.inputs, mask=None)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/_impl/keras/engine/topology.py", line 792, in compute_mask
    _, output_masks = self._run_internal_graph(inputs, masks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/network.py", line 939, in _run_internal_graph
    assert str(id(x)) in tensor_map, 'Could not compute output ' + str(x)
AssertionError: Could not compute output Tensor("dense/truediv:0", shape=(?, ?, 114), dtype=float32)
I am implementing an LSTM seq2seq model. For that, I have two inputs to the model: 'encoder inputs' and 'decoder inputs'.
Any suggestions will be helpful...
Thanks, Harathi