aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

Failed Reason: AlgorithmError: uncaught exception during training: features should be a dictionary of `Tensor`s. Given type: <type 'function'> #153

Closed ghost closed 6 years ago

ghost commented 6 years ago

I'm not exactly sure what happened. All of a sudden, all of my training jobs fail even though I made no code changes. There are definitely authentication issues with AWS credentials in the log, even though I am training from the hosted Jupyter notebook and my session is active.

This is how I am constructing the classifier.


classifier = TensorFlow(entry_point='sm_transcript_classifier_ep.py',
                        role=role,
                        training_steps=1e4,
                        evaluation_steps=100,
                        train_instance_count=1,
                        train_instance_type=INSTANCE_TYPE,
                        hyperparameters={
                            "question": QUESTION,
                            "n_words": _get_n_words()
                        })

model function:

def estimator_fn(run_config, params):
    bow_column = tf.feature_column.categorical_column_with_identity(
        WORDS_FEATURE, num_buckets=params["n_words"])
    bow_embedding_column = tf.feature_column.embedding_column(
        bow_column, dimension=EMBEDDING_SIZE, combiner="sqrtn")
    return tf.estimator.LinearClassifier(
        feature_columns=[bow_embedding_column],
        config=run_config
        #loss_reduction=tf.losses.Reduction.SUM_BY_NONZERO_WEIGHTS #this doesn't work even though SageMaker should support TF 1.6??
    )

Full error log:

...........................................................
2018-04-17 20:34:49,194 INFO - root - running container entrypoint
2018-04-17 20:34:49,194 INFO - root - starting train task
2018-04-17 20:34:49,199 INFO - container_support.training - Training starting
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2018-04-17 20:34:51,095 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTP connection (1): 169.254.170.2
2018-04-17 20:34:51,305 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): sagemaker-us-east-1-245511257894.s3.amazonaws.com
2018-04-17 20:34:51,983 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): s3.amazonaws.com
2018-04-17 20:34:52,246 INFO - tf_container - ----------------------TF_CONFIG--------------------------
2018-04-17 20:34:52,246 INFO - tf_container - {"environment": "cloud", "cluster": {"master": ["algo-1:2222"]}, "task": {"index": 0, "type": "master"}}
2018-04-17 20:34:52,246 INFO - tf_container - ---------------------------------------------------------
2018-04-17 20:34:52,246 INFO - tf_container - creating RunConfig:
2018-04-17 20:34:52,246 INFO - tf_container - {'save_checkpoints_secs': 300}
2018-04-17 20:34:52,247 INFO - tensorflow - TF_CONFIG environment variable: {u'environment': u'cloud', u'cluster': {u'master': [u'algo-1:2222']}, u'task': {u'index': 0, u'type': u'master'}}
2018-04-17 20:34:52,247 INFO - tf_container - invoking estimator_fn
2018-04-17 20:34:52,247 INFO - tensorflow - Using config: {'_save_checkpoints_secs': 300, '_session_config': None, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': u'master', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb3b40d4190>, '_model_dir': u's3://sagemaker-us-east-1-245511257894/sagemaker-tensorflow-2018-04-17-20-30-05-729/checkpoints', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_master': '', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_evaluation_master': '', '_service': None, '_save_summary_steps': 100, '_num_ps_replicas': 0}
2018-04-17 20:34:52,248 INFO - tensorflow - Skip starting Tensorflow server as there is only one node in the cluster.
2018-04-17 20:34:52.265465: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing config loader against fileName /root//.aws/config and using profilePrefix = 1
2018-04-17 20:34:52.267103: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing config loader against fileName /root//.aws/credentials and using profilePrefix = 0
2018-04-17 20:34:52.267120: I tensorflow/core/platform/s3/aws_logging.cc:54] Setting provider to read credentials from /root//.aws/credentials for credentials file and /root//.aws/config for the config file , for use with profile default
2018-04-17 20:34:52.267133: I tensorflow/core/platform/s3/aws_logging.cc:54] Creating HttpClient with max connections2 and scheme http
2018-04-17 20:34:52.267154: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing CurlHandleContainer with size 2
2018-04-17 20:34:52.267175: I tensorflow/core/platform/s3/aws_logging.cc:54] Creating TaskRole with default ECSCredentialsClient and refresh rate 900000
2018-04-17 20:34:52.267213: I tensorflow/core/platform/s3/aws_logging.cc:54] Unable to open config file /root//.aws/credentials for reading.
2018-04-17 20:34:52.267228: I tensorflow/core/platform/s3/aws_logging.cc:54] Failed to reload configuration.
2018-04-17 20:34:52.267238: I tensorflow/core/platform/s3/aws_logging.cc:54] Unable to open config file /root//.aws/config for reading.
2018-04-17 20:34:52.267244: I tensorflow/core/platform/s3/aws_logging.cc:54] Failed to reload configuration.
2018-04-17 20:34:52.267255: I tensorflow/core/platform/s3/aws_logging.cc:54] Credentials have expired or will expire, attempting to repull from ECS IAM Service.
2018-04-17 20:34:52.267342: I tensorflow/core/platform/s3/aws_logging.cc:54] Pool grown by 2
2018-04-17 20:34:52.267357: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-17 20:34:52.271264: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing CurlHandleContainer with size 25
2018-04-17 20:34:52.275164: I tensorflow/core/platform/s3/aws_logging.cc:54] Pool grown by 2
2018-04-17 20:34:52.275184: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-17 20:34:52.337301: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-17 20:34:52.337347: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-17 20:34:52.338141: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-17 20:34:56,292 INFO - tensorflow - Calling model_fn.
2018-04-17 20:34:56,293 ERROR - container_support.training - uncaught exception during training: features should be a dictionary of `Tensor`s. Given type: <type 'function'>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 38, in start
    fw.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/train.py", line 139, in train
    train_wrapper.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/trainer.py", line 73, in train
    tf.estimator.train_and_evaluate(estimator=estimator, train_spec=train_spec, eval_spec=eval_spec)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 421, in train_and_evaluate
    executor.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 522, in run
    getattr(self, task_to_run)()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 577, in run_master
    self._start_distributed_training(saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 715, in _start_distributed_training
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 352, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 812, in _train_model
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 793, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/linear.py", line 316, in _model_fn
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/linear.py", line 138, in _linear_model_fn
    'Given type: {}'.format(type(features)))
ValueError: features should be a dictionary of `Tensor`s. Given type: <type 'function'>

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-2a854a24dd88> in <module>()
     17                                })
     18 
---> 19 classifier.fit(inputs)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/tensorflow/estimator.py in fit(self, inputs, wait, logs, job_name, run_tensorboard_locally)
    234                 tensorboard.event.set()
    235         else:
--> 236             fit_super()
    237 
    238     @classmethod

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/tensorflow/estimator.py in fit_super()
    219         """
    220         def fit_super():
--> 221             super(TensorFlow, self).fit(inputs, wait, logs, job_name)
    222 
    223         if run_tensorboard_locally and wait is False:

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name)
    608         self._hyperparameters[JOB_NAME_PARAM_NAME] = self._current_job_name
    609         self._hyperparameters[SAGEMAKER_REGION_PARAM_NAME] = self.sagemaker_session.boto_session.region_name
--> 610         super(Framework, self).fit(inputs, wait, logs, self._current_job_name)
    611 
    612     def hyperparameters(self):

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name)
    163         self.latest_training_job = _TrainingJob.start_new(self, inputs)
    164         if wait:
--> 165             self.latest_training_job.wait(logs=logs)
    166 
    167     @classmethod

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/estimator.py in wait(self, logs)
    396     def wait(self, logs=True):
    397         if logs:
--> 398             self.sagemaker_session.logs_for_job(self.job_name, wait=True)
    399         else:
    400             self.sagemaker_session.wait_for_job(self.job_name)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/session.py in logs_for_job(self, job_name, wait, poll)
    649 
    650         if wait:
--> 651             self._check_job_status(job_name, description)
    652             if dot:
    653                 print()

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/session.py in _check_job_status(self, job, desc)
    393         if status != 'Completed':
    394             reason = desc.get('FailureReason', '(No reason provided)')
--> 395             raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))
    396 
    397     def wait_for_endpoint(self, endpoint, poll=5):

ValueError: Error training sagemaker-tensorflow-2018-04-17-20-30-05-729: Failed Reason: AlgorithmError: uncaught exception during training: features should be a dictionary of `Tensor`s. Given type: <type 'function'>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 38, in start
    fw.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/train.py", line 139, in train
    train_wrapper.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/trainer.py", line 73, in train
    tf.estimator.train_and_evaluate(estimator=estimator, train_spec=train_spec, eval_spec=eval_spec)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 421, in train_and_evaluate
    executor.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 522, in run
    getattr(self, task_to_run)()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 577, in run_master
    self._start_distributed_training(saving_liste

ChoiByungWook commented 6 years ago

Hello,

Thanks for using SageMaker!

I am looking into this currently.

Can you please provide the following:

  1. A minimal repro case.
  2. The Python SDK version and TensorFlow container version.

Thanks!

ghost commented 6 years ago

OK, that pointed me in the right direction. AWS-SageMaker-Python-SDK is at version 1.2.1. Adding framework_version="1.5" to the constructor fixed my issue. It looks like it is now defaulting to 1.6 which causes the issues above. However, I've been using TF 1.6 and 1.7 on my local machine so the issue is probably SDK related. How do I go about updating the SDK version on the instance notebooks?
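The workaround described above can be sketched as follows. This is a minimal sketch, assuming the 1.x SDK parameter names used earlier in this thread (later SDK releases renamed several of them); the role ARN and instance type in the usage lines are hypothetical placeholders, and the helper names are invented for illustration.

```python
def build_estimator_kwargs(role, instance_type, hyperparameters,
                           framework_version="1.5"):
    """Collect the constructor arguments from the original post,
    plus an explicit framework_version so the SDK default is not used."""
    return {
        "entry_point": "sm_transcript_classifier_ep.py",
        "role": role,
        "framework_version": framework_version,  # pin the TF container version
        "training_steps": 10000,
        "evaluation_steps": 100,
        "train_instance_count": 1,
        "train_instance_type": instance_type,
        "hyperparameters": hyperparameters,
    }


def make_classifier(**kwargs):
    """Build the estimator; needs the sagemaker 1.x SDK and AWS credentials."""
    from sagemaker.tensorflow import TensorFlow  # imported lazily on purpose
    return TensorFlow(**kwargs)


# Hypothetical placeholder values, for illustration only.
kwargs = build_estimator_kwargs(
    role="arn:aws:iam::123456789012:role/ExampleRole",
    instance_type="ml.m4.xlarge",
    hyperparameters={"question": "example", "n_words": 2092},
)
print(kwargs["framework_version"])
```

Calling make_classifier(**kwargs) and then fit(inputs) would start the training job against the pinned container version.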

ChoiByungWook commented 6 years ago

Hello,

Thank you for that information.

It would be extremely helpful if you could provide a minimal repro case.

In addition, here is the source code for our TensorFlow containers: https://github.com/aws/sagemaker-tensorflow-containers

As for updating the SDK version on the notebook instance, run the following command in a standalone notebook cell:

! pip install --upgrade sagemaker

Then restart the kernel, and the SDK version should be updated for that notebook.
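To confirm which SDK version the kernel actually sees after the restart, something like the following can be run in a cell. This is a minimal sketch; it assumes Python 3.8 or newer for importlib.metadata, and installed_version is a helper name invented here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints None if the SDK is not installed in this environment.
print("sagemaker:", installed_version("sagemaker"))
```

On older interpreters, printing sagemaker.__version__ after importing the package gives the same information.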

ghost commented 6 years ago

OK updating the SDK made no difference.

minimal repo: https://github.com/david-bishai/sagemaker-python-sdk_issue-31

ChoiByungWook commented 6 years ago

Hello,

Thanks for providing the repro case. I was able to reproduce it on a SageMaker notebook instance using local mode.

2018-04-24 07:02:23,101 ERROR - container_support.training - uncaught exception during training: features should be a dictionary of `Tensor`s. Given type: <type 'function'>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 38, in start
    fw.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/train.py", line 120, in train
    train_wrapper.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/trainer.py", line 84, in train
    tf.estimator.train_and_evaluate(estimator=estimator, train_spec=train_spec, eval_spec=eval_spec)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 421, in train_and_evaluate
    executor.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 522, in run
    getattr(self, task_to_run)()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 577, in run_master
    self._start_distributed_training(saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 715, in _start_distributed_training
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 352, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 812, in _train_model
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 793, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/linear.py", line 316, in _model_fn
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/linear.py", line 138, in _linear_model_fn
    'Given type: {}'.format(type(features)))
ValueError: features should be a dictionary of `Tensor`s. Given type: <type 'function'>

I am not sure yet what is causing this issue and am still investigating. I'll post when I have an update.

ChoiByungWook commented 6 years ago

Hello @david-bishai ,

This error occurs because the train_input_fn and eval_input_fn in your user script, entry_point.py, must return a tuple of (features, labels), not a function.

numpy_input_fn returns an input function that feeds a dict of numpy arrays into the model; it does not itself return a tuple of (features, labels).

Our TF 1.4 and 1.5 containers had an undocumented feature that allowed customers to return a function instead of the (features, labels) values. We apologize for the experience and have documented this.

To make your entry_point user script work with all versions, invoke the function before returning by adding () after the numpy_input_fn(...) call. I was able to run your minimal repo successfully with this change.

def train_input_fn(training_dir, params):
    """Returns the (features, labels) that feed the model during training"""
    x_train, x_test, y_train, y_test = get_data(training_dir)
    vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(
        MAX_DOCUMENT_LENGTH)

    x_transform_train = vocab_processor.fit_transform(x_train)
    x_train = np.array(list(x_transform_train))

    # The trailing () invokes the input function built by numpy_input_fn,
    # so a (features, labels) tuple is returned instead of a function.
    return tf.estimator.inputs.numpy_input_fn(
        x={WORDS_FEATURE: x_train},
        y=y_train,
        batch_size=len(x_train),
        num_epochs=None,
        shuffle=True)()

def eval_input_fn(training_dir, params):
    """Returns the (features, labels) that feed the model during evaluation"""
    x_train, x_test, y_train, y_test = get_data(training_dir)
    vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(
        MAX_DOCUMENT_LENGTH)
    vocab_processor.fit(x_train)

    x_transform_test = vocab_processor.transform(x_test)
    x_test = np.array(list(x_transform_test))

    # Same fix as above: call the generated input function before returning.
    return tf.estimator.inputs.numpy_input_fn(
        x={WORDS_FEATURE: x_test}, y=y_test, num_epochs=1, shuffle=False)()
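
The difference between returning the input function and returning its result can be illustrated without TensorFlow. This is only a sketch with stand-ins: numpy_like_input_fn, check_features and call_like_estimator are hypothetical helpers that imitate the behavior suggested by the traceback above, not the container's or TensorFlow's actual code.

```python
def numpy_like_input_fn(x, y):
    """Stand-in for numpy_input_fn: builds and returns an input function."""
    def input_fn():
        return x, y
    return input_fn


def check_features(features):
    """Imitates the type check whose message appears in the traceback."""
    if not isinstance(features, dict):
        raise ValueError(
            "features should be a dictionary of `Tensor`s. "
            "Given type: {}".format(type(features)))
    return features


def call_like_estimator(user_input_fn):
    """Assumed consumer: treats the return value as (features, labels)."""
    result = user_input_fn()
    if isinstance(result, (list, tuple)):
        features, labels = result
    else:
        features, labels = result, None
    return check_features(features), labels


def train_input_fn_returning_function():
    # Returns the input function itself (the pattern that failed).
    return numpy_like_input_fn({"words": [[1, 2, 3]]}, [0])


def train_input_fn_returning_values():
    # Invokes the input function first (the pattern with the trailing ()).
    return numpy_like_input_fn({"words": [[1, 2, 3]]}, [0])()


features, labels = call_like_estimator(train_input_fn_returning_values)
print(features, labels)

try:
    call_like_estimator(train_input_fn_returning_function)
except ValueError as error:
    print("failed as expected:", error)
```

In the first call the consumer receives a dict and a label list; in the second it receives a function where it expects features, which corresponds to the "Given type" error in this issue.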

Please let me know if this works for you.

Thanks!

ghost commented 6 years ago

OK thanks, it seems to be training now! I'm not sure what this is about though:

.........................................................................
2018-04-25 20:07:24,466 INFO - root - running container entrypoint
2018-04-25 20:07:24,466 INFO - root - starting train task
2018-04-25 20:07:24,472 INFO - container_support.training - Training starting
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2018-04-25 20:07:26,862 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTP connection (1): 169.254.170.2
2018-04-25 20:07:27,160 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): sagemaker-us-east-1-245511257894.s3.amazonaws.com
2018-04-25 20:07:27,618 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): s3.amazonaws.com
2018-04-25 20:07:27,670 INFO - tf_container - ----------------------TF_CONFIG--------------------------
2018-04-25 20:07:27,670 INFO - tf_container - {"environment": "cloud", "cluster": {"master": ["algo-1:2222"]}, "task": {"index": 0, "type": "master"}}
2018-04-25 20:07:27,670 INFO - tf_container - ---------------------------------------------------------
2018-04-25 20:07:27,670 INFO - tf_container - creating RunConfig:
2018-04-25 20:07:27,671 INFO - tf_container - {'save_checkpoints_secs': 300}
2018-04-25 20:07:27,671 INFO - tensorflow - TF_CONFIG environment variable: {u'environment': u'cloud', u'cluster': {u'master': [u'algo-1:2222']}, u'task': {u'index': 0, u'type': u'master'}}
2018-04-25 20:07:27,671 INFO - tf_container - invoking estimator_fn
2018-04-25 20:07:27,671 INFO - tensorflow - Using config: {'_save_checkpoints_secs': 300, '_session_config': None, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': u'master', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f81a21ae050>, '_model_dir': u's3://sagemaker-us-east-1-245511257894/sagemaker-tensorflow-2018-04-25-20-01-28-862/checkpoints', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_master': '', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_evaluation_master': '', '_service': None, '_save_summary_steps': 100, '_num_ps_replicas': 0}
2018-04-25 20:07:27,672 INFO - tensorflow - Skip starting Tensorflow server as there is only one node in the cluster.
2018-04-25 20:07:27.681011: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing config loader against fileName /root//.aws/config and using profilePrefix = 1
2018-04-25 20:07:27.681592: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing config loader against fileName /root//.aws/credentials and using profilePrefix = 0
2018-04-25 20:07:27.681607: I tensorflow/core/platform/s3/aws_logging.cc:54] Setting provider to read credentials from /root//.aws/credentials for credentials file and /root//.aws/config for the config file , for use with profile default
2018-04-25 20:07:27.681620: I tensorflow/core/platform/s3/aws_logging.cc:54] Creating HttpClient with max connections2 and scheme http
2018-04-25 20:07:27.681642: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing CurlHandleContainer with size 2
2018-04-25 20:07:27.681658: I tensorflow/core/platform/s3/aws_logging.cc:54] Creating TaskRole with default ECSCredentialsClient and refresh rate 900000
2018-04-25 20:07:27.681694: I tensorflow/core/platform/s3/aws_logging.cc:54] Unable to open config file /root//.aws/credentials for reading.
2018-04-25 20:07:27.681711: I tensorflow/core/platform/s3/aws_logging.cc:54] Failed to reload configuration.
2018-04-25 20:07:27.681724: I tensorflow/core/platform/s3/aws_logging.cc:54] Unable to open config file /root//.aws/config for reading.
2018-04-25 20:07:27.681734: I tensorflow/core/platform/s3/aws_logging.cc:54] Failed to reload configuration.
2018-04-25 20:07:27.681745: I tensorflow/core/platform/s3/aws_logging.cc:54] Credentials have expired or will expire, attempting to repull from ECS IAM Service.
2018-04-25 20:07:27.681820: I tensorflow/core/platform/s3/aws_logging.cc:54] Pool grown by 2
2018-04-25 20:07:27.681840: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:27.685588: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing CurlHandleContainer with size 25
2018-04-25 20:07:27.687830: I tensorflow/core/platform/s3/aws_logging.cc:54] Pool grown by 2
2018-04-25 20:07:27.687851: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:27.745158: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:27.745196: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:27.746132: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32,507 INFO - tensorflow - Calling model_fn.
2018-04-25 20:07:32,508 DEBUG - tensorflow - Transforming feature_column _IdentityCategoricalColumn(key='words', num_buckets=2092, default_value=None).
2018-04-25 20:07:32,768 INFO - tensorflow - Done calling model_fn.
2018-04-25 20:07:32,768 INFO - tensorflow - Create CheckpointSaverHook.
2018-04-25 20:07:32.769101: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.778035: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:32.778067: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:32.778223: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.787828: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.794657: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:32.794688: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:32.794856: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.805357: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.812592: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:32.812626: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:32.812783: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.904749: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.912901: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007321524686852903
2018-04-25 20:07:32.952605: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.964337: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.975972: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.984361: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.994832: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.022984: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.045033: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.054378: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33,262 INFO - tensorflow - Graph was finalized.
2018-04-25 20:07:33.266518: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.274858: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:33.274892: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:33.275052: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33,379 INFO - tensorflow - Running local_init_op.
2018-04-25 20:07:33,382 INFO - tensorflow - Done running local_init_op.
2018-04-25 20:07:33.424724: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.496396: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:33.496449: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:33.496617: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.674614: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.762400: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007331524686853674
2018-04-25 20:07:33.763544: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.779001: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.886115: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34,361 INFO - tensorflow - Saving checkpoints for 1 into s3://sagemaker-us-east-1-245511257894/sagemaker-tensorflow-2018-04-25-20-01-28-862/checkpoints/model.ckpt.
2018-04-25 20:07:34.380768: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.391870: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007341524686854380
2018-04-25 20:07:34.397628: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.422301: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007341524686854391
2018-04-25 20:07:34.423217: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.439377: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.529052: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.541050: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.583608: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007341524686854540
2018-04-25 20:07:34.583811: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.599446: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.629216: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.641439: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.651125: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007341524686854641
2018-04-25 20:07:34.651328: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.660595: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.672229: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.683251: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.698314: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.708896: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.726261: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.806692: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.816075: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.923903: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007341524686854815
2018-04-25 20:07:34.924160: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.935098: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.948491: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.960220: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.979797: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007341524686854959
2018-04-25 20:07:34.980019: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.994514: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.037873: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.079588: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.086553: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:35.086585: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:35.086744: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.108749: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.150794: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007351524686855108
2018-04-25 20:07:35.151074: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.165596: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.230282: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.247681: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.255765: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.266954: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.273601: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.284126: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.296354: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.313319: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.322861: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.331801: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.340025: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.

2018-04-25 20:07:39,147 INFO - tensorflow - Calling model_fn.
2018-04-25 20:07:39,147 DEBUG - tensorflow - Transforming feature_column _IdentityCategoricalColumn(key='words', num_buckets=2092, default_value=None).
2018-04-25 20:07:39,831 INFO - tensorflow - Done calling model_fn.
2018-04-25 20:07:39,853 INFO - tensorflow - Starting evaluation at 2018-04-25-20:07:39
2018-04-25 20:07:39,918 INFO - tensorflow - Graph was finalized.
2018-04-25 20:07:39,918 INFO - tensorflow - Restoring parameters from s3://sagemaker-us-east-1-245511257894/sagemaker-tensorflow-2018-04-25-20-01-28-862/checkpoints/model.ckpt-1
2018-04-25 20:07:39.973564: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:39.999326: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.008086: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.020798: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.028749: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.039653: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.047401: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.057312: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.069649: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.082472: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.093253: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40,125 INFO - tensorflow - Running local_init_op.
2018-04-25 20:07:40,152 INFO - tensorflow - Done running local_init_op.
2018-04-25 20:07:40,948 INFO - tensorflow - Finished evaluation at 2018-04-25-20:07:40
2018-04-25 20:07:40,948 INFO - tensorflow - Saving dict for global step 1: accuracy = 0.7, accuracy_baseline = 0.7, auc = 0.5032116, auc_precision_recall = 0.648808, average_loss = 0.6735603, global_step = 1, label/mean = 0.3, loss = 0.6726951, prediction/mean = 0.47389776
2018-04-25 20:07:40.948570: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.960667: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:40.960699: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:40.960861: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.975202: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.985922: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:40.985959: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:40.986117: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.997728: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.007964: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:41.008000: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:41.008158: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.021124: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.031989: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007411524686861020
2018-04-25 20:07:41.178749: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.193412: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.205561: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.214413: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.224619: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.273470: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.294372: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.301845: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.345282: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.373571: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.538971: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.586359: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41,608 INFO - tensorflow - Calling model_fn.
2018-04-25 20:07:41,609 DEBUG - tensorflow - Transforming feature_column _IdentityCategoricalColumn(key='words', num_buckets=2092, default_value=None).
2018-04-25 20:07:41,708 INFO - tensorflow - Done calling model_fn.
2018-04-25 20:07:41,708 INFO - tensorflow - Signatures INCLUDED in export for Classify: ['serving_default', 'classification']
2018-04-25 20:07:41,709 INFO - tensorflow - Signatures INCLUDED in export for Regress: ['regression']
2018-04-25 20:07:41,709 INFO - tensorflow - Signatures INCLUDED in export for Predict: ['predict']
2018-04-25 20:07:41.709477: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.718035: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:41.718070: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:41.718252: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41,740 INFO - tensorflow - Restoring parameters from s3://sagemaker-us-east-1-245511257894/sagemaker-tensorflow-2018-04-25-20-01-28-862/checkpoints/model.ckpt-1
2018-04-25 20:07:41.754121: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.771398: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.814798: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.833340: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.843000: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.854231: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.864706: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.965630: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.981404: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.084626: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.113944: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.131216: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.140608: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.140641: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.140812: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.154724: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.160731: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.160767: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.160933: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.175357: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.189820: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.189856: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.190042: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.202545: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.211210: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.211240: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.211394: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.225958: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.235843: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.235873: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.236044: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.250989: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.263674: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007421524686862250
2018-04-25 20:07:42.263882: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.274792: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007421524686862263
2018-04-25 20:07:42.275004: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.286674: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007421524686862274
2018-04-25 20:07:42,286 INFO - tensorflow - Assets added to graph.
2018-04-25 20:07:42,287 INFO - tensorflow - No assets to write.
2018-04-25 20:07:42.287315: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.297594: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.297632: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.297800: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.309773: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.445939: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.445976: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.446170: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.473934: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.481774: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.481839: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.482005: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.502119: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.514847: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007421524686862501
2018-04-25 20:07:42.540087: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.726431: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007421524686862539
2018-04-25 20:07:42.730663: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.797579: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007421524686862726
2018-04-25 20:07:42.797810: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.886932: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.003175: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.015325: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.037305: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007431524686863015
2018-04-25 20:07:43.037516: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.140125: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.175190: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.187236: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.197812: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007431524686863187
2018-04-25 20:07:43.198016: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.211355: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.308609: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.326101: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.359362: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.385707: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.405636: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.505584: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.518001: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.543803: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007431524686863517
2018-04-25 20:07:43.544067: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.554117: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.582661: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.622197: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.632210: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:43.632267: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:43.632441: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.653777: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.

2018-04-25 20:07:43.795255: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007431524686863653

laurenyu commented 6 years ago

That logging output comes from TensorFlow's interactions with S3. It doesn't look like there are any actual errors in this particular run.
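
If the S3 chatter makes the log hard to read, one option is to filter it out after the fact. The snippet below is only a rough sketch: `split_log`, `S3_LOG_LINE`, and the `sample` list are hypothetical names made up for illustration, not part of the SageMaker SDK, and the pattern assumes the lines look exactly like the `aws_logging.cc` lines pasted above.

```python
import re

# Assumed line shape, taken from the log pasted in this issue:
# "<date> <time>: <I|W|E> tensorflow/core/platform/s3/aws_logging.cc:<n>] <message>"
S3_LOG_LINE = re.compile(
    r"^\S+ \S+: ([IWE]) tensorflow/core/platform/s3/aws_logging\.cc:\d+\] (.*)$"
)


def split_log(lines):
    """Drop S3 filesystem lines and count them by severity letter.

    Returns (remaining_lines, counts), where counts maps "I", "W", "E"
    to the number of S3 lines seen at that level.
    """
    remaining = []
    counts = {"I": 0, "W": 0, "E": 0}
    for line in lines:
        line = line.rstrip("\n")
        match = S3_LOG_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
        elif line.strip():
            # Keep every non-empty line that is not S3 filesystem output.
            remaining.append(line)
    return remaining, counts


# A few lines copied from the log above, used as sample input.
sample = [
    "2018-04-25 20:07:35.079588: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.",
    "2018-04-25 20:07:35.086553: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404",
    "2018-04-25 20:07:35.086585: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.",
    "",
    "2018-04-25 20:07:39,147 INFO - tensorflow - Calling model_fn.",
]

remaining, counts = split_log(sample)
for line in remaining:
    print(line)
print(counts)
```

With that sample, only the `Calling model_fn.` line is left over, and the counts show one S3 line at each of the `I`, `W`, and `E` levels. Running the same function over the full log should make it easier to see the estimator's own messages (evaluation, export, checkpoint restore) without the S3 connection lines in between.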

It seems like the original issue has been resolved, so I'm going to close this issue. Feel free to reopen if necessary, though.