GoogleCloudPlatform / cloudml-samples

Cloud ML Engine repo. Please visit the new Vertex AI samples repo at https://github.com/GoogleCloudPlatform/vertex-ai-samples
https://cloud.google.com/ai-platform/docs/
Apache License 2.0

UnboundLocalError: local variable 'summaries' referenced before assignment #52

Closed: MrRexZ closed this issue 7 years ago

MrRexZ commented 7 years ago

I'm on Windows 10 64-bit, with Python 3.5 and TensorFlow 1.10. I set the arguments to point to local training and evaluation files, as the guidelines describe. I was trying to run the census sample in the tensorflowcore folder as a local Python program, but I received an UnboundLocalError in the _run_eval method:

            with coord.stop_on_exception():
                eval_step = 0
                while self._eval_steps is None or eval_step < self._eval_steps:
                    summaries, final_values, _ = session.run([self._summary_op, self._final_ops_dict, self._eval_ops])
                    tf.logging.info("TESTING FORMAT: {}".format(summaries))
                    if eval_step % 100 == 0:
                        tf.logging.info("On Evaluation Step: {}".format(eval_step))
                    eval_step += 1
            # Write the summaries

            self._file_writer.add_summary(summaries, global_step=train_step)
            self._file_writer.flush()
            tf.logging.info(final_values)

An exception is raised on the first call to session.run([self._summary_op, self._final_ops_dict, self._eval_ops]). The loop terminates, and execution continues at self._file_writer.add_summary(summaries, global_step=train_step), where the summaries variable has never been assigned. Here are the logs of the execution:

WARNING:tensorflow:Unknown arguments: []
INFO:tensorflow:Created DNN hidden units [256, 64]
WARNING:tensorflow:From C:\Users\antho\Desktop\cloudml-samples-master\census\tensorflowcore\trainer\model.py:146: string_to_index_table_from_tensor (from tensorflow.contrib.lookup.lookup_ops) is deprecated and will be removed after 2017-04-10.
Instructions for updating:
Use `index_table_from_tensor`.
WARNING:tensorflow:From C:\Users\antho\Desktop\cloudml-samples-master\census\tensorflowcore\trainer\model.py:146: string_to_index_table_from_tensor (from tensorflow.contrib.lookup.lookup_ops) is deprecated and will be removed after 2017-04-10.
Instructions for updating:
Use `index_table_from_tensor`.
INFO:tensorflow:Create CheckpointSaverHook.
2017-06-09 06:36:47.299762: W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-06-09 06:36:47.300229: W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-09 06:36:47.300658: W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-09 06:36:47.301100: W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-09 06:36:47.301545: W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-09 06:36:47.302024: W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-09 06:36:47.302391: W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-09 06:36:47.302766: W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
INFO:tensorflow:Saving checkpoints for 0 into output_folder\model.ckpt.
INFO:tensorflow:global_step/sec: 23.84
INFO:tensorflow:global_step/sec: 201.377
INFO:tensorflow:global_step/sec: 204.354
INFO:tensorflow:global_step/sec: 196.517
INFO:tensorflow:global_step/sec: 196.324
INFO:tensorflow:global_step/sec: 192.727
INFO:tensorflow:global_step/sec: 200.057
INFO:tensorflow:global_step/sec: 207.537
INFO:tensorflow:global_step/sec: 194.037
INFO:tensorflow:global_step/sec: 195.747
INFO:tensorflow:Saving checkpoints for 1000 into output_folder\model.ckpt.
INFO:tensorflow:Restoring parameters from output_folder\model.ckpt-1000
INFO:tensorflow:Starting Evaluation For Step: 1000
INFO:tensorflow:Error reported to Coordinator: <class 'TypeError'>, Fetch argument dict_values([<tf.Tensor 'accuracy/update_op:0' shape=() dtype=float32>, <tf.Tensor 'auc/update_op:0' shape=() dtype=float32>]) has invalid type <class 'dict_values'>, must be a string or Tensor. (Can not convert a dict_values into a Tensor or Operation.)
Traceback (most recent call last):
  File "C:/Users/antho/Desktop/cloudml-samples-master/census/tensorflowcore/trainer/task.py", line 485, in <module>
    dispatch(**parse_args.__dict__)
  File "C:/Users/antho/Desktop/cloudml-samples-master/census/tensorflowcore/trainer/task.py", line 387, in dispatch
    return run('', True, *args, **kwargs)
  File "C:/Users/antho/Desktop/cloudml-samples-master/census/tensorflowcore/trainer/task.py", line 298, in run
    step, _ = session.run([global_step_tensor, train_op])
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\training\monitored_session.py", line 500, in __exit__
    self._close_internal(exception_type)
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\training\monitored_session.py", line 532, in _close_internal
    h.end(self._coordinated_creator.tf_sess)
  File "C:/Users/antho/Desktop/cloudml-samples-master/census/tensorflowcore/trainer/task.py", line 122, in end
    self._run_eval()
  File "C:/Users/antho/Desktop/cloudml-samples-master/census/tensorflowcore/trainer/task.py", line 151, in _run_eval
    self._file_writer.add_summary(summaries, global_step=train_step)
UnboundLocalError: local variable 'summaries' referenced before assignment
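
The traceback above ends in an UnboundLocalError even though the log reports a TypeError to the Coordinator first. A minimal sketch of why that can happen, without TensorFlow: the stop_on_exception helper below is a hypothetical stand-in that records an exception instead of re-raising it, which is how coord.stop_on_exception() appears to behave here. The names failing_run, run_eval_unsafe and run_eval_safe are illustrative only and are not part of the sample.

```python
import contextlib


@contextlib.contextmanager
def stop_on_exception(errors):
    """Hypothetical stand-in: record the exception instead of re-raising it."""
    try:
        yield
    except Exception as exc:
        errors.append(exc)


def failing_run():
    # Stands in for the session.run call that fails on its first execution.
    raise TypeError("Can not convert a dict_values into a Tensor or Operation.")


def run_eval_unsafe():
    errors = []
    with stop_on_exception(errors):
        summaries = failing_run()  # raises before 'summaries' is bound
    # The TypeError was swallowed, so execution reaches this line.
    return summaries  # raises UnboundLocalError


def run_eval_safe():
    errors = []
    summaries = None  # bind the name before the guarded block
    with stop_on_exception(errors):
        summaries = failing_run()
    if errors:
        raise errors[0]  # surface the original error instead of hiding it
    return summaries
```

Calling run_eval_unsafe() raises UnboundLocalError, which hides the real cause. run_eval_safe() re-raises the recorded TypeError, so the first failure is the one that gets reported.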

After removing the with coord.stop_on_exception() line, this is the error raised by session.run:

Traceback (most recent call last):
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\client\session.py", line 267, in __init__
    fetch, allow_tensor=True, allow_operation=True))
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\framework\ops.py", line 2414, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\framework\ops.py", line 2503, in _as_graph_element_locked
    % (type(obj).__name__, types_str))
TypeError: Can not convert a dict_values into a Tensor or Operation.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 2017.1\helpers\pydev\pydevd.py", line 1585, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files (x86)\JetBrains\PyCharm 2017.1\helpers\pydev\pydevd.py", line 1015, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files (x86)\JetBrains\PyCharm 2017.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/antho/Desktop/cloudml-samples-master/census/tensorflowcore/trainer/task.py", line 482, in <module>
    dispatch(**parse_args.__dict__)
  File "C:/Users/antho/Desktop/cloudml-samples-master/census/tensorflowcore/trainer/task.py", line 385, in dispatch
    return run('', True, *args, **kwargs)
  File "C:/Users/antho/Desktop/cloudml-samples-master/census/tensorflowcore/trainer/task.py", line 296, in run
    step, _ = session.run([global_step_tensor, train_op])
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\training\monitored_session.py", line 500, in __exit__
    self._close_internal(exception_type)
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\training\monitored_session.py", line 532, in _close_internal
    h.end(self._coordinated_creator.tf_sess)
  File "C:/Users/antho/Desktop/cloudml-samples-master/census/tensorflowcore/trainer/task.py", line 122, in end
    self._run_eval()
  File "C:/Users/antho/Desktop/cloudml-samples-master/census/tensorflowcore/trainer/task.py", line 144, in _run_eval
    summaries, final_values, _ = session.run([self._summary_op, self._final_ops_dict, self._eval_ops])
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\client\session.py", line 778, in run
    run_metadata_ptr)
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\client\session.py", line 969, in _run
    fetch_handler = _FetchHandler(self._graph, fetches, feed_dict_string)
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\client\session.py", line 408, in __init__
    self._fetch_mapper = _FetchMapper.for_fetch(fetches)
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\client\session.py", line 230, in for_fetch
    return _ListFetchMapper(fetch)
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\client\session.py", line 337, in __init__
    self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\client\session.py", line 337, in <listcomp>
    self._mappers = [_FetchMapper.for_fetch(fetch) for fetch in fetches]
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\client\session.py", line 238, in for_fetch
    return _ElementFetchMapper(fetches, contraction_fn)
  File "C:\Users\antho\Anaconda3\envs\fyp\lib\site-packages\tensorflow\python\client\session.py", line 271, in __init__
    % (fetch, type(fetch), str(e)))
TypeError: Fetch argument dict_values([<tf.Tensor 'auc/update_op:0' shape=() dtype=float32>, <tf.Tensor 'accuracy/update_op:0' shape=() dtype=float32>]) has invalid type <class 'dict_values'>, must be a string or Tensor. (Can not convert a dict_values into a Tensor or Operation.)

MrRexZ commented 7 years ago

Found the issue: it is a Python version incompatibility. In Python 3, dict.values() returns a dict_values view rather than a list as in Python 2, so self._eval_ops is of type dict_values, which session.run does not accept as a fetch. Converting it to a list solves the problem.
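
A minimal sketch of the conversion, assuming self._eval_ops is built from the values of a dict. The issue does not show where it is assigned, so the dict name and the string placeholders for the update ops are illustrative only.

```python
# Placeholder strings stand in for the tf.Tensor update ops named in the error.
eval_ops_by_name = {
    "accuracy": "accuracy/update_op:0",
    "auc": "auc/update_op:0",
}

# On Python 3 this is a dict_values view, not a list.
values_view = eval_ops_by_name.values()

# Wrapping the view in list() yields a plain list, which works on both
# Python 2 and Python 3.
eval_ops = list(values_view)
```

In the sample, the equivalent change would be to wrap the values in list(...) at the point where self._eval_ops is assigned, or where it is passed to session.run.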