MaybeShewill-CV / CRNN_Tensorflow

Convolutional Recurrent Neural Networks (CRNN) for Scene Text Recognition
MIT License

AttributeError: module 'tensorflow' has no attribute 'data' & TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [Dimension(None), 32, 100, 3]. Consider casting elements to a supported type. #295

Closed kspook closed 5 years ago

kspook commented 5 years ago

@MaybeShewill-CV, I am training on Korean text and I ran into 4 errors. What should I do?

1. AttributeError: module 'tensorflow' has no attribute 'data'. So I changed tf.data to tf.contrib.data (I referred to this link: https://github.com/tensorflow/models/issues/2879#issuecomment-347721584).

  File "tools/train_shadownet.py", line 157, in train_shadownet
    batch_size=CFG.TRAIN.BATCH_SIZE
  File "/home/kspook/CRNN_Tensorflow/data_provider/shadownet_data_feed_pipline.py", line 289, in inputs
    num_threads=CFG.TRAIN.CPU_MULTI_PROCESS_NUMS
  File "/home/kspook/CRNN_Tensorflow/data_provider/tf_io_pipline_fast_tools.py", line 378, in inputs
    dataset = tf.data.TFRecordDataset(tfrecords_path)
AttributeError: module 'tensorflow' has no attribute 'data'
  2. TypeError: batch() got an unexpected keyword argument 'drop_remainder', after I fixed no. 1. I omitted 'drop_remainder' to stay on tf 1.3, right? According to https://github.com/MaybeShewill-CV/CRNN_Tensorflow/issues/186#issuecomment-451461322, do I need to use tf 1.3? (In the case of #166, someone recommended upgrading to tf 1.10.)
    
    (crnntf) kspook@MLGPU2:~/CRNN_Tensorflow$ python tools/train_shadownet.py --dataset_dir data/  --char_dict_path data/char_dict/char_dict.json --ord_map_dict_path data/char_dict/ord_map.json
    I0630 06:07:30.553470 40490 train_shadownet.py:573] Use single gpu to train the model
    Traceback (most recent call last):
    File "tools/train_shadownet.py", line 579, in <module>
    need_decode=args.decode_outputs
    File "tools/train_shadownet.py", line 157, in train_shadownet
    batch_size=CFG.TRAIN.BATCH_SIZE
    File "/home/kspook/CRNN_Tensorflow/data_provider/shadownet_data_feed_pipline.py", line 289, in inputs
    num_threads=CFG.TRAIN.CPU_MULTI_PROCESS_NUMS
    File "/home/kspook/CRNN_Tensorflow/data_provider/tf_io_pipline_fast_tools.py", line 381, in inputs
    dataset = dataset.batch(batch_size, drop_remainder=True)
    TypeError: batch() got an unexpected keyword argument 'drop_remainder'
3. TypeError: map() got an unexpected keyword argument 'num_parallel_calls'
   I omitted it as well.
   (I couldn't get the point of https://github.com/CODAIT/deep-histopath/issues/5.)

(crnntf) kspook@MLGPU2:~/CRNN_Tensorflow$ python tools/train_shadownet.py --dataset_dir data/ --char_dict_path data/char_dict/char_dict.json --ord_map_dict_path data/char_dict/ord_map.json
I0630 06:09:12.013188 40807 train_shadownet.py:573] Use single gpu to train the model
Traceback (most recent call last):
  File "tools/train_shadownet.py", line 579, in <module>
    need_decode=args.decode_outputs
  File "tools/train_shadownet.py", line 157, in train_shadownet
    batch_size=CFG.TRAIN.BATCH_SIZE
  File "/home/kspook/CRNN_Tensorflow/data_provider/shadownet_data_feed_pipline.py", line 289, in inputs
    num_threads=CFG.TRAIN.CPU_MULTI_PROCESS_NUMS
  File "/home/kspook/CRNN_Tensorflow/data_provider/tf_io_pipline_fast_tools.py", line 387, in inputs
    num_parallel_calls=num_threads)
TypeError: map() got an unexpected keyword argument 'num_parallel_calls'


4. TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [Dimension(None), 32, 100, 3]. Consider casting elements to a supported type.

According to https://github.com/MaybeShewill-CV/CRNN_Tensorflow/issues/251#issuecomment-490729310, you recommended changing to tf 1.12. How about tensorflow-gpu?

Unlike the comment below (https://github.com/MaybeShewill-CV/CRNN_Tensorflow/issues/251#issuecomment-490765929), I got TypeError: map() got an unexpected keyword argument 'num_parallel_calls' after I removed drop_remainder=True.
So I removed 'num_parallel_calls' as well, and finally I got this error.
What's the problem? (See the sketch at the end of this list.)

(Quoting that comment:) The problem was found: my TF version was wrong. When reading the TFRecord data, dataset = dataset.batch(batch_size, drop_remainder=True) raised an error, so I changed it to dataset = dataset.batch(batch_size); then there was no error, but the problem above occurred. When reading the TFRecord, the data could not be fetched properly, so the tensor became None after the reshape, which caused the failure.

Now I have run through the entire training and testing process locally. Thank you for sharing and for patiently answering the questions. The screenshot below shows the output of the training process.

(crnntf) kspook@MLGPU2:~/CRNN_Tensorflow$ python tools/train_shadownet.py --dataset_dir data/ --char_dict_path data/char_dict/char_dict.json --ord_map_dict_path data/char_dict/ord_map.json
I0630 06:10:20.127820 41011 train_shadownet.py:573] Use single gpu to train the model
Traceback (most recent call last):
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 460, in make_tensor_proto
    str_values = [compat.as_bytes(x) for x in proto_values]
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 460, in <listcomp>
    str_values = [compat.as_bytes(x) for x in proto_values]
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/util/compat.py", line 65, in as_bytes
    (bytes_or_text,))
TypeError: Expected binary or unicode string, got Dimension(None)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train_shadownet.py", line 579, in <module>
    need_decode=args.decode_outputs
  File "tools/train_shadownet.py", line 157, in train_shadownet
    batch_size=CFG.TRAIN.BATCH_SIZE
  File "/home/kspook/CRNN_Tensorflow/data_provider/shadownet_data_feed_pipline.py", line 289, in inputs
    num_threads=CFG.TRAIN.CPU_MULTI_PROCESS_NUMS
  File "/home/kspook/CRNN_Tensorflow/data_provider/tf_io_pipline_fast_tools.py", line 386, in inputs
    dataset = dataset.map(map_func=self._extract_features_batch)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/contrib/data/python/ops/dataset_ops.py", line 964, in map
    return MapDataset(self, map_func, num_threads, output_buffer_size)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/contrib/data/python/ops/dataset_ops.py", line 1735, in __init__
    self._map_func.add_to_graph(ops.get_default_graph())
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 449, in add_to_graph
    self._create_definition_if_needed()
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/contrib/data/python/framework/function.py", line 168, in _create_definition_if_needed
    outputs = self._func(*inputs)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/contrib/data/python/ops/dataset_ops.py", line 1723, in tf_map_func
    ret = map_func(nested_args)
  File "/home/kspook/CRNN_Tensorflow/data_provider/tf_io_pipline_fast_tools.py", line 361, in _extract_features_batch
    images = tf.reshape(images, [bs, h, w, CFG.ARCH.INPUT_CHANNELS])
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2619, in reshape
    name=name)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 493, in apply_op
    raise err
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 490, in apply_op
    preferred_dtype=default_dtype)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 676, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 121, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 464, in make_tensor_proto
    "supported type." % (type(values), values))
TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [Dimension(None), 32, 100, 3]. Consider casting elements to a supported type.
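
For reference, errors 1–3 are all TensorFlow-version issues: tf.data, batch(..., drop_remainder=True) and map(..., num_parallel_calls=...) are only available in newer 1.x releases (1.12 has all three), while on TF 1.3 the dataset API lives in tf.contrib.data and its batch()/map() do not accept those arguments. Error 4 then follows from dropping drop_remainder=True: without it the static batch dimension is unknown, so the reshape inside the feature-extraction function receives Dimension(None). A minimal sketch contrasting the two API generations (not the repo's code; the function names and the extract_fn/num_threads parameters are placeholders, and 32/100/3 are just the values shown in the error message):

    import tensorflow as tf


    def build_dataset_tf112(tfrecords_path, batch_size, num_threads, extract_fn):
        """TF 1.10+/1.12-style pipeline, matching the calls in the tracebacks above."""
        dataset = tf.data.TFRecordDataset(tfrecords_path)
        # drop_remainder=True keeps the batch dimension statically known
        # (== batch_size), which a reshape to [batch_size, h, w, c] relies on.
        dataset = dataset.batch(batch_size, drop_remainder=True)
        dataset = dataset.map(extract_fn, num_parallel_calls=num_threads)
        return dataset


    def build_dataset_tf13(tfrecords_path, batch_size, extract_fn):
        """Rough TF 1.3 equivalent: tf.contrib.data, where neither drop_remainder
        nor num_parallel_calls is accepted."""
        dataset = tf.contrib.data.TFRecordDataset(tfrecords_path)
        dataset = dataset.batch(batch_size)  # last batch may be smaller,
        dataset = dataset.map(extract_fn)    # so the batch dim is Dimension(None)
        return dataset


    # Error 4 happens inside the map function: with an unknown batch size,
    #     images = tf.reshape(images, [bs, 32, 100, 3])
    # gets bs == Dimension(None), which cannot be converted to a constant.
    # One possible workaround (not the repo's fix) is to let reshape infer it:
    #     images = tf.reshape(images, [-1, 32, 100, 3])
    # though the cleaner route is TF 1.12 with drop_remainder=True, as
    # recommended later in this thread.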

kspook commented 5 years ago

I reinstalled CUDA 9.0 and cuDNN 7.0.

1) https://medium.com/@zhanwenchen/install-cuda-and-cudnn-for-tensorflow-gpu-on-ubuntu-79306e4ac04e

$ ./mnistCUDNN
cudnnGetVersion() : 7005 , CUDNN_VERSION from cudnn.h : 7601 (7.6.1)
Cuda failurer version : GCC 4.9.3
Error: no CUDA-capable device is detected
error_util.h:93
Aborting...

2) https://zcsd.github.io/post/install_cuda_cudnn_for_tf_in_ubuntu18-04/

I tried cuDNN again (following the second guide) because of the error message above. With this pipeline code:
        dataset = tf.data.TFRecordDataset(tfrecords_path)      
        dataset = dataset.batch(batch_size, drop_remainder=True)

        # The map transformation takes a function and applies it to every element
        # of the dataset.
        dataset = dataset.map(self._extract_features_batch,
                              num_threads)

it worked (tensorflow-gpu 1.12; I reinstalled Anaconda, too).

But I had another error (the same error as #296).


I0701 15:13:11.209050 23419 train_shadownet.py:573] Use single gpu to train the model
dataset.map(),  32 6 <function CrnnFeatureReader._extract_features_batch at 0x7f688846d268>
dataset.map(),  32 6 <function CrnnFeatureReader._extract_features_batch at 0x7f688846d268>
2019-07-01 15:13:14.749925: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-01 15:13:16.119361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla M60 major: 5 minor: 2 memoryClockRate(GHz): 1.1775
pciBusID: 7dc9:00:00.0
totalMemory: 7.94GiB freeMemory: 7.86GiB
2019-07-01 15:13:16.119407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-01 15:13:24.932411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-01 15:13:24.932469: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-07-01 15:13:24.932484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-07-01 15:13:24.932720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7316 MB memory) -> physical GPU (device: 0, name: Tesla M60, pci bus id: 7dc9:00:00.0, compute capability: 5.2)
I0701 15:13:25.215558 23419 train_shadownet.py:272] Training from scratch
Traceback (most recent call last):
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
     [[{{node val_IteratorGetNext}} = IteratorGetNext[output_shapes=[[32,32,100,3], <unknown>, [32]], output_types=[DT_FLOAT, DT_VARIANT, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator_1)]]
     [[{{node CTCLoss_1/_73}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_970_CTCLoss_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train_shadownet.py", line 579, in <module>
    need_decode=args.decode_outputs
  File "tools/train_shadownet.py", line 325, in train_shadownet
    [optimizer, train_ctc_loss, merge_summary_op])
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
     [[node val_IteratorGetNext (defined at /home/kspook/CRNN_Tensorflow/data_provider/tf_io_pipline_fast_tools.py:414)  = IteratorGetNext[output_shapes=[[32,32,100,3], <unknown>, [32]], output_types=[DT_FLOAT, DT_VARIANT, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator_1)]]
     [[{{node CTCLoss_1/_73}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_970_CTCLoss_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'val_IteratorGetNext', defined at:
  File "tools/train_shadownet.py", line 579, in <module>
    need_decode=args.decode_outputs
  File "tools/train_shadownet.py", line 160, in train_shadownet
    batch_size=CFG.TRAIN.BATCH_SIZE
  File "/home/kspook/CRNN_Tensorflow/data_provider/shadownet_data_feed_pipline.py", line 289, in inputs
    num_threads=CFG.TRAIN.CPU_MULTI_PROCESS_NUMS
  File "/home/kspook/CRNN_Tensorflow/data_provider/tf_io_pipline_fast_tools.py", line 414, in inputs
    return iterator.get_next(name='{:s}_IteratorGetNext'.format(self._dataset_flag))
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 421, in get_next
    name=name)), self._output_types,
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2069, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/jet/app/anaconda3/envs/crnntf/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

OutOfRangeError (see above for traceback): End of sequence
     [[node val_IteratorGetNext (defined at /home/kspook/CRNN_Tensorflow/data_provider/tf_io_pipline_fast_tools.py:414)  = IteratorGetNext[output_shapes=[[32,32,100,3], <unknown>, [32]], output_types=[DT_FLOAT, DT_VARIANT, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator_1)]]
     [[{{node CTCLoss_1/_73}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_970_CTCLoss_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
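
For what it's worth, OutOfRangeError: End of sequence on val_IteratorGetNext just means the validation iterator has no more batches to give. With batch(batch_size, drop_remainder=True), a TFRecord file that contains fewer than batch_size (32 here) examples produces zero full batches, so even the first get_next() fails. A tiny self-contained sketch (not the repo's pipeline) that reproduces the behaviour:

    import tensorflow as tf

    # 10 "examples" but batch_size 32 with drop_remainder=True -> zero full batches.
    dataset = tf.data.Dataset.range(10).batch(32, drop_remainder=True)
    next_batch = dataset.make_one_shot_iterator().get_next()

    with tf.Session() as sess:
        try:
            sess.run(next_batch)
        except tf.errors.OutOfRangeError:
            print('End of sequence: no full batch could be produced')

    # Mitigations: write at least batch_size examples into each tfrecords split,
    # or repeat() the dataset *before* batching so a small split still cycles
    # into full batches.
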
MaybeShewill-CV commented 5 years ago

@kspook Everything works fine with tensorflow 1.12.0. You may upgrade your tensorflow to solve this. If the problem still exists you may need help from the TensorFlow API :)

kspook commented 5 years ago

@MaybeShewill-CV, I mentioned that I used tf1.12.0.

I changed 'num_parallel_calls' (no. 3), too.
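
In case it helps narrow things down: the earlier tracebacks alternate between /home/kspook/.local/... and /jet/app/anaconda3/envs/crnntf/... site-packages, so it may be worth confirming which TensorFlow build the training script actually imports. A minimal check, nothing repo-specific:

    import tensorflow as tf

    print(tf.__version__)                # expect 1.12.0
    print(tf.test.is_built_with_cuda())  # True only for the tensorflow-gpu build
    print(tf.test.is_gpu_available())    # True only if a CUDA device is usable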

MaybeShewill-CV commented 5 years ago

@kspook Then maybe you need some help from TensorFlow :)

kspook commented 5 years ago

@MaybeShewill-CV, is there a minimum amount of data for validation and testing? I only used a small amount.
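
One way to check whether the validation/test splits are simply smaller than the batch size (32 in the logs above) is to count the serialized examples in each file; a small sketch using the TF 1.x record iterator, where the glob pattern is only a guess at where the tfrecords were written:

    import glob
    import tensorflow as tf

    # Hypothetical output location of the generated tfrecords; adjust as needed.
    for path in sorted(glob.glob('data/tfrecords/*.tfrecords')):
        count = sum(1 for _ in tf.python_io.tf_record_iterator(path))
        print(path, count)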

MaybeShewill-CV commented 5 years ago

@kspook If you want to train the model on a small dataset you may make your own, or crop a batch of data from the original Synth90K dataset :)

kspook commented 5 years ago

@MaybeShewill-CV, do you think I am getting this error because 'nvidia-smi' fails?

($ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch)
MaybeShewill-CV commented 5 years ago

@kspook Since it is not an issue related to this repo, I think you will get useful information on Stack Overflow :)

kspook commented 5 years ago

@MaybeShewill-CV, it's not nvidia-smi; nvidia-smi works now:


(crnntf) kspook@MLNC6:/usr/local/cuda/samples/bin/x86_64/linux/release$ nvidia-smi
Thu Jul  4 11:00:00 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000D691:00:00.0 Off |                    0 |
| N/A   47C    P0    57W / 149W |      0MiB / 11439MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(crnntf) kspook@MLNC6:/usr/local/cuda/samples/bin/x86_64/linux/release$ 

CUDA 9.0 installed successfully.


(crnntf) kspook@MLNC6:/usr/local/cuda/samples/bin/x86_64/linux/release$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla K80"
  CUDA Driver Version / Runtime Version          9.0 / 9.0
  CUDA Capability Major/Minor version number:    3.7
  Total amount of global memory:                 11440 MBytes (11995578368 bytes)
  (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
  GPU Max Clock rate:                            824 MHz (0.82 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   54929 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS

cuDNN 7 installed successfully.


(crnntf) kspook@MLNC6:~/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN
cudnnGetVersion() : 7601 , CUDNN_VERSION from cudnn.h : 7601 (7.6.1)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 13  Capabilities 3.7, SmClock 823.5 Mhz, MemSize (Mb) 11439, MemClock 2505.0 Mhz, Ecc=1, boardGroupID=0
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 2
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.132224 time requiring 100 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.133056 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.157568 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.244576 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.406240 time requiring 203008 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 2
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.139008 time requiring 100 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.141408 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.176672 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.252768 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.431808 time requiring 203008 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

The error still occurred.


(crnntf) kspook@MLNC6:~/CRNN_Tensorflow$ python tools/train_shadownet.py --dataset_dir ./data/ --char_dict_path ./data/char_dict/char_dict.json --ord_map_dict_path ./data/char_dict/ord_map.json 
I0704 11:05:45.903860 17823 train_shadownet.py:569] Use single gpu to train the model
2019-07-04 11:05:49.530389: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-04 11:05:54.737760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: d691:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-07-04 11:05:54.737815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-04 11:05:55.023881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-04 11:05:55.023945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-07-04 11:05:55.023963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-07-04 11:05:55.024223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10295 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: d691:00:00.0, compute capability: 3.7)
I0704 11:05:55.310481 17823 train_shadownet.py:268] Training from scratch
Traceback (most recent call last):
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
     [[{{node train_IteratorGetNext}} = IteratorGetNext[output_shapes=[[32,32,100,3], <unknown>, [32]], output_types=[DT_FLOAT, DT_VARIANT, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train_shadownet.py", line 575, in <module>
    need_decode=args.decode_outputs
  File "tools/train_shadownet.py", line 321, in train_shadownet
    [optimizer, train_ctc_loss, merge_summary_op])
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
     [[node train_IteratorGetNext (defined at /data/home/kspook/CRNN_Tensorflow/data_provider/tf_io_pipline_fast_tools.py:406)  = IteratorGetNext[output_shapes=[[32,32,100,3], <unknown>, [32]], output_types=[DT_FLOAT, DT_VARIANT, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]

Caused by op 'train_IteratorGetNext', defined at:
  File "tools/train_shadownet.py", line 575, in <module>
    need_decode=args.decode_outputs
  File "tools/train_shadownet.py", line 153, in train_shadownet
    batch_size=CFG.TRAIN.BATCH_SIZE
  File "/data/home/kspook/CRNN_Tensorflow/data_provider/shadownet_data_feed_pipline.py", line 289, in inputs
    num_threads=CFG.TRAIN.CPU_MULTI_PROCESS_NUMS
  File "/data/home/kspook/CRNN_Tensorflow/data_provider/tf_io_pipline_fast_tools.py", line 406, in inputs
    return iterator.get_next(name='{:s}_IteratorGetNext'.format(self._dataset_flag))
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 421, in get_next
    name=name)), self._output_types,
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2069, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/kspook/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

OutOfRangeError (see above for traceback): End of sequence
     [[node train_IteratorGetNext (defined at /data/home/kspook/CRNN_Tensorflow/data_provider/tf_io_pipline_fast_tools.py:406)  = IteratorGetNext[output_shapes=[[32,32,100,3], <unknown>, [32]], output_types=[DT_FLOAT, DT_VARIANT, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]