@hiepph You may write your own version of `char_dict.txt` in the `data/char_dict` folder and use the scripts in `tools` to build a new JSON map file. The `char_dict.json` file maps each original character to its Unicode code point; the remaining JSON files map the Unicode code point to the label index used to compute the CTC loss.
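Conceptually, the mapping chain looks like this (a minimal sketch of the idea; the real script's JSON layout may differ):

```python
import json

# Characters to recognize, as read from data/char_dict/char_dict.txt
# (one character per line).
chars = ['0', '1', 'a', 'b', '中']

# char_dict.json: original character -> its Unicode code point.
char_dict = {c: ord(c) for c in chars}

# ord_map.json: Unicode code point -> label index used for the CTC loss.
ord_map = {str(ord(c)): index for index, c in enumerate(chars)}

with open('char_dict.json', 'w') as f:
    json.dump(char_dict, f, ensure_ascii=False)
with open('ord_map.json', 'w') as f:
    json.dump(ord_map, f, ensure_ascii=False)
```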
What exactly is the format of `char_dict.txt`? Is it the same as `char_dict.json`?

And besides, which scripts in `tools` generate the JSON files from `char_dict.txt`? I don't see them anywhere in your README or your code.
@hiepph `char_dict.txt` is the file that contains the characters you want to recognize. Write each character on its own line in `char_dict.txt`, then use `tools/establish_char_dict.py` to generate the JSON files.
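For example, a `char_dict.txt` covering just digits and two letters would be:

```
0
1
2
a
b
```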
Sorry, `tools/establish_char_dict.py` doesn't exist. If you mean `local_utils/establish_char_dict.py`, as far as I can see that is just a module, not a script that takes a file as input and outputs JSON files.
@hiepph Sorry, I put the scripts in the `chinese_version_debug` branch; you will see them when you check out that branch.
In the `chinese_version_debug` branch, everything was okay with `char_dict.txt`: I ran `establish_char_dict.py` and got the `char_dict.json` files. I provided `sample.txt`, ran `write_text_tfrecords.py`, and got the training data.

But when I ran:

```
python tools/train_shadownet_subnet.py --dataset_dir path/to/my/tfrecords
```

it showed this error:
```
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 498, in make_tensor_proto
    str_values = [compat.as_bytes(x) for x in proto_values]
  File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 498, in <listcomp>
    str_values = [compat.as_bytes(x) for x in proto_values]
  File "/usr/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 66, in as_bytes
    (bytes_or_text,))
TypeError: Expected binary or unicode string, got <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f68cb31a630>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_shadownet_subnet.py", line 157, in <module>
    train_shadownet(args.dataset_dir, args.weights_path)
  File "train_shadownet_subnet.py", line 65, in train_shadownet
    tf.argmax(input_labels, 1))
  File "/usr/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 316, in new_func
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 211, in argmax
    return gen_math_ops.arg_max(input, axis, name=name, output_type=output_type)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 472, in arg_max
    name=name)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 513, in _apply_op_helper
    raise err
  File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1013, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 233, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 212, in constant
    value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 502, in make_tensor_proto
    "supported type." % (type(values), values))
TypeError: Failed to convert object of type <class 'tensorflow.python.framework.sparse_tensor.SparseTensor'> to Tensor. Contents: SparseTensor(indices=Tensor("shuffle_batch/control_dependency_1:0", shape=(?, 2), dtype=int64), values=Tensor("shuffle_batch/control_dependency_2:0", shape=(?,), dtype=int32), dense_shape=Tensor("shuffle_batch/control_dependency_3:0", shape=(2,), dtype=int64)). Consider casting elements to a supported type.
```
As far as I can see, the error occurs on this line of `tools/train_shadownet_subnet.py`:

```
  File "train_shadownet_subnet.py", line 65, in train_shadownet
    tf.argmax(input_labels, 1))
```

with the following log:

```
TypeError: Failed to convert object of type <class 'tensorflow.python.framework.sparse_tensor.SparseTensor'> to Tensor. Contents: SparseTensor(indices=Tensor("shuffle_batch/control_dependency_1:0", shape=(?, 2), dtype=int64), values=Tensor("shuffle_batch/control_dependency_2:0", shape=(?,), dtype=int32), dense_shape=Tensor("shuffle_batch/control_dependency_3:0", shape=(2,), dtype=int64)). Consider casting elements to a supported type.
```
What went wrong? How can I fix this?
@hiepph You should use `train_shadownet.py` to train the CRNN model. I am sorry that I have not had much time to clean up the repo; `train_shadownet_subnet.py` is used for another program, which you can ignore. Once I have enough time I will clean up the messy repo and push again.
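For reference, the root cause of the error above: the labels decoded from the TFRecords are a `tf.SparseTensor`, which is exactly what `tf.nn.ctc_loss` expects, but `tf.argmax` only accepts dense tensors. A minimal sketch of the distinction, assuming a TF 1.x graph (the placeholder shapes here are illustrative, not the repo's real values):

```python
import tensorflow as tf

# CTC labels are naturally sparse (variable-length sequences per image),
# so the batch comes back as a SparseTensor.
sparse_labels = tf.sparse_placeholder(tf.int32, name='input_labels')

# CRNN output: [max_time, batch_size, num_classes] logits (shapes assumed).
net_out = tf.placeholder(tf.float32, shape=[25, 32, 37], name='net_out')
seq_len = tf.fill([32], 25)  # every sequence uses all max_time steps

# This works: tf.nn.ctc_loss is designed for SparseTensor labels.
loss = tf.reduce_mean(tf.nn.ctc_loss(labels=sparse_labels,
                                     inputs=net_out,
                                     sequence_length=seq_len))

# This is what train_shadownet_subnet.py attempts, and it fails, because
# tf.argmax tries to convert the SparseTensor into a dense Tensor:
#   tf.argmax(sparse_labels, 1)  # -> TypeError: Failed to convert ...

# If a dense view of the labels is really needed, convert explicitly first:
dense_labels = tf.sparse_tensor_to_dense(sparse_labels, default_value=-1)
first_label = tf.argmax(dense_labels, 1)  # legal now: input is dense
```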
After generating `char_dict.json`, I checked out the `master` branch again, kept `char_dict.json`, and renamed `index_2_ord_map.json` to `ord_map.json`. I followed your README and everything went OK.
Thank you for your support; I hope you will release the new version soon. I'll close the issue for now.
@hiepph Once I have time I will clean this up. You are welcome to keep me informed of any mistakes.
Hi, I can't find the `chinese_version_debug` branch, and `local_utils/establish_char_dict.py` can't convert to JSON files. Can you show me the content of `establish_char_dict.py`, or send me the file? Thank you very much.
Hi, I want to recognize more characters than just the English alphabet (numbers, special Unicode UTF-8 characters, etc.). Supposing I have a large enough dataset, what modifications do I need to train/predict (the full pipeline) with these characters?

Some solutions I found in this repo:

README.md: supply my supervised labels in `data/char_dict/char_dict.json`.

Is this enough? What about `data/char_dict/ord_map.json`?
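For what it's worth, here is my current understanding of how the two files would be used at prediction time (a sketch based on this thread; the exact JSON layout is an assumption): `char_dict.json` maps characters to code points, `ord_map.json` maps code points to CTC label indices, and the inverse of both is needed to turn decoded indices back into text.

```python
import json

# Load the two maps produced from char_dict.txt (paths from the README).
with open('data/char_dict/char_dict.json') as f:
    char_dict = json.load(f)   # character -> Unicode code point
with open('data/char_dict/ord_map.json') as f:
    ord_map = json.load(f)     # code point (as str) -> CTC label index

# Invert both maps so a decoded label index maps back to a character.
index_to_ord = {index: o for o, index in ord_map.items()}
ord_to_char = {str(o): c for c, o in char_dict.items()}

def labels_to_text(label_indices):
    """Map a decoded CTC label sequence back to its text string."""
    return ''.join(ord_to_char[index_to_ord[i]] for i in label_indices)

# e.g. if the CTC decoder emitted [3, 0, 1], labels_to_text([3, 0, 1])
# yields the corresponding three characters from char_dict.txt.
```

Note that extending the character set also changes the number of output classes the network must predict (CTC needs one extra blank class), so the model has to be retrained after the maps are regenerated.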