ellurunaresh opened this issue 6 years ago
Hello, @ellurunaresh. Please clone the latest version of g2p-seq2seq (6.2.0a0). It also requires tensorflow>=1.5.0.
Actually, I can't update TensorFlow on my system. Can I solve this problem without upgrading?
In that case, can you please install tensorflow 1.5.0 only for your user (with the "--user" flag: pip install tensorflow==1.5.0 --user)?
OK sure. Thanks 😊
Hi, I am training a character-to-word sequence model using the g2p approach, with a large vocabulary. New words appear at test time that do not exist in vocab.phoneme, and I get "UNK" for those unknown words.
1) How do I handle "_UNK" during decoding? Is there an option to set a parameter so that it takes the nearest string? 2) During training, can I generate embeddings for all unknown words?
Please let me know how to proceed. If anybody knows a solution, please share it.
Hello, @ellurunaresh
- How to handle "_UNK" during decoding. Is there any option to set the parameter so that it could take any nearest string?
Let's say you receive the following decoded sequence with "UNK" symbols:
decodes = ["g", "o", "o", "UNK", "SPACE", "a", "v", "t", "UNK", "r", "n", "o", "e", "n"]
You should take just the positions of the "SPACE" symbols in the decoded sequence:
space_positions = [sym_pos for sym_pos, sym in enumerate(decodes) if sym == 'SPACE']
In the above example, the "SPACE" symbol occurs at position 4 in decodes:
print(space_positions)
[4]
So, you should build the output sequence from the input sequence (not from the decoded sequence, which contains "UNK" and other decoded symbols), and just add a white-space character at the positions where a "SPACE" symbol was found previously:
output_str = ""
for pos, sym in enumerate(inputs):
    if pos in space_positions:
        output_str += " "
    output_str += sym
print("Input: {}".format("".join(inputs)))
print("Output: {}".format(output_str))
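Putting the two steps together, here is a self-contained sketch. The `inputs` list is an assumption for illustration (the raw input characters with no spaces, here "goodafternoon"); in the real pipeline it would come from your test data:

```python
# Decoded symbols from the model; "UNK" marks unknown symbols, "SPACE" marks word breaks.
decodes = ["g", "o", "o", "UNK", "SPACE", "a", "v", "t", "UNK", "r", "n", "o", "e", "n"]
# Raw input characters (no spaces) that produced the decode above -- illustrative only.
inputs = list("goodafternoon")

# Step 1: positions of "SPACE" in the decoded sequence.
space_positions = [sym_pos for sym_pos, sym in enumerate(decodes) if sym == "SPACE"]

# Step 2: rebuild the output from the *input* characters, inserting a white-space
# wherever a "SPACE" symbol was decoded.
output_str = ""
for pos, sym in enumerate(inputs):
    if pos in space_positions:
        output_str += " "
    output_str += sym

print("Input: {}".format("".join(inputs)))   # Input: goodafternoon
print("Output: {}".format(output_str))       # Output: good afternoon
```

Note that this simple position-based alignment assumes at most one "SPACE" per decode window; with several spaces the decoded positions drift ahead of the input positions by one per space.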
- During training can I generate "embeddings" for all unknown words?
Generating and using embeddings outside of tensor2tensor is problematic, because its vocabularies are built not only from tokens but also from sub-tokens: https://github.com/tensorflow/tensor2tensor/issues/173
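To illustrate why there is no single per-word embedding to generate: with a sub-token vocabulary, an out-of-vocabulary word is split into smaller pieces that each have their own embedding row. A toy greedy splitter (the vocabulary and the words are made up for illustration; real tensor2tensor vocabularies are learned from data):

```python
# Toy sub-token vocabulary -- purely illustrative.
subtokens = {"after", "noon", "av", "ter", "no", "en",
             "a", "v", "t", "e", "r", "n", "o"}

def greedy_split(word, vocab):
    """Split a word into the longest matching sub-tokens, left to right."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError("no sub-token covers position {}".format(i))
    return pieces

print(greedy_split("afternoon", subtokens))  # ['after', 'noon']
print(greedy_split("avternoen", subtokens))  # ['av', 'ter', 'no', 'en']
```

Even an unseen word like "avternoen" is covered by sub-token pieces, each with its own embedding, so a word-level embedding table built outside tensor2tensor would not line up with what the model actually uses.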
Hi all, while training the model I got the following error. I have followed previous threads but couldn't solve the issue. My vocabulary is in ASCII format, and I am not sure why this error occurs. Please help me solve it. TensorFlow version: 1.3.0
Traceback (most recent call last):
File "/usr/local/bin/g2p-seq2seq", line 11, in <module>
load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/app.py", line 77, in main
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 198, in create_train_model
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 170, in prepare_model
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/seq2seq_model.py", line 178, in __init__
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1195, in model_with_buckets
softmax_loss_function=softmax_loss_function))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1110, in sequence_loss
softmax_loss_function=softmax_loss_function))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1067, in sequence_loss_by_example
crossent = softmax_loss_function(target, logit)
File "build/bdist.linux-x86_64/egg/g2p_seq2seq/seq2seq_model.py", line 117, in sampled_loss
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_impl.py", line 1191, in sampled_softmax_loss
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_impl.py", line 947, in _compute_sampled_logits
range_max=num_classes)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/candidate_sampling_ops.py", line 134, in log_uniform_candidate_sampler
seed2=seed2, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_candidate_sampling_ops.py", line 357, in _log_uniform_candidate_sampler
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2397, in create_op
set_shapes_for_outputs(ret)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1757, in set_shapes_for_outputs
shapes = shape_func(op)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1707, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
debug_python_shape_fn, require_shape_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 675, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Shape must be rank 2 but is rank 1 for 'model_with_buckets/sequence_loss/sequence_loss_by_example/sampled_softmax_loss/LogUniformCandidateSampler' (op: 'LogUniformCandidateSampler') with input shapes: [?]
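For reference, the error says the candidate sampler received rank-1 labels where rank 2 is expected: in TF 1.x, tf.nn.sampled_softmax_loss expects labels of shape [batch_size, num_true], while code written against the pre-1.0 API passes a rank-1 [batch_size] tensor. The thread's recommended fix is upgrading g2p-seq2seq and TensorFlow; if you must stay on old code, the usual patch is reshaping the labels to rank 2 before the loss call. A shape-only sketch (NumPy stands in for TensorFlow tensors, so it runs without TF installed):

```python
import numpy as np

batch_size = 8
labels = np.arange(batch_size)       # rank 1, shape (8,)   -> triggers the error
labels_2d = labels.reshape(-1, 1)    # rank 2, shape (8, 1) -> what the sampler expects

# In the TF 1.x graph code this corresponds to:
#   labels = tf.reshape(labels, [-1, 1])
# before calling tf.nn.sampled_softmax_loss(..., labels=labels, ...)
print(labels.shape, labels_2d.shape)  # (8,) (8, 1)
```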