athena-team / athena

an open-source implementation of sequence-to-sequence based speech processing engine
https://athena-team.readthedocs.io
Apache License 2.0

cannot find -ltensorflow_framework when deploying models #297

Closed — fming closed this issue 4 years ago

fming commented 4 years ago

I am following the instructions here: https://github.com/athena-team/athena/blob/master/deploy/README.md

At Step 4 (Compiling the C++ Codes and Running the executable file), running the make command throws an error:

    Scanning dependencies of target tensor_utils
    [ 12%] Building CXX object CMakeFiles/tensor_utils.dir/src/tensor_utils.cpp.o
    [ 25%] Linking CXX static library libtensor_utils.a
    [ 25%] Built target tensor_utils
    Scanning dependencies of target utils
    [ 37%] Building CXX object CMakeFiles/utils.dir/src/utils.cpp.o
    [ 50%] Linking CXX static library libutils.a
    [ 50%] Built target utils
    Scanning dependencies of target tts
    [ 62%] Building CXX object CMakeFiles/tts.dir/src/tts.cpp.o
    [ 75%] Linking CXX executable tts
    /usr/bin/ld: cannot find -ltensorflow_framework
    collect2: error: ld returned 1 exit status

neneluo commented 4 years ago

please check whether libtensorflow_framework.so is in the directory bazel-bin/tensorflow

fming commented 4 years ago

Thanks, I've checked it and added the following soft links:

    ln -s libtensorflow_framework.so.2.0.2 libtensorflow_framework.so
    ln -s libtensorflow_framework.so.2.0.2 libtensorflow_framework.so.2

After that I can build the asr, but running it raises this error:

    Start argmax decoding ...
    2020-09-19 22:32:57.082097: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] model_pruner failed: Internal: Could not find node with name 'transformer_encoder/transformer_encoder_layer_11/layer_normalization_23/batchnorm/add_1'
    2020-09-19 22:33:24.347057: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] model_pruner failed: Internal: Could not find node with name 'transformer_encoder/transformer_encoder_layer_11/layer_normalization_23/batchnorm/add_1'
    Segmentation fault (core dumped)

neneluo commented 4 years ago

Did the script output any error message when you run python athena/deploy_main.py *.json?

fming commented 4 years ago

@neneluo Thanks for replying. There are two json files under the folder; which one should I use?

    examples/asr/timit/configs/mtl_transformer_sp_101.json
    examples/asr/timit/configs/mtl_transformer_sp.json

I tried both; both produce similar warnings like the ones below:

    None
    WARNING:tensorflow:From athena/deploy_main.py:61: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.
    WARNING:tensorflow:From athena/deploy_main.py:61: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.
    INFO:absl:output_names: ['transformer_encoder/transformer_encoder_layer_8/layer_normalization_17/batchnorm/add_1', 'strided_slice_1']
    WARNING:tensorflow:From athena/deploy_main.py:45: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.convert_variables_to_constants
    WARNING:tensorflow:From athena/deploy_main.py:45: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.convert_variables_to_constants
    WARNING:tensorflow:From /home/ming/venv_athena/lib64/python3.6/site-packages/tensorflow_core/python/framework/graph_util_impl.py:275: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.extract_sub_graph
    WARNING:tensorflow:From /home/ming/venv_athena/lib64/python3.6/site-packages/tensorflow_core/python/framework/graph_util_impl.py:275: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.extract_sub_graph
    WARNING:tensorflow:From athena/deploy_main.py:46: remove_training_nodes (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.remove_training_nodes
    WARNING:tensorflow:From athena/deploy_main.py:46: remove_training_nodes (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version. Instructions for updating: Use tf.compat.v1.graph_util.remove_training_nodes
    INFO:absl:output_names: ['strided_slice_3']
    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
    ..........
    ..........
    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).model.model.transformer.decoder.layers.2.ffn.layer_with_weights-0.kernel
    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).model.model.transformer.decoder.layers.2.ffn.layer_with_weights-0.bias
    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).model.model.transformer.decoder.layers.2.ffn.layer_with_weights-0.bias
    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).model.model.transformer.decoder.layers.2.ffn.layer_with_weights-1.kernel
    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).model.model.transformer.decoder.layers.2.ffn.layer_with_weights-1.kernel
    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).model.model.transformer.decoder.layers.2.ffn.layer_with_weights-1.bias
    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).model.model.transformer.decoder.layers.2.ffn.layer_with_weights-1.bias
    WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/alpha/guide/checkpoints#loading_mechanics for details.
    WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/alpha/guide/checkpoints#loading_mechanics for details.

neneluo commented 4 years ago

Use the one you used for model training. I guess the bug is caused by a mismatch between the output names specified in tensor_utils.cpp and those in the pb file. Try changing the first 'output_name' in the function createOutputNameStructureEncoder of deploy/src/tensor_utils.cpp to transformer_encoder/transformer_encoder_layer_8/layer_normalization_17/batchnorm/add_1 and rebuilding the asr.

fming commented 4 years ago

@neneluo Thanks, it is getting better, but there are still warnings like these:

    Use tf.compat.v1.graph_util.remove_training_nodes
    INFO:absl:output_names: ['strided_slice_3']
    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
    WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_1

Following your approach, I changed the second output_name to "strided_slice_3", but it still shows these warnings.

neneluo commented 4 years ago

These warnings won't affect the result. I think the main error is the mismatch of the output names, so you need to change them manually in deploy/src/tensor_utils.cpp. That is, update lines 98-99 in the file from

    output_names.emplace_back(
        "transformer_encoder/transformer_encoder_layer_11/layer_normalization_23/batchnorm/add_1");

to

    output_names.emplace_back(
        "transformer_encoder/transformer_encoder_layer_8/layer_normalization_17/batchnorm/add_1");

The other lines remain unchanged.

neneluo commented 4 years ago

As you can see, the script athena/deploy_main.py outputs the following logs:

INFO:absl:output_names: ['transformer_encoder/transformer_encoder_layer_8/layer_normalization_17/batchnorm/add_1', 'strided_slice_1']
INFO:absl:output_names: ['strided_slice_3']

The first line gives the output_names of the encoder and the second line gives the output_names of the decoder. For now, you may always need to manually change the output_names specified in deploy/src/tensor_utils.cpp according to these logs whenever you change the model structure. I will update the codes to make this more flexible when I have free time. Sorry for the inconvenience.
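If you want to double-check which node names are actually in your frozen graph, a small sketch like the one below can list them; note that the pb path is only a guess at where deploy_main.py writes the frozen encoder graph on your machine, so adjust it accordingly:

    import tensorflow as tf

    # Hypothetical path: point this at the frozen encoder graph produced by deploy_main.py
    pb_path = "deploy/graph_asr/encoder.pb"

    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())

    # Print candidate output nodes to compare against the names hard-coded in tensor_utils.cpp
    for node in graph_def.node:
        if "batchnorm/add_1" in node.name or "strided_slice" in node.name:
            print(node.name)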

fming commented 4 years ago

@neneluo Thanks a lot. After running ./asr, I got this output:

    (venv_athena) [ming@localhost build]$ ./asr
    Loading model ...
    2020-09-23 05:25:04.233553: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
    2020-09-23 05:25:04.268616: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3392370000 Hz
    2020-09-23 05:25:04.269067: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x110cc00 executing computations on platform Host. Devices:
    2020-09-23 05:25:04.269094: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
    Start argmax decoding ...
    Argmax decoding results:
    Total run time of samples: 1.56261 seconds.

Does this mean I deployed successfully?

neneluo commented 4 years ago

Something seems wrong. The decoding results are supposed to be printed to the screen. Have you prepared vocab.txt and feats.txt and put them under deploy/graph_asr/test_data?
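A quick way to check from the repository root (just a convenience sketch):

    import os

    # Both files must be present before running ./asr
    for name in ("vocab.txt", "feats.txt"):
        path = os.path.join("deploy/graph_asr/test_data", name)
        print(path, "exists" if os.path.exists(path) else "MISSING")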

fming commented 4 years ago

@neneluo Thanks, I've checked and those files are not on my PC. Where can I get these two files? It seems vocab.txt is here: examples/asr/timit/data/vocab. How do I get feats.txt?

neneluo commented 4 years ago

> @neneluo Thanks, I've checked and those files are not on my PC. Where can I get these two files? It seems vocab.txt is here: examples/asr/timit/data/vocab. How do I get feats.txt?

You can refer to the codes.

fming commented 4 years ago

@neneluo Thanks! I see your code. May I ask a stupid question: how do I generate feats.txt? Here is my script, say "create_feats.py":

    from athena.transform import AudioFeaturizer
    from athena.data import FeatureNormalizer

    path = "/home/ming/athena/examples/asr/timit/data/wav/DEV/FADG0-SI649.WAV"
    audio_config = {"type": "Fbank", "filterbank_channel_count": 40}
    cmvn_file = "examples/asr/timit/data/cmvn"
    audio_featurizer = AudioFeaturizer(audio_config)
    feature_normalizer = FeatureNormalizer(cmvn_file)
    feat = audio_featurizer(path)
    feat = feature_normalizer(feat, 'FADG0')

After running this script, how do I create feats.txt? Just do this?

    python create_feats.py > feats.txt

neneluo commented 4 years ago

> @neneluo Thanks! I see your code. May I ask a stupid question: how do I generate feats.txt? Here is my script, say "create_feats.py":
>
>     from athena.transform import AudioFeaturizer
>     from athena.data import FeatureNormalizer
>
>     path = "/home/ming/athena/examples/asr/timit/data/wav/DEV/FADG0-SI649.WAV"
>     audio_config = {"type": "Fbank", "filterbank_channel_count": 40}
>     cmvn_file = "examples/asr/timit/data/cmvn"
>     audio_featurizer = AudioFeaturizer(audio_config)
>     feature_normalizer = FeatureNormalizer(cmvn_file)
>     feat = audio_featurizer(path)
>     feat = feature_normalizer(feat, 'FADG0')
>
> After running this script, how do I create feats.txt? Just do this? python create_feats.py > feats.txt

I use numpy:

    import numpy as np
    np.savetxt("feats.txt", np.squeeze(feat))
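Putting the pieces together, a complete script might look like the sketch below; the wav path, audio_config, and cmvn file are copied from your snippet above and may need adjusting for your setup, and feats.txt ultimately needs to end up under deploy/graph_asr/test_data as mentioned earlier:

    import numpy as np
    from athena.transform import AudioFeaturizer
    from athena.data import FeatureNormalizer

    # Paths and config copied from the snippets above; adjust for your own data
    path = "/home/ming/athena/examples/asr/timit/data/wav/DEV/FADG0-SI649.WAV"
    audio_config = {"type": "Fbank", "filterbank_channel_count": 40}
    cmvn_file = "examples/asr/timit/data/cmvn"

    audio_featurizer = AudioFeaturizer(audio_config)
    feature_normalizer = FeatureNormalizer(cmvn_file)

    feat = audio_featurizer(path)
    feat = feature_normalizer(feat, 'FADG0')

    # Write the feature matrix as plain text, then copy it (with vocab.txt) into deploy/graph_asr/test_data
    np.savetxt("feats.txt", np.squeeze(feat))
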
fming commented 4 years ago

@neneluo Great! Thanks! It seems to be working, right?

    (venv_athena) [ming@localhost build]$ ./asr
    Loading model ...
    2020-09-25 04:23:22.372337: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
    2020-09-25 04:23:22.401870: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3392555000 Hz
    2020-09-25 04:23:22.402542: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x21e2c00 executing computations on platform Host. Devices:
    2020-09-25 04:23:22.402625: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
    Start argmax decoding ...
    Argmax decoding results:
    silixcltsehfersfermahlaeclaxvyiynixcltiyixclperclpixsihnrixscltehcltferhherrowixclkliydxershehclpsil
    Total run time of samples: 2.14208 seconds.

neneluo commented 4 years ago

Yes. You can check whether the result is similar to its label, or to the results decoded by Python, to verify correctness.
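For a rough automated comparison, a small standard-library sketch like this could be used; the two strings are placeholders for the C++ decoder output and the utterance's reference label:

    import difflib

    # Placeholders: paste the output printed by ./asr and the reference transcription here
    decoded = "<output printed by ./asr>"
    reference = "<reference label for the utterance>"

    # Similarity ratio in [0, 1]; values close to 1 mean the two strings largely agree
    ratio = difflib.SequenceMatcher(None, decoded, reference).ratio()
    print("similarity:", ratio)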

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue is closed. You can also re-open it if needed.