Closed tahsintahsin closed 12 months ago
Hi @tahsintahsin,
It sounds like a Jupyter notebook setup issue that occurs before any job runs on inf1. Could you double-check the installation steps and settings in https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.html#setup-torch-neuronx-ubuntu22?
pip install ipykernel
python3.10 -m ipykernel install --user --name aws_neuron_venv_pytorch --display-name "Python (torch-neuronx)"
pip install jupyter notebook
pip install environment_kernels
Meanwhile, you can also run Python scripts directly without a notebook: create a Python script, say test.py, and run it with python3 test.py.
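As a minimal sketch of what such a test.py could contain, the script below only reports which packages are importable in the active environment. The package list and the `find_installed` helper are assumptions made for illustration and are not part of the Neuron setup guide.

```python
# test.py: hypothetical sanity check, run with `python3 test.py`
import importlib.util
import sys

# Top-level module names to probe. This list is an assumption; adjust it
# to whatever your environment is supposed to provide.
PACKAGES = ["tensorflow", "tensorflow_neuron", "torch", "torch_neuronx"]


def find_installed(names):
    """Map each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, ok in find_installed(PACKAGES).items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Running it inside and outside the venv shows quickly whether the notebook kernel and the shell are looking at the same installation.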
Hi @jyang-aws, I was able to access the URL by running the Jupyter notebook with --allow-root. However, I am now stuck further along in the process. I am trying to convert a Rasa model (TF-based) into an inf1-compiled version, for which I first need to install the rasa Python package, but I am seeing a lot of package version incompatibilities. I tried to resolve some of them, but there are many. Before I potentially mark this as undoable, could you suggest any general tips? Maybe someone has tried a similar task and you have some ideas. Many thanks.
Hi again @jyang-aws, rather than using Rasa, I decided to use the bare LaBSE model and replace the Rasa code with my own custom code, because I really want to test the performance of Inferentia servers so that we might go to production with them. I pulled the model from TensorFlow Hub, loaded it, and ran the example given on the model page: https://tfhub.dev/google/LaBSE/2. It works fine. Then I tried to compile it for Inferentia. I kept running into CPU issues: CPU usage went over 100%. I started increasing the instance size and even got to inf1.24xlarge, but even there I kept seeing the CPU issues, and eventually my compilation got stuck with the output below. Can you please recommend an instance on which I can compile LaBSE? Thanks.
2023-08-31 13:32:31.837863: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-08-31 13:32:31.838096: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2023-08-31 13:33:18.741540: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-08-31 13:33:18.741727: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2023-08-31 13:33:41.959279: I tensorflow/neuron/grappler/convert/segment.cc:456] There are 9 ops of 4 different types in the graph that are not compiled by neuron-cc: GatherV2, OneHot, Placeholder, NoOp, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.html).
2023-08-31 13:33:54.209354: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-08-31 13:33:54.209505: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2023-08-31 13:34:02.299414: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-08-31 13:34:02.756860: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
Hi @tahsintahsin
Could you try inf2.24xlarge or inf2.48xlarge? They have more vCPUs and memory, and a newer version of the accelerator.
Hello @jyang-aws, I just tried inf2.24xlarge and got the issue below:
2023-09-05 12:51:27.988503: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-09-05 12:51:27.988658: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2023-09-05 12:51:55.036299: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-09-05 12:51:55.036443: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2023-09-05 12:52:06.698203: I tensorflow/neuron/grappler/convert/segment.cc:456] There are 8 ops of 3 different types in the graph that are not compiled by neuron-cc: OneHot, Placeholder, NoOp, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.html).
2023-09-05 12:52:14.565598: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-09-05 12:52:14.565768: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2023-09-05 12:52:17.704945: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-05 12:52:17.822686: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var 'MLIR_CRASH_REPRODUCER_DIRECTORY' to enable.
Compiler status PASS
AttributeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 model_neuron = tfnx.trace(encoder, preprocessor(english_sentences))
File ~/aws_neuron_venv_tensorflow/lib/python3.10/site-packages/tensorflow_neuronx/_trace.py:8, in trace(func, example_inputs, subgraph_builder_function)
6 def trace(func, example_inputs, subgraph_builder_function=None):
7 with _neuronx_cc_context():
----> 8 return tfn_trace(func, example_inputs, subgraph_builder_function)
File ~/aws_neuron_venv_tensorflow/lib/python3.10/site-packages/tensorflow_neuron/python/_trace.py:241, in trace(func, example_inputs, subgraph_builder_function)
238 model = AwsNeuronModel(cfunc, func.structured_outputs, real_op_count, ordered_weights=ordered_weights)
239 else:
240 # wrap GraphDef as a WrappedFunction
--> 241 cfunc = _wrap_graph_def_as_concrete_function(graph_def, func)
242 # wrap ConcreteFunction as a keras model
243 model = AwsNeuronModel(cfunc, func.structured_outputs, real_op_count)
File ~/aws_neuron_venv_tensorflow/lib/python3.10/site-packages/tensorflow_neuron/python/_trace.py:605, in _wrap_graph_def_as_concrete_function(graph_def, func_ref)
601 def _wrap_graph_def_as_concrete_function(graph_def, func_ref):
602 # Note: if input_names is a dictionary (such as '{ts.name: ts.name for ts in example_inputs}'),
603 # then the WrappedFunction may occationally have feeding tensors going to the wrong inputs.
604 input_names = _get_input_names(func_ref)
--> 605 output_names = _get_output_names(func_ref)
606 cfunc = wrap_function.function_from_graph_def(graph_def, input_names, output_names)
608 # TODO: remove this hack once https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/eager/wrap_function.py#L377 is fixed
File ~/aws_neuron_venv_tensorflow/lib/python3.10/site-packages/tensorflow_neuron/python/_trace.py:576, in _get_output_names(func)
572 if isinstance(structured_outputs, dict):
573 # return a map from output argument name to symbolic tensor name
574 # in order to let the WrappedFunction's return dictionary have the correct keys
575 tensor_specs = nest.flatten(structured_outputs, expand_composites=True)
--> 576 tensor_spec_name_map = {spec.name: name for name, spec in structured_outputs.items()}
577 tensor_spec_names = [tensor_spec_name_map[spec.name] for spec in tensor_specs]
578 return {name: ts.name for ts, name in zip(outputs, tensor_spec_names)}
File ~/aws_neuron_venv_tensorflow/lib/python3.10/site-packages/tensorflow_neuron/python/_trace.py:576, in
AttributeError: 'list' object has no attribute 'name'
Should I try the largest one, inf2.48xlarge? Does the error seem to be caused by the instance size?
Another question: I am not using the --neuroncore-pipeline-cores argument because I am running from a Jupyter notebook. How does the compiler behave in that case? Does it use all available cores by default?
Thanks
@tahsintahsin
This is a separate issue, not related to the instance type you used.
Could you share a minimal test case with us so we can reproduce it?
As to your question about --neuroncore-pipeline-cores, you can try:
tfn.saved_model.compile(....
compiler_args = ['--neuroncore-pipeline-cores', '16'])
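The `compiler_args` value above is a flat list of strings in which each flag is followed by its value. As a small sketch, assuming you keep compiler options in a dictionary inside a notebook, a helper such as the hypothetical `build_compiler_args` below produces that list form. Whether a given flag is accepted depends on the compiler version you have installed.

```python
def build_compiler_args(options):
    """Flatten {flag: value} into the flat list-of-strings form.

    A value of None marks a flag that takes no argument.
    This helper is illustrative and not part of any Neuron package.
    """
    args = []
    for flag, value in options.items():
        args.append(flag)
        if value is not None:
            args.append(str(value))
    return args


compiler_args = build_compiler_args({"--neuroncore-pipeline-cores": 16})
print(compiler_args)
```

The resulting list has the same shape as the `compiler_args` argument in the snippet above.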
@jyang-aws Thanks for the help again. After installing tensorflow_text with pip install tensorflow_text==2.10 (I think your TensorFlow version was 2.10; the matching version of tensorflow_text installs with no issues), what I am running is:
import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_text as text
import numpy as np
import tensorflow_neuronx as tfnx
def normalization(embeds):
    norms = np.linalg.norm(embeds, 2, axis=1, keepdims=True)
    return embeds/norms
english_sentences = tf.constant(["dog", "Puppies are nice.", "I enjoy taking long walks along the beach with my dog."])
italian_sentences = tf.constant(["cane", "I cuccioli sono carini.", "Mi piace fare lunghe passeggiate lungo la spiaggia con il mio cane."])
japanese_sentences = tf.constant(["犬", "子犬はいいです", "私は犬と一緒にビーチを散歩するのが好きです"])
preprocessor = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder-cmlm/multilingual-preprocess/2")
encoder = hub.KerasLayer("https://tfhub.dev/google/LaBSE/2")
english_embeds = encoder(preprocessor(english_sentences))["default"]
japanese_embeds = encoder(preprocessor(japanese_sentences))["default"]
italian_embeds = encoder(preprocessor(italian_sentences))["default"]
english_embeds = normalization(english_embeds)
japanese_embeds = normalization(japanese_embeds)
italian_embeds = normalization(italian_embeds)
print (np.matmul(english_embeds, np.transpose(italian_embeds)))
print (np.matmul(english_embeds, np.transpose(japanese_embeds)))
print (np.matmul(italian_embeds, np.transpose(japanese_embeds)))
neuron_labse = tfnx.trace(encoder,preprocessor(english_sentences))
@jyang-aws I have tried the same code on inf2.48xlarge and the output is as follows. The compiler status is different, but it ends with the same error:
2023-09-06 12:10:15.670511: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-09-06 12:10:15.670740: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2023-09-06 12:10:43.120439: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-09-06 12:10:43.120734: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2023-09-06 12:10:54.844171: I tensorflow/neuron/grappler/convert/segment.cc:456] There are 9 ops of 4 different types in the graph that are not compiled by neuron-cc: GatherV2, OneHot, Placeholder, NoOp, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.html).
2023-09-06 12:11:02.727643: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2023-09-06 12:11:02.727974: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2023-09-06 12:11:05.606759: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-06 12:11:05.727959: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
....
Compiler status ERROR
WARNING:tensorflow:neuron-cc failed with:
2023-09-06 12:13:00.761933: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-06 12:13:21.109072: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-09-06 12:13:21.110063: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-06 12:13:43.159410: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]:
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: An Internal Compiler Error has occurred
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]:
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]:
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: Error message: /usr/local/lib/python3.10/dist-packages/tensorflow-plugins/libaws_neuron_plugin.so: undefined symbol: _ZTIN10tensorflow9AllocatorE
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]:
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: Error class: NotFoundError
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: Error location: Unknown
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: Command line: /usr/local/bin/neuron-cc compile /tmp/tmpsp95qbbl/hlo_module.pb --framework XLA --verbose=35 --pipeline compile SaveTemps --output /tmp/tmpsp95qbbl/hlo_module.neff --fast-math=none --fp32-cast=matmult-fp16 --enable-fast-context-switch
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]:
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: Internal details:
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "neuroncc/driver/CommandDriver.py", line 224, in neuroncc.driver.CommandDriver.CommandDriver.run
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "neuroncc/driver/commands/CompileCommand.py", line 580, in neuroncc.driver.commands.CompileCommand.CompileCommand.run
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "neuroncc/driver/commands/CompileCommand.py", line 558, in neuroncc.driver.commands.CompileCommand.CompileCommand.runPipeline
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "neuroncc/driver/commands/CompileCommand.py", line 562, in neuroncc.driver.commands.CompileCommand.CompileCommand.runPipeline
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "neuroncc/driver/Job.py", line 289, in neuroncc.driver.Job.SingleInputJob.run
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "neuroncc/driver/Job.py", line 289, in neuroncc.driver.Job.SingleInputJob.run
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "neuroncc/driver/Job.py", line 289, in neuroncc.driver.Job.SingleInputJob.run
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "neuroncc/driver/jobs/Frontend.py", line 431, in neuroncc.driver.jobs.Frontend.Frontend.runSingleInput
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "neuroncc/driver/jobs/Frontend.py", line 210, in neuroncc.driver.jobs.Frontend.Frontend.runXLAFrontend
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "neuroncc/driver/jobs/support/Frameworks.py", line 1013, in neuroncc.driver.jobs.support.Frameworks.XLAInterface.loadModel
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: File "/usr/local/lib/python3.10/dist-packages/tensorflow/init.py", line 441, in
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: HWM version 1.15.0.0-ea99c2b8f
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: NEFF version Dynamic
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: TVM version 1.17.0.0+0
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: NumPy version 1.23.5
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: MXNet not available
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: TF not available
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]:
09/06/2023 12:13:59 PM ERROR 17181 [neuron-cc]: Artifacts stored in: /tmp/tmpsp95qbbl
AttributeError Traceback (most recent call last)
Cell In[11], line 1
----> 1 model_neuron = tfn.trace(encoder, preprocessor(english_sentences))
File ~/aws_neuron_venv_tensorflow/lib/python3.10/site-packages/tensorflow_neuron/python/_trace.py:241, in trace(func, example_inputs, subgraph_builder_function)
238 model = AwsNeuronModel(cfunc, func.structured_outputs, real_op_count, ordered_weights=ordered_weights)
239 else:
240 # wrap GraphDef as a WrappedFunction
--> 241 cfunc = _wrap_graph_def_as_concrete_function(graph_def, func)
242 # wrap ConcreteFunction as a keras model
243 model = AwsNeuronModel(cfunc, func.structured_outputs, real_op_count)
File ~/aws_neuron_venv_tensorflow/lib/python3.10/site-packages/tensorflow_neuron/python/_trace.py:605, in _wrap_graph_def_as_concrete_function(graph_def, func_ref)
601 def _wrap_graph_def_as_concrete_function(graph_def, func_ref):
602 # Note: if input_names is a dictionary (such as {ts.name: ts.name for ts in example_inputs}),
603 # then the WrappedFunction may occationally have feeding tensors going to the wrong inputs.
604 input_names = _get_input_names(func_ref)
--> 605 output_names = _get_output_names(func_ref)
606 cfunc = wrap_function.function_from_graph_def(graph_def, input_names, output_names)
608 # TODO: remove this hack once https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/eager/wrap_function.py#L377 is fixed
File ~/aws_neuron_venv_tensorflow/lib/python3.10/site-packages/tensorflow_neuron/python/_trace.py:576, in _get_output_names(func)
572 if isinstance(structured_outputs, dict):
573 # return a map from output argument name to symbolic tensor name
574 # in order to let the WrappedFunction's return dictionary have the correct keys
575 tensor_specs = nest.flatten(structured_outputs, expand_composites=True)
--> 576 tensor_spec_name_map = {spec.name: name for name, spec in structured_outputs.items()}
577 tensor_spec_names = [tensor_spec_name_map[spec.name] for spec in tensor_specs]
578 return {name: ts.name for ts, name in zip(outputs, tensor_spec_names)}
File ~/aws_neuron_venv_tensorflow/lib/python3.10/site-packages/tensorflow_neuron/python/_trace.py:576, in
AttributeError: 'list' object has no attribute 'name'
@jyang-aws any updates on the issue? thanks
@tahsintahsin, we have reproduced your issue on tensorflow-neuronx and will investigate. Thanks!
@jeffhataws Hello again, are there any updates on the issue? We would like to use Inferentia in our production system if possible, so we are waiting for your updates.
Thanks for checking back @tahsintahsin.
It appears your model's output is a dictionary where the values are lists, which we do not currently support. We may work on supporting this in the future. You can unblock yourself by wrapping the model so that it returns a list instead of a dictionary, like so:
class NeuronEncoderWrapper(tf.keras.Model):
    def __init__(self, model):
        super().__init__()
        self.model = model
    def __call__(self, example_inputs):
        intermediate = self.model(example_inputs)
        return [intermediate['encoder_outputs'], intermediate['default'], intermediate['pooled_output'], intermediate['sequence_output']]
...
wrapped_encoder = NeuronEncoderWrapper(encoder)
neuron_labse = tfnx.trace(wrapped_encoder, example_input)
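To see why a dictionary with list values trips the tracer, the failing dict comprehension from the traceback (`_trace.py:576`) can be reproduced in plain Python. This is only a sketch: `FakeSpec` is a hypothetical stand-in for a tensor spec, and the names `t0` to `t3` are made up for illustration.

```python
class FakeSpec:
    """Stand-in for a tensor spec; only the .name attribute matters here."""

    def __init__(self, name):
        self.name = name


def name_map(structured_outputs):
    # Same shape as the comprehension shown at _trace.py:576 in the traceback.
    return {spec.name: name for name, spec in structured_outputs.items()}


# Every value is a single spec: the comprehension works.
flat = {"default": FakeSpec("t0"), "pooled_output": FakeSpec("t1")}
flat_result = name_map(flat)
print(flat_result)

# One value is a list of specs: a list has no .name, so the lookup fails.
nested = {"default": FakeSpec("t0"),
          "encoder_outputs": [FakeSpec("t2"), FakeSpec("t3")]}
error_message = None
try:
    name_map(nested)
except AttributeError as err:
    error_message = str(err)
    print(error_message)
```

Returning a list from the wrapper sidesteps this code path, because the comprehension is only reached when the structured outputs are a dictionary.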
For your reference I have also made the modifications to your original script, which you can test.
import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_text as text
import numpy as np
import tensorflow_neuronx as tfnx
def normalization(embeds):
    norms = np.linalg.norm(embeds, 2, axis=1, keepdims=True)
    return embeds/norms
english_sentences = tf.constant(["dog", "Puppies are nice.", "I enjoy taking long walks along the beach with my dog."])
italian_sentences = tf.constant(["cane", "I cuccioli sono carini.", "Mi piace fare lunghe passeggiate lungo la spiaggia con il mio cane."])
japanese_sentences = tf.constant(["犬", "子犬はいいです", "私は犬と一緒にビーチを散歩するのが好きです"])
preprocessor = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder-cmlm/multilingual-preprocess/2")
encoder = hub.KerasLayer("https://tfhub.dev/google/LaBSE/2")
class NeuronEncoderWrapper(tf.keras.Model):
    def __init__(self, model):
        super().__init__()
        self.model = model
    def __call__(self, example_inputs):
        intermediate = self.model(example_inputs)
        return [intermediate['encoder_outputs'], intermediate['default'], intermediate['pooled_output'], intermediate['sequence_output']]
english_embeds = encoder(preprocessor(english_sentences))["default"]
japanese_embeds = encoder(preprocessor(japanese_sentences))["default"]
italian_embeds = encoder(preprocessor(italian_sentences))["default"]
english_embeds = normalization(english_embeds)
japanese_embeds = normalization(japanese_embeds)
italian_embeds = normalization(italian_embeds)
print (np.matmul(english_embeds, np.transpose(italian_embeds)))
print (np.matmul(english_embeds, np.transpose(japanese_embeds)))
print (np.matmul(italian_embeds, np.transpose(japanese_embeds)))
example_input = preprocessor(english_sentences)
wrapped_encoder = NeuronEncoderWrapper(encoder)
neuron_labse = tfnx.trace(wrapped_encoder, example_input)
for i in range(1000):
    print(neuron_labse(example_input))
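Because the wrapper returns a list, downstream code that used to read `["default"]` now has to index by position. A minimal sketch, assuming the traced model returns its outputs in the same order as the wrapper's list, is to zip the list back onto the original keys. `to_named_outputs` is a hypothetical helper, and the placeholder strings stand in for real tensors.

```python
# Order taken from the list returned by NeuronEncoderWrapper.__call__.
OUTPUT_KEYS = ["encoder_outputs", "default", "pooled_output", "sequence_output"]


def to_named_outputs(outputs, keys=OUTPUT_KEYS):
    """Rebuild a {key: output} dictionary from a positional output list."""
    if len(outputs) != len(keys):
        raise ValueError(f"expected {len(keys)} outputs, got {len(outputs)}")
    return dict(zip(keys, outputs))


# Placeholder values; in the script above this would be
# neuron_labse(example_input).
named = to_named_outputs(["enc", "emb", "pooled", "seq"])
print(named["default"])
```

With that in place, the normalization and similarity code from the original script can keep using the `default` key.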
Hello @jeffhataws, many thanks, this time it worked :)
Hello, I am working on an inf1 instance running Ubuntu 22. I followed the instructions in the documentation and installed the drivers and SDK, with no errors in that section. Following the original instructions, this creates a venv and installs Jupyter notebook in it. I then followed the rest of the tutorial and ran the Jupyter notebook command, but no link appeared in the terminal output that I could connect to remotely from my local machine. I thought it might be a venv issue, so I repeated all the steps, installed everything globally, and ran Jupyter notebook from the terminal again, this time outside a venv. However, the output is the same, and no URL appears as shown in the documentation. There was a troubleshooting section for Jupyter notebook connections; I tried that as well, and it did not help either. How can I proceed now? Thanks.