Hi, I am trying to run the example classification.ipynb on Orin.
When I run the example (nothing changed), build_classification_graph fails with this error:
NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[node save/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_133]]
(1) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[node save/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
What should I do now? Should I change the ckpt file?
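Before replacing the ckpt file, it may help to see exactly which variable names differ between the graph and the checkpoint. The following is a minimal diagnostic sketch. It assumes the TF 1.x API that ships in the container (`tf.train.list_variables`, `tf.get_default_graph`, `tf.GraphKeys.GLOBAL_VARIABLES`). The helper names `compare_variable_names` and `names_from_tf` are hypothetical and are not part of tf_trt_models.

```python
def compare_variable_names(graph_names, checkpoint_names):
    """Compare variable names expected by the graph with those stored in a checkpoint.

    Returns (missing, unused):
      missing: names the graph expects but the checkpoint lacks; with a default
               tf.train.Saver(), any entry here makes restore() raise NotFoundError.
      unused:  names stored in the checkpoint that no graph variable uses.
    """
    graph_set = set(graph_names)
    ckpt_set = set(checkpoint_names)
    return sorted(graph_set - ckpt_set), sorted(ckpt_set - graph_set)


def names_from_tf(checkpoint_path, graph=None):
    """Collect both name lists using the TF 1.x API (assumed available in the container)."""
    import tensorflow as tf  # imported lazily so compare_variable_names works without TF

    if graph is None:
        graph = tf.get_default_graph()
    # Saver keys variables by op name, e.g. "InceptionV2/Conv2d_2b_1x1/biases" (no ":0").
    graph_names = [v.op.name for v in graph.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)]
    # list_variables returns (name, shape) pairs read from the checkpoint files.
    ckpt_names = [name for name, _shape in tf.train.list_variables(checkpoint_path)]
    return graph_names, ckpt_names
```

For example, calling `names_from_tf("data/inception_v2/inception_v2.ckpt")` after the model graph is built but before `tf_saver.restore`, then passing the result to `compare_variable_names`, would be expected to list `InceptionV2/Conv2d_2b_1x1/biases` among the missing names. If the unused list shows the same layers stored under different names, the model definition and the checkpoint likely come from mismatched versions of the model code, so swapping the ckpt file alone may not fix it.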
Details
Environment (HW): Orin, using the L4T-ML Docker container (dustynv/l4t-ml:r35.4.1)
Full error:
2023-11-03 08:19:47.515183: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2023-11-03 08:19:47.515480: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1666] Found device 0 with properties:
name: Orin major: 8 minor: 7 memoryClockRate(GHz): 1.3
pciBusID: 0000:00:00.0
2023-11-03 08:19:47.515618: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-11-03 08:19:47.515738: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2023-11-03 08:19:47.515801: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2023-11-03 08:19:47.515854: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2023-11-03 08:19:47.515901: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2023-11-03 08:19:47.515947: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2023-11-03 08:19:47.515990: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2023-11-03 08:19:47.516230: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2023-11-03 08:19:47.516488: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2023-11-03 08:19:47.516645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1794] Adding visible gpu devices: 0
2023-11-03 08:19:47.516764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-11-03 08:19:47.516819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0
2023-11-03 08:19:47.516865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N
2023-11-03 08:19:47.517035: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2023-11-03 08:19:47.517316: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1049] ARM64 does not support NUMA - returning NUMA node zero
2023-11-03 08:19:47.517584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20692 MB memory) -> physical GPU (device: 0, name: Orin, pci bus id: 0000:00:00.0, compute capability: 8.7)
INFO:tensorflow:Restoring parameters from data/inception_v2/inception_v2.ckpt
---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:1365, in BaseSession._do_call(self, fn, *args)
1364 try:
-> 1365 return fn(*args)
1366 except errors.OpError as e:
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:1349, in BaseSession._do_run.<locals>._run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
1348 self._extend_graph()
-> 1349 return self._call_tf_sessionrun(options, feed_dict, fetch_list,
1350 target_list, run_metadata)
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:1441, in BaseSession._call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
1439 def _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list,
1440 run_metadata):
-> 1441 return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
1442 fetch_list, target_list,
1443 run_metadata)
NotFoundError: 2 root error(s) found.
(0) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[{{node save/RestoreV2}}]]
[[save/RestoreV2/_133]]
(1) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[{{node save/RestoreV2}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
NotFoundError Traceback (most recent call last)
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py:1289, in Saver.restore(self, sess, save_path)
1288 else:
-> 1289 sess.run(self.saver_def.restore_op_name,
1290 {self.saver_def.filename_tensor_name: save_path})
1291 except errors.NotFoundError as err:
1292 # There are three common conditions that might cause this error:
1293 # 0. The file is missing. We ignore here, as this is checked above.
(...)
1297 # 1. The checkpoint would not be loaded successfully as is. Try to parse
1298 # it as an object-based checkpoint.
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:955, in BaseSession.run(self, fetches, feed_dict, options, run_metadata)
954 try:
--> 955 result = self._run(None, fetches, feed_dict, options_ptr,
956 run_metadata_ptr)
957 if run_metadata:
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:1179, in BaseSession._run(self, handle, fetches, feed_dict, options, run_metadata)
1178 if final_fetches or final_targets or (handle and feed_dict_tensor):
-> 1179 results = self._do_run(handle, final_targets, final_fetches,
1180 feed_dict_tensor, options, run_metadata)
1181 else:
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:1358, in BaseSession._do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1357 if handle is None:
-> 1358 return self._do_call(_run_fn, feeds, fetches, targets, options,
1359 run_metadata)
1360 else:
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/client/session.py:1384, in BaseSession._do_call(self, fn, *args)
1380 message += ('\nA possible workaround: Try disabling Grappler optimizer'
1381 '\nby modifying the config for creating the session eg.'
1382 '\nsession_config.graph_options.rewrite_options.'
1383 'disable_meta_optimizer = True')
-> 1384 raise type(e)(node_def, op, message)
NotFoundError: 2 root error(s) found.
(0) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[node save/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_133]]
(1) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[node save/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "usr/local/lib/python3.8/dist-packages/ipykernel_launcher.py", line 17, in <module>
app.launch_new_instance()
File "usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 976, in launch_instance
app.start()
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelapp.py", line 712, in start
self.io_loop.start()
File "usr/local/lib/python3.8/dist-packages/tornado/platform/asyncio.py", line 215, in start
self.asyncio_loop.run_forever()
File "usr/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "usr/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "usr/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue
await self.process_one()
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 499, in process_one
await dispatch(*args)
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell
await result
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 730, in execute_request
reply_content = await reply_content
File "usr/local/lib/python3.8/dist-packages/ipykernel/ipkernel.py", line 383, in do_execute
res = shell.run_cell(
File "usr/local/lib/python3.8/dist-packages/ipykernel/zmqshell.py", line 528, in run_cell
return super().run_cell(*args, **kwargs)
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 2881, in run_cell
result = self._run_cell(
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 2936, in _run_cell
return runner(coro)
File "usr/local/lib/python3.8/dist-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
coro.send(None)
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3135, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3338, in run_ast_nodes
if await self.run_code(code, result, async_=asy):
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3398, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "tmp/ipykernel_8476/3250917203.py", line 1, in <cell line: 1>
frozen_graph, input_names, output_names = build_classification_graph(
File "home/adas/Repo/working/tf_trt_models/examples/classification/tf_trt_models/classification.py", line 208, in build_classification_graph
tf_saver = tf.train.Saver()
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
self.build()
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 868, in _build
self.saver_def = self._builder._build_internal( # pylint: disable=protected-access
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 507, in _build_internal
restore_op = self._AddRestoreOps(filename_tensor, saveables,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
_, _, _op = _op_def_lib._apply_op_helper(
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func
return func(*args, **kwargs)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3356, in create_op
return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
ret = Operation(
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
During handling of the above exception, another exception occurred:
NotFoundError Traceback (most recent call last)
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py:1300, in Saver.restore(self, sess, save_path)
1299 try:
-> 1300 names_to_keys = object_graph_key_mapping(save_path)
1301 except errors.NotFoundError:
1302 # 2. This is not an object-based checkpoint, which likely means there
1303 # is a graph mismatch. Re-raise the original error with
1304 # a helpful message (b/110263146)
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py:1618, in object_graph_key_mapping(checkpoint_path)
1617 reader = pywrap_tensorflow.NewCheckpointReader(checkpoint_path)
-> 1618 object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
1619 object_graph_proto = (trackable_object_graph_pb2.TrackableObjectGraph())
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py:915, in CheckpointReader.get_tensor(self, tensor_str)
913 from tensorflow.python.util import compat
--> 915 return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
NotFoundError: _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint file
During handling of the above exception, another exception occurred:
NotFoundError Traceback (most recent call last)
Input In [26], in <cell line: 1>()
----> 1 frozen_graph, input_names, output_names = build_classification_graph(
2 model=MODEL,
3 checkpoint=checkpoint_path,
4 num_classes=NUM_CLASSES
5 )
File /home/adas/Repo/working/tf_trt_models/examples/classification/tf_trt_models/classification.py:209, in build_classification_graph(model, checkpoint, num_classes)
207 # load checkpoint
208 tf_saver = tf.train.Saver()
--> 209 tf_saver.restore(save_path=checkpoint, sess=tf_sess)
211 # freeze graph
212 frozen_graph = tf.graph_util.convert_variables_to_constants(
213 tf_sess,
214 tf_sess.graph_def,
215 output_node_names=[output_name]
216 )
File /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py:1305, in Saver.restore(self, sess, save_path)
1300 names_to_keys = object_graph_key_mapping(save_path)
1301 except errors.NotFoundError:
1302 # 2. This is not an object-based checkpoint, which likely means there
1303 # is a graph mismatch. Re-raise the original error with
1304 # a helpful message (b/110263146)
-> 1305 raise _wrap_restore_error_with_msg(
1306 err, "a Variable name or other graph key that is missing")
1308 # This is an object-based checkpoint. We'll print a warning and then do
1309 # the restore.
1310 logging.warning(
1311 "Restoring an object-based checkpoint using a name-based saver. This "
1312 "may be somewhat fragile, and will re-build the Saver. Instead, "
1313 "consider loading object-based checkpoints using "
1314 "tf.train.Checkpoint().")
NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[node save/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_133]]
(1) Not found: Tensor name "InceptionV2/Conv2d_2b_1x1/biases" not found in checkpoint files data/inception_v2/inception_v2.ckpt
[[node save/RestoreV2 (defined at usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "usr/local/lib/python3.8/dist-packages/ipykernel_launcher.py", line 17, in <module>
app.launch_new_instance()
File "usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 976, in launch_instance
app.start()
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelapp.py", line 712, in start
self.io_loop.start()
File "usr/local/lib/python3.8/dist-packages/tornado/platform/asyncio.py", line 215, in start
self.asyncio_loop.run_forever()
File "usr/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "usr/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "usr/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue
await self.process_one()
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 499, in process_one
await dispatch(*args)
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell
await result
File "usr/local/lib/python3.8/dist-packages/ipykernel/kernelbase.py", line 730, in execute_request
reply_content = await reply_content
File "usr/local/lib/python3.8/dist-packages/ipykernel/ipkernel.py", line 383, in do_execute
res = shell.run_cell(
File "usr/local/lib/python3.8/dist-packages/ipykernel/zmqshell.py", line 528, in run_cell
return super().run_cell(*args, **kwargs)
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 2881, in run_cell
result = self._run_cell(
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 2936, in _run_cell
return runner(coro)
File "usr/local/lib/python3.8/dist-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
coro.send(None)
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3135, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3338, in run_ast_nodes
if await self.run_code(code, result, async_=asy):
File "usr/local/lib/python3.8/dist-packages/IPython/core/interactiveshell.py", line 3398, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "tmp/ipykernel_8476/3250917203.py", line 1, in <cell line: 1>
frozen_graph, input_names, output_names = build_classification_graph(
File "home/adas/Repo/working/tf_trt_models/examples/classification/tf_trt_models/classification.py", line 208, in build_classification_graph
tf_saver = tf.train.Saver()
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
self.build()
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 868, in _build
self.saver_def = self._builder._build_internal( # pylint: disable=protected-access
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 507, in _build_internal
restore_op = self._AddRestoreOps(filename_tensor, saveables,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 327, in _AddRestoreOps
all_tensors = self.bulk_restore(filename_tensor, saveables, preferred_shard,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1693, in restore_v2
_, _, _op = _op_def_lib._apply_op_helper(
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 792, in _apply_op_helper
op = g.create_op(op_type_name, inputs, dtypes=None, name=scope,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func
return func(*args, **kwargs)
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3356, in create_op
return self._create_op_internal(op_type, inputs, dtypes, input_types, name,
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 3418, in _create_op_internal
ret = Operation(
File "usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()