WeijingShi / Point-GNN

Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud, CVPR 2020.
MIT License

Traceback (most recent call last): File "train.py", line 648, in <module> global_step=results['step']) NameError: name 'results' is not defined #59

Open r-sy opened 3 years ago

r-sy commented 3 years ago

When I set NUM_GPU=4 and batch_size=8 in the train config files, why does the following error occur?

Traceback (most recent call last):
  File "train.py", line 648, in <module>
    global_step=results['step'])
NameError: name 'results' is not defined

The complete output is as follows:

python3 train.py configs/car_auto_T3_train_train_config configs/car_auto_T3_train_config
WARNING:tensorflow: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:

WARNING:tensorflow:From train.py:196: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /home/wrjs/pc/science/Point-GNN/models/models.py:128: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /home/wrjs/pc/science/Point-GNN/models/models.py:128: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

@ level 0 Graph, Add layer: layer1, type: scatter_max_point_set_pooling
WARNING:tensorflow:From /home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py:1866: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use layer.__call__ method instead.
Feature Dim:300
@ level 1 Graph, Add layer: layer2, type: scatter_max_graph_auto_center_net
Feature Dim:300
@ level 1 Graph, Add layer: layer3, type: scatter_max_graph_auto_center_net
Feature Dim:300
@ level 1 Graph, Add layer: layer4, type: scatter_max_graph_auto_center_net
Feature Dim:300
Final Feature Dim:300
Prediction 4 classes
WARNING:tensorflow:From /home/wrjs/pc/science/Point-GNN/models/models.py:241: The name tf.losses.huber_loss is deprecated. Please use tf.compat.v1.losses.huber_loss instead.

WARNING:tensorflow:From /home/wrjs/pc/science/Point-GNN/models/models.py:246: The name tf.losses.Reduction is deprecated. Please use tf.compat.v1.losses.Reduction instead.

(?, 1, 7)
WARNING:tensorflow:From /home/wrjs/pc/science/Point-GNN/models/models.py:258: The name tf.div_no_nan is deprecated. Please use tf.math.divide_no_nan instead.

WARNING:tensorflow:From /home/wrjs/pc/science/Point-GNN/models/models.py:263: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

WARNING:tensorflow:From /home/wrjs/pc/science/Point-GNN/models/models.py:266: The name tf.is_nan is deprecated. Please use tf.math.is_nan instead.

WARNING:tensorflow:From /home/wrjs/pc/science/Point-GNN/models/models.py:309: The name tf.assert_equal is deprecated. Please use tf.compat.v1.assert_equal instead.

WARNING:tensorflow:From /home/wrjs/pc/science/Point-GNN/models/models.py:311: The name tf.losses.get_regularization_losses is deprecated. Please use tf.compat.v1.losses.get_regularization_losses instead.

@ level 0 Graph, Add layer: layer1, type: scatter_max_point_set_pooling
Feature Dim:300
@ level 1 Graph, Add layer: layer2, type: scatter_max_graph_auto_center_net
Feature Dim:300
@ level 1 Graph, Add layer: layer3, type: scatter_max_graph_auto_center_net
Feature Dim:300
@ level 1 Graph, Add layer: layer4, type: scatter_max_graph_auto_center_net
Feature Dim:300
Final Feature Dim:300
Prediction 4 classes
(?, 1, 7)
@ level 0 Graph, Add layer: layer1, type: scatter_max_point_set_pooling
Feature Dim:300
@ level 1 Graph, Add layer: layer2, type: scatter_max_graph_auto_center_net
Feature Dim:300
@ level 1 Graph, Add layer: layer3, type: scatter_max_graph_auto_center_net
Feature Dim:300
@ level 1 Graph, Add layer: layer4, type: scatter_max_graph_auto_center_net
Feature Dim:300
Final Feature Dim:300
Prediction 4 classes
(?, 1, 7)
@ level 0 Graph, Add layer: layer1, type: scatter_max_point_set_pooling
Feature Dim:300
@ level 1 Graph, Add layer: layer2, type: scatter_max_graph_auto_center_net
Feature Dim:300
@ level 1 Graph, Add layer: layer3, type: scatter_max_graph_auto_center_net
Feature Dim:300
@ level 1 Graph, Add layer: layer4, type: scatter_max_graph_auto_center_net
Feature Dim:300
Final Feature Dim:300
Prediction 4 classes
(?, 1, 7)
Set to unify copies in different GPU as if its a single copy
WARNING:tensorflow:From train.py:309: The name tf.metrics.mean is deprecated. Please use tf.compat.v1.metrics.mean instead.

WARNING:tensorflow:From train.py:326: The name tf.metrics.recall is deprecated. Please use tf.compat.v1.metrics.recall instead.

WARNING:tensorflow:From /home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/metrics_impl.py:2200: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.

WARNING:tensorflow:From train.py:334: The name tf.metrics.precision is deprecated. Please use tf.compat.v1.metrics.precision instead.

WARNING:tensorflow:From train.py:342: The name tf.metrics.auc is deprecated. Please use tf.compat.v1.metrics.auc instead.

WARNING:tensorflow:From train.py:377: The name tf.train.exponential_decay is deprecated. Please use tf.compat.v1.train.exponential_decay instead.

WARNING:tensorflow:From train.py:381: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

WARNING:tensorflow:From train.py:381: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

WARNING:tensorflow:From train.py:383: The name tf.train.GradientDescentOptimizer is deprecated. Please use tf.compat.v1.train.GradientDescentOptimizer instead.

WARNING:tensorflow:From train.py:384: The name tf.train.MomentumOptimizer is deprecated. Please use tf.compat.v1.train.MomentumOptimizer instead.

WARNING:tensorflow:From train.py:385: The name tf.train.RMSPropOptimizer is deprecated. Please use tf.compat.v1.train.RMSPropOptimizer instead.

WARNING:tensorflow:From train.py:386: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

/home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
batch size=8
WARNING:tensorflow:From train.py:503: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From train.py:505: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From train.py:508: The name tf.GPUOptions is deprecated. Please use tf.compat.v1.GPUOptions instead.

WARNING:tensorflow:From train.py:521: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From train.py:522: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2020-12-19 12:36:06.886850: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-12-19 12:36:06.922455: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2020-12-19 12:36:06.926453: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5c9df00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-19 12:36:06.926512: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-12-19 12:36:06.931199: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-12-19 12:36:08.595825: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5c3c040 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-12-19 12:36:08.595879: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-12-19 12:36:08.595895: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-12-19 12:36:08.595907: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-12-19 12:36:08.595919: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-12-19 12:36:08.595933: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (4): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-12-19 12:36:08.595945: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (5): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-12-19 12:36:08.595956: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (6): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-12-19 12:36:08.595968: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (7): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-12-19 12:36:08.601628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:1a:00.0
2020-12-19 12:36:08.603147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:1b:00.0
2020-12-19 12:36:08.604617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 2 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:1d:00.0
2020-12-19 12:36:08.606050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 3 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:1e:00.0
2020-12-19 12:36:08.607494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 4 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:3d:00.0
2020-12-19 12:36:08.608915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 5 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:3e:00.0
2020-12-19 12:36:08.610340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 6 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:40:00.0
2020-12-19 12:36:08.611823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 7 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:41:00.0
2020-12-19 12:36:08.612259: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-19 12:36:08.614164: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-19 12:36:08.615786: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-19 12:36:08.616236: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-19 12:36:08.618413: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-19 12:36:08.620010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-19 12:36:08.625166: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-19 12:36:08.648752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2020-12-19 12:36:08.648897: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-19 12:36:08.661353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-19 12:36:08.661411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0 1 2 3 4 5 6 7
2020-12-19 12:36:08.661426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N N N N N N N N
2020-12-19 12:36:08.661439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1: N N N N N N N N
2020-12-19 12:36:08.661450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 2: N N N N N N N N
2020-12-19 12:36:08.661461: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 3: N N N N N N N N
2020-12-19 12:36:08.661471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 4: N N N N N N N N
2020-12-19 12:36:08.661482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 5: N N N N N N N N
2020-12-19 12:36:08.661493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 6: N N N N N N N N
2020-12-19 12:36:08.661503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 7: N N N N N N N N
2020-12-19 12:36:08.675250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10312 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:1a:00.0, compute capability: 7.5)
2020-12-19 12:36:08.677212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10312 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:1b:00.0, compute capability: 7.5)
2020-12-19 12:36:08.679328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10312 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:1d:00.0, compute capability: 7.5)
2020-12-19 12:36:08.682429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10312 MB memory) -> physical GPU (device: 3, name: GeForce RTX 2080 Ti, pci bus id: 0000:1e:00.0, compute capability: 7.5)
2020-12-19 12:36:08.685735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 10312 MB memory) -> physical GPU (device: 4, name: GeForce RTX 2080 Ti, pci bus id: 0000:3d:00.0, compute capability: 7.5)
2020-12-19 12:36:08.688763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 10312 MB memory) -> physical GPU (device: 5, name: GeForce RTX 2080 Ti, pci bus id: 0000:3e:00.0, compute capability: 7.5)
2020-12-19 12:36:08.691735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 10312 MB memory) -> physical GPU (device: 6, name: GeForce RTX 2080 Ti, pci bus id: 0000:40:00.0, compute capability: 7.5)
2020-12-19 12:36:08.693986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 10312 MB memory) -> physical GPU (device: 7, name: GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5)
WARNING:tensorflow:From train.py:524: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

WARNING:tensorflow:From train.py:524: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

Restore from checkpoint ./checkpoints/car_auto_T3_train/model-1400004
WARNING:tensorflow:From /home/wrjs/.local/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.

WARNING:tensorflow:From train.py:532: The name tf.local_variables is deprecated. Please use tf.compat.v1.local_variables instead.

Traceback (most recent call last):
  File "train.py", line 662, in <module>
    global_step=results['step'])
NameError: name 'results' is not defined

How can I increase the batch size and utilize more GPUs? Thank you!

WeijingShi commented 3 years ago

Hi @r-sy, the step stored in the checkpoint has already reached max_step in the train_config, so the training loop is skipped, the results variable is never assigned, and you get this error. If you want to fine-tune the checkpoint, just increase max_step and max_epoch in the train_config; otherwise you can train from the beginning. Note that training ends when either max_step or max_epoch is hit, and step*batch_size=num_sample. Hope it helps.
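
For readers who hit the same error, the failure mode described in this reply can be reproduced with a minimal Python sketch. This is not the code from train.py: the function names and the loop shape are hypothetical, and the max_step value of 1400000 is an assumed config value chosen for illustration. Only the checkpoint step (1400004, from the "Restore from checkpoint" message) and the relation step*batch_size=num_sample come from this thread.

```python
import math


def unguarded_loop(start_step, max_step):
    """Assign `results` only inside the loop body (the failing pattern)."""
    for step in range(start_step, max_step):
        results = {'step': step + 1}
    # If start_step >= max_step the body never ran, so `results` is unbound.
    return results['step']


def guarded_loop(start_step, max_step):
    """Same loop, but report a skipped loop instead of crashing."""
    results = None
    for step in range(start_step, max_step):
        results = {'step': step + 1}
    if results is None:
        # Nothing to train: the checkpoint is already at or past max_step.
        return None
    return results['step']


def steps_per_epoch(num_sample, batch_size):
    """Steps needed to see every sample once (step * batch_size ~ num_sample)."""
    return math.ceil(num_sample / batch_size)


# Checkpoint step taken from the log; max_step is an assumed value.
try:
    unguarded_loop(start_step=1400004, max_step=1400000)
    error_name = None
except NameError as exc:  # UnboundLocalError is a subclass of NameError
    error_name = type(exc).__name__
```

Inside a function Python raises UnboundLocalError, which is a subclass of NameError; at module level, as in the traceback above, it is reported as NameError directly. With a max_step larger than the checkpoint step the loop body runs at least once and `results` is defined. `steps_per_epoch` only illustrates the arithmetic: with a larger batch_size, fewer steps cover the same number of samples, which is worth keeping in mind when choosing a new max_step.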

r-sy commented 3 years ago

Thank you for your guidance; it matches my understanding and was very helpful! Thank you for your excellent work!