ars359 opened this issue 3 years ago (status: Open)
Hi @ars359, the code caught the localization loss being NaN. Could you try a smaller learning rate and see whether it still occurs? If it does, another possible source is a custom sample that produces an empty point cloud. You could disable random shuffling in the data loader and locate the sample_idx that causes the problem. One more thought: have you tried training on a single GPU?
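To make the "locate the sample_idx" step concrete, here is a minimal sketch that scans a folder of point cloud files before training. It assumes the custom data is stored as KITTI-style .bin files (little-endian float32, four values per point) and that the loader orders files by name; `check_point_cloud` and `find_bad_samples` are hypothetical helper names, not part of Point-GNN. It uses only the standard library; `np.fromfile(path, dtype=np.float32).reshape(-1, 4)` is the usual NumPy way to read the same layout.

```python
import math
import os
import struct

POINT_SIZE = 16  # 4 float32 values per point: x, y, z, reflectance


def check_point_cloud(path):
    """Return a short description of the problem, or None if the file looks usable."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) == 0:
        return "empty file (zero points)"
    if len(data) % POINT_SIZE != 0:
        return "file size is not a multiple of 16 bytes"
    for point in struct.iter_unpack("<4f", data):
        if not all(math.isfinite(v) for v in point):
            return "non-finite value (NaN or inf)"
    return None


def find_bad_samples(velodyne_dir):
    """Check every .bin file in name order and collect (sample_idx, name, problem)."""
    names = sorted(n for n in os.listdir(velodyne_dir) if n.endswith(".bin"))
    problems = []
    for sample_idx, name in enumerate(names):
        problem = check_point_cloud(os.path.join(velodyne_dir, name))
        if problem is not None:
            problems.append((sample_idx, name, problem))
    return problems
```

Calling `find_bad_samples` on the point cloud folder returns the suspicious files with their positions in sorted order. It only inspects the raw files, so it will not catch a cloud that becomes empty after any cropping or filtering the loader applies.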
Hi @ars359. I have some custom point clouds that I want to use for training. Can you guide me on how to proceed? I cannot find any proper guidance on working with custom datasets.
I am looking to train on a custom dataset as well, but I can't find any documentation on that. @WeijingShi, if you have outlined something in a doc, talk, git page, or blog, anything that helps even remotely, please drop it in the comments. I will be forever grateful to you. Thanks.
Hi @aastha3,
Sorry for the late reply.
The fastest way may be to mimic the KITTI dataset and prepare your data in the same way. Essentially, we need the point cloud bin files, images (used in visualization), label text files, and calibration files. The KITTI website has sample data, and I found that the readme file in the toolkit clarifies things a lot.
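As a rough illustration of what "mimic KITTI" means for the point clouds and labels, the sketch below writes points in the KITTI velodyne layout (float32 x, y, z, reflectance per point) and formats one label line with the 15 KITTI object fields. The helper names are hypothetical, and the placeholder values for `truncated`, `occluded`, `alpha`, and the 2D `bbox` are an assumption for data without camera annotations; whether a given training pipeline tolerates such placeholders should be checked.

```python
import struct


def write_kitti_bin(path, points):
    """Write (x, y, z, reflectance) tuples as consecutive little-endian float32 values."""
    with open(path, "wb") as f:
        for x, y, z, reflectance in points:
            f.write(struct.pack("<4f", x, y, z, reflectance))


def format_kitti_label(obj_type, h, w, l, x, y, z, rotation_y,
                       truncated=0.0, occluded=0, alpha=-10.0,
                       bbox=(0.0, 0.0, 0.0, 0.0)):
    """Build one KITTI label line.

    Field order: type, truncated, occluded, alpha, 2D bbox (left, top, right,
    bottom), dimensions (height, width, length), location (x, y, z in the
    camera frame), rotation_y.
    """
    fields = [obj_type, f"{truncated:.2f}", str(occluded), f"{alpha:.2f}"]
    fields += [f"{v:.2f}" for v in bbox]
    fields += [f"{v:.2f}" for v in (h, w, l, x, y, z, rotation_y)]
    return " ".join(fields)
```

For example, `format_kitti_label("Car", 1.5, 1.6, 3.9, 1.0, 1.7, 20.0, -1.57)` yields a single space-separated line with 15 fields, one object per line in the label text file.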
For a slightly deeper look: we just need the point cloud and the labels in the same coordinate frame. In the KITTI dataset, we read the points here: https://github.com/WeijingShi/Point-GNN/blob/48f3d79d5b101d3a4b8439ba74c92fcad4f7cab0/dataset/kitti_dataset.py#L666
This function reads the point cloud file and applies a coordinate transformation so that the points are in the same coordinate system as the labels. Later, we read the label file here: https://github.com/WeijingShi/Point-GNN/blob/48f3d79d5b101d3a4b8439ba74c92fcad4f7cab0/train.py#L81
and use the labels to annotate the points here: https://github.com/WeijingShi/Point-GNN/blob/48f3d79d5b101d3a4b8439ba74c92fcad4f7cab0/train.py#L110
Hope it helps, Weijing
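For readers preparing custom data, the coordinate step mentioned above can be sketched as follows. KITTI labels are given in the camera frame, and KITTI calibration files provide a 3x4 `Tr_velo_to_cam` matrix and a 3x3 `R0_rect` matrix for mapping LiDAR points into it. This is an illustration of that transform only, not the code in kitti_dataset.py; `velo_to_cam` and the example matrix are hypothetical.

```python
def velo_to_cam(points_xyz, tr_velo_to_cam, r0_rect=None):
    """Map LiDAR points into the camera frame: p_cam = R0_rect * (Tr_velo_to_cam * [x, y, z, 1]).

    tr_velo_to_cam is a 3x4 nested list; r0_rect is a 3x3 nested list, or None
    to skip rectification (equivalent to an identity matrix).
    """
    transformed = []
    for x, y, z in points_xyz:
        p = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in tr_velo_to_cam]
        if r0_rect is not None:
            p = [sum(r * c for r, c in zip(row, p)) for row in r0_rect]
        transformed.append(tuple(p))
    return transformed


# Assumed example: a pure axis swap with no translation.
# LiDAR frame: x forward, y left, z up. Camera frame: x right, y down, z forward.
AXIS_SWAP = [
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
```

With `AXIS_SWAP`, a LiDAR point 10 m ahead, 2 m to the left, and 1 m up lands 10 m along the camera z axis. If custom labels are already in the sensor frame, the alternative is to skip the transform and keep both points and labels in that frame, as long as the two agree.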
After modifying kitti_dataset.py to take custom data as input, I am facing the following error during training. I am training on 2080Ti GPUs (2 stacks).
Traceback (most recent call last):
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [] [Condition x == y did not hold element-wise:] [x (IsNan_9:0) = ] [1] [y (assert_equal_1/y:0) = ] [0]
[[{{node assert_equal_1/Assert/AssertGuard/Assert}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 598, in <module>
results = sess.run(fetches, feed_dict=total_feed_dict)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [] [Condition x == y did not hold element-wise:] [x (IsNan_9:0) = ] [1] [y (assert_equal_1/y:0) = ] [0]
[[node assert_equal_1/Assert/AssertGuard/Assert (defined at /home/aeye/Point-GNN-master/models/models.py:309) ]]
Original stack trace for 'assert_equal_1/Assert/AssertGuard/Assert':
File "train.py", line 251, in <module>
t_loss_dict = model.loss(t_logits, t_class_labels, t_pred_box,t_encoded_gt_boxes, t_valid_gt_boxes, config['loss'])
File "/home/aeye/Point-GNN-master/models/models.py", line 309, in loss
False)]):
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/ops/check_ops.py", line 557, in assert_equal
return control_flow_ops.Assert(condition, data, summarize=summarize)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 193, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 171, in Assert
guarded_assert = cond(condition, no_op, true_assert, name="AssertGuard")
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1988, in cond
orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1814, in BuildCondBranch
original_result = fn()
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 169, in true_assert
condition, data, summarize, name="Assert")
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 74, in _assert
name=name)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/home/aeye/Documents/virtual_Aeye/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
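The assertion in this traceback is raised from model.loss when an IsNan check on a loss value comes back true. Besides empty point clouds, one cheap thing to rule out with custom data is a malformed label. The sketch below flags KITTI-style label lines with missing fields, non-finite numbers, or non-positive box dimensions. That zero-size boxes can produce NaN is an assumption here (it would apply if the box encoding divides by or takes a logarithm of a dimension) and has not been checked against models.py; `check_label_line` is a hypothetical helper.

```python
import math


def check_label_line(line):
    """Return a problem description for one KITTI-style label line, or None if it looks sane."""
    fields = line.split()
    if len(fields) < 15:
        return "expected at least 15 fields, got %d" % len(fields)
    if fields[0] == "DontCare":
        return None  # KITTI uses placeholder values (e.g. -1) for these entries
    try:
        numbers = [float(v) for v in fields[1:15]]
    except ValueError:
        return "non-numeric field"
    if not all(math.isfinite(v) for v in numbers):
        return "non-finite value (NaN or inf)"
    height, width, length = numbers[7:10]  # fields 9-11 of the line
    if min(height, width, length) <= 0:
        return "non-positive box dimension"
    return None
```

Running this over every line of every label file, together with a check of the point cloud files, narrows the search to specific samples before another training run is spent on it.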