SamsungLabs / tr3d

[ICIP2023] TR3D: Towards Real-Time Indoor 3D Object Detection

How do I run this network with my own data? #16

Closed: mostafa501 closed this issue 9 months ago

mostafa501 commented 10 months ago

Hey @filaPro, thank you for publishing this work. I see that you have a custom dataloader for the solution, but I was curious how to use this network on my own data. For example, say I have a PCD of chairs and no ground truth for it; I just want to run an inference script that gives me a list of detected objects. What would be the best way to proceed with your network? Any help on this would be very much appreciated. Thank you for the contribution!

filaPro commented 10 months ago

Hi @mostafa501 ,

I think you can start from the demo script.
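The demo essentially wraps the mmdet3d Python API; a minimal sketch of that usage is below, with placeholder paths for the config, checkpoint and point cloud:

from mmdet3d.apis import inference_detector, init_model

# placeholder paths -- point these at your own config, checkpoint and scene
config = 'configs/tr3d/tr3d_scannet-3d-18class.py'
checkpoint = 'tr3d_scannet.pth'
pcd = 'demo/data/scannet/scene0000_00.bin'

model = init_model(config, checkpoint, device='cuda:0')
result, data = inference_detector(model, pcd)
print(result[0]['boxes_3d'], result[0]['scores_3d'], result[0]['labels_3d'])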

mostafa501 commented 10 months ago

Hello Dr. filaPro, thank you for your previous replies. I installed all the required libraries and ran this script to run detection on one of the provided scenes.

The script I used is:

from argparse import ArgumentParser
from mmdet3d.apis import inference_detector, init_model
from mmdet3d.apis import show_result_meshlab


def main():
    parser = ArgumentParser()
    parser.add_argument('--pcd', default='/media/navlab/GNSS/work/detection/tr3d-main/demo/data/scannet/scene0000_00.bin', help='Point cloud file')  # '/media/navlab/GNSS/work/detection/tr3d-main/indoor_data/table.bin'
    parser.add_argument('--config', default='/media/navlab/GNSS/work/detection/tr3d-main/tr3d/configs/tr3d/tr3d_scannet-pretrain_s3dis-3d-5class.py', help='Config file')
    parser.add_argument('--checkpoint', default='/media/navlab/GNSS/work/detection/tr3d-main/pretrained_models/3d_detetction/scannet/tr3d_scannet.pth', help='Checkpoint file')
    parser.add_argument('--device', default='cuda:0', help='Device used for inference')
    parser.add_argument('--score-thr', type=float, default=0.0, help='bbox score threshold')
    parser.add_argument('--out-dir', type=str, default='/media/navlab/GNSS/work/detection/tr3d-main/evaluate_results/try1', help='dir to save results')
    parser.add_argument('--show', action='store_true', help='show online visualization results')
    parser.add_argument('--snapshot', action='store_true', help='whether to save online visualization results')
    args = parser.parse_args()

    # build the model from a config file and a checkpoint file
    model = init_model(args.config, args.checkpoint, device=args.device)
    result, data = inference_detector(model, args.pcd)
    print(result)

    # show the results
    show_result_meshlab(data, result, args.out_dir, args.score_thr, show=args.show, snapshot=args.snapshot, task='det')


if __name__ == '__main__':
    main()


The output of the script is:

2023-11-09 18:37:21,635 - mmcv - INFO - backbone.conv1.kernel - torch.Size([27, 3, 64]): Initialized by user-defined init_weights in MinkResNet

2023-11-09 18:37:21,637 - mmcv - INFO - backbone.norm1.bn.weight - torch.Size([64]): The value is the same before and after calling init_weights of MinkSingleStage3DDetector

2023-11-09 18:37:21,638 - mmcv - INFO - backbone.norm1.bn.bias - torch.Size([64]): The value is the same before and after calling init_weights of MinkSingleStage3DDetector

2023-11-09 18:37:21,640 - mmcv - INFO - backbone.layer1.0.conv1.kernel - torch.Size([27, 64, 64]): Initialized by user-defined init_weights in MinkResNet

2023-11-09 18:37:21,641 - mmcv - INFO - backbone.layer1.0.norm1.bn.weight - torch.Size([64]): The value is the same before and after calling init_weights of MinkSingleStage3DDetector

2023-11-09 18:37:21,643 - mmcv - INFO - ...

2023-11-09 18:37:21,746 - mmcv - INFO - head.cls_conv.bias - torch.Size([1, 18]): Initialized by user-defined init_weights in TR3DHead

(log output truncated)

load checkpoint from local path: ../pretrained_models/3d_detetction/scannet/tr3d_scannet.pth


[{'boxes_3d': DepthInstance3DBoxes( tensor([[ 6.0211, 1.7779, 1.5086, ..., 0.8153, 0.8996, 0.0000], [ 0.6758, 5.7731, -0.0303, ..., 1.6858, 2.4479, 0.0000], [ 6.2168, 7.6608, 1.4575, ..., 1.8187, 1.0176, 0.0000], ..., [ 7.3683, 5.3746, 0.0740, ..., 0.8860, 2.0463, 0.0000], [ 6.0307, 1.9000, 1.4363, ..., 0.6912, 1.0217, 0.0000], [ 1.1401, 5.7307, 0.0201, ..., 2.3319, 2.4017, 0.0000]])), 'scores_3d': tensor([0.7522, 0.6948, 0.5754, 0.5611, 0.2757, 0.1960, 0.1668, 0.1429, 0.1053, 0.0958, 0.0899, 0.0850, 0.0717, 0.0638, 0.0526, 0.0519, 0.0500, 0.0499, 0.0488, 0.0484, 0.0392, 0.0343, 0.0323, 0.0315, 0.0309, 0.0263, 0.0225, 0.0224, 0.0210, 0.0161, 0.0160, 0.0157, 0.0137, 0.0131, 0.0122, 0.0101, 0.7866, 0.0692, 0.0646, 0.0602, 0.0429, 0.0319, 0.0295, 0.0284, 0.0264, 0.0250, 0.0244, 0.0219, 0.0211, 0.0179, 0.0163, 0.0159, 0.0148, 0.0146, 0.0142, 0.0140, 0.0139, 0.0131, 0.0117, 0.0108, 0.0102, 0.0101, 0.0101, 0.1795, 0.1502, 0.0572, 0.0392, 0.0352, 0.0341, 0.0317, 0.0296, 0.0269, 0.0261, 0.0259, 0.0254, 0.0251, 0.0245, 0.0230, 0.0198, 0.0193, 0.0134, 0.0131, 0.0129, 0.0122, 0.0119, 0.0117, 0.0116, 0.0113, 0.8036, 0.0369, 0.0347, 0.0299, 0.0294, 0.0195, 0.0146, 0.0145, 0.0144, 0.0144, 0.0124, 0.0121, 0.0121, 0.0119, 0.0104, 0.0103, 0.6613, 0.6481, 0.1025, 0.0808, 0.0748, 0.0638, 0.0342, 0.0235, 0.0153, 0.0152, 0.0124, 0.0103, 0.7589, 0.6781, 0.6632, 0.1835, 0.1465, 0.1070, 0.0972, 0.0803, 0.0629, 0.0626, 0.0619, 0.0613, 0.0558, 0.0518, 0.0517, 0.0378, 0.0376, 0.0374, 0.0367, 0.0356, 0.0340, 0.0337, 0.0304, 0.0302, 0.0297, 0.0281, 0.0277, 0.0251, 0.0247, 0.0245, 0.0209, 0.0194, 0.0178, 0.0122, 0.0100, 0.3147, 0.2103, 0.2061, 0.1426, 0.0615, 0.0555, 0.0547, 0.0510, 0.0487, 0.0482, 0.0419, ... 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17])}] Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...


The output also includes 2 .obj files (scene0000_00_points.obj & scene0000_00_pred.obj): good visualization, but they put all detected classes into a single file at once, as shown in the image. How can I extract each detected cluster into a separate file, so we can use the network as in Figure 2 of the paper? Thank you again.

[image: obj output]

[image: fig2]

filaPro commented 10 months ago

First, please set score_thr in the config to something like 0.3, as mentioned in our readme, to remove detections with low confidence. Then adjust your MeshLab settings as in fcaf3d/issues/31#issuecomment-1108180811 for nicer visualizations.
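If editing the config is inconvenient, low-confidence detections can also be dropped after inference. A minimal sketch, assuming the result layout posted above (one dict with 'boxes_3d', 'scores_3d' and 'labels_3d' per scene) and that the boxes support boolean-mask indexing as in mmdet3d:

def filter_by_score(pred, score_thr=0.3):
    # keep only detections whose confidence exceeds the threshold
    keep = pred['scores_3d'] > score_thr
    return {'boxes_3d': pred['boxes_3d'][keep],
            'scores_3d': pred['scores_3d'][keep],
            'labels_3d': pred['labels_3d'][keep]}

filtered = filter_by_score(result[0], score_thr=0.3)  # result from inference_detector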

mostafa501 commented 10 months ago

OK, thank you for these guides, and I am really sorry for the long discussion. I tested the model with the given configs and pretrained models on the point cloud file from the ScanNet demo. I tried both the S3DIS and ScanNet files, as in the following lines, with score-thr set to 0, 0.2, and 0.3:

parser = ArgumentParser()
parser.add_argument('--pcd', default='/media/navlab/GNSS/work/detection/tr3d-main/demo/data/scannet/scene0000_00.bin', help='Point cloud file')
parser.add_argument('--config', default='/media/navlab/GNSS/work/detection/tr3d-main/tr3d/configs/tr3d/tr3d_scannet-3d-18class.py', help='Config file')  # tr3d_s3dis-3d-5class.py
parser.add_argument('--checkpoint', default='/media/navlab/GNSS/work/detection/tr3d-main/pretrained_models/3d_detetction/scannet/tr3d_scannet.pth', help='Checkpoint file')  # tr3d_s3dis.pth
parser.add_argument('--score-thr', type=float, default=0.0, help='bbox score threshold')


I noticed that with score-thr = 0.3 the results become much cleaner; however, as shown in the next image, all the bounding boxes appear to belong to the same category (the objects are not automatically separated into table, chair, bookcase, and so on). As you told me, separating them into categories needs manual work, which is not what I am after.

[image: obj_result]

I just want to use your network to take the point cloud of a single class (like chairs or tables) and get separate clusters of that class (either the point cloud of each cluster or the bounding box of each cluster), exactly like this image. Could you advise me how I can do that? Thank you.

[image: NEEDED]


filaPro commented 10 months ago

I don't quite understand your question. We predict one box for each chair, so you don't need to separate anything. Have you tried running the script on a single ScanNet scene? Also, I noticed that in your point cloud the walls of the room are not parallel to the x and y axes. Results will be a little better if you rotate your point cloud to fix that.
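If the rotation needs to be applied programmatically, a minimal numpy sketch of rotating a scan about the vertical axis is below; the 6-value-per-point layout (x, y, z, r, g, b), the file names and the angle are assumptions to adapt to your own data:

import numpy as np

def rotate_points_z(points, angle_rad):
    # rotate only the xyz columns about the z axis; leave the color columns untouched
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]], dtype=points.dtype)
    out = points.copy()
    out[:, :3] = out[:, :3] @ rot.T
    return out

points = np.fromfile('my_scene.bin', dtype=np.float32).reshape(-1, 6)  # hypothetical input file
points = rotate_points_z(points, np.deg2rad(15.0))                     # angle estimated for your own scan
points.astype(np.float32).tofile('my_scene_rotated.bin')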

mostafa501 commented 9 months ago

OK, I understand you, that's clear. Yes, I tested the model on a single ScanNet scene from the demo data (scene0000_00.bin). I used this demo data expecting it to be parallel to the x and y axes, but I don't know why it appeared not to be. Here is the code I used for the test:

parser = ArgumentParser()
parser.add_argument('--pcd', default='/media/navlab/GNSS/work/detection/tr3d-main/demo/data/scannet/scene0000_00.bin', help='Point cloud file')
parser.add_argument('--config', default='/media/navlab/GNSS/work/detection/tr3d-main/tr3d/configs/tr3d/tr3d_scannet-3d-18class.py', help='Config file')  # tr3d_s3dis-3d-5class.py
parser.add_argument('--checkpoint', default='/media/navlab/GNSS/work/detection/tr3d-main/pretrained_models/3d_detetction/scannet/tr3d_scannet.pth', help='Checkpoint file')  # tr3d_s3dis.pth
parser.add_argument('--score-thr', type=float, default=0.0, help='bbox score threshold')


I have just 2 questions: 1. For the resulting bounding boxes, how can I distinguish between the bounding boxes of chairs, tables, and so on? As I see it, the model gives a class label with each point. 2. If the model gives a bounding box for each chair, can you tell me how I can extract the point cloud of each chair, to be used for further processing? Thank you so much @filaPro for your replies.

filaPro commented 9 months ago

0) Probably the demo script ignores the rotation matrix here.

1) What do you mean by each point? Our model returns a single box, class label and probability score for each object.

2) Just crop the point cloud by the boundaries of predicted bounding box?
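Expanding on point 2), a minimal sketch of that cropping is below. It assumes the result returned by inference_detector above, the 6-value (x, y, z, r, g, b) layout of the ScanNet demo .bin file, and that the predicted boxes expose their 8 corner points via the .corners property as in mmdet3d; since the predicted yaw here is 0, an axis-aligned min/max test over the corners is enough:

import numpy as np

points = np.fromfile('demo/data/scannet/scene0000_00.bin', dtype=np.float32).reshape(-1, 6)
corners = result[0]['boxes_3d'].corners.cpu().numpy()  # (N, 8, 3) corners of each predicted box
scores = result[0]['scores_3d'].cpu().numpy()

for i, (box_corners, score) in enumerate(zip(corners, scores)):
    if score < 0.3:  # skip low-confidence detections
        continue
    lo, hi = box_corners.min(axis=0), box_corners.max(axis=0)  # axis-aligned extent of this box
    mask = np.all((points[:, :3] >= lo) & (points[:, :3] <= hi), axis=1)
    points[mask].astype(np.float32).tofile(f'object_{i:03d}.bin')  # one file per detected object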

mostafa501 commented 9 months ago

Thank you so much, @filaPro, for your help. That's good, I've got it: each object is right, not each point. Could you please tell me how to extract this information? I only got two OBJ files when I ran the demo script. (By "information" I mean the single box, class label, and probability score for each object.)

filaPro commented 9 months ago

You posted a result dict with predicted boxes_3d, scores_3d and labels_3d several messages earlier in this issue. That is the information you are asking about.
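To turn that dict into one record per object, a minimal sketch is below; it assumes the model was loaded with init_model (so model.CLASSES holds the class names, as in mmdet3d), the result layout posted earlier in this issue, and a hypothetical output file name:

import json

pred = result[0]
boxes = pred['boxes_3d'].tensor.cpu().numpy()  # one row per object: box position, dimensions and yaw, as printed above
scores = pred['scores_3d'].cpu().numpy()
labels = pred['labels_3d'].cpu().numpy()

objects = []
for box, score, label in zip(boxes, scores, labels):
    if score < 0.3:  # same threshold as recommended in the readme
        continue
    objects.append({'class': model.CLASSES[int(label)],
                    'score': float(score),
                    'box': box.tolist()})

with open('detections.json', 'w') as f:  # hypothetical output file
    json.dump(objects, f, indent=2)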