Open GuillaumeAnoufa opened 1 year ago
Have you found a solution to this? I'm having a similar issue using my own model with one class, except it just gets stuck on inference (after the find pillar_num line). I've also noticed one of the cores on the Xavier is being maxed out while this is happening.
Unfortunately I have no solution yet :(. If you find any lead, please tell me about it! The problem happens both on my PC (Nvidia 2080) and on my Xavier NX.
Hello, can someone help with this matter, please?
@GuillaumeAnoufa I am experiencing the same issue. I suspect that it is related to this line, as changing the values still seems to build the model correctly, without errors.
I am also using a single-class detector, but with a different point cloud range and voxel size. I am going to train the model with 3 classes to verify whether this is an issue with the number of classes or with the point cloud range.
@byte-deve Hi, do you know what each of these numbers, '496' and '432', is a product of?
Hi, I realised these are the dimensions of the feature grid.
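For readers wondering where '496' and '432' come from, here is a minimal sketch of the arithmetic, assuming the grid is simply the point cloud extent divided by the voxel size along each axis. The function name `grid_size` is made up for illustration and is not part of the exporter; the input values are the default range and voxel size quoted in this thread.

```python
def grid_size(point_cloud_range, voxel_size):
    """Return (nx, ny): the number of pillars along x and y.

    Hypothetical helper, not taken from the exporter.
    """
    xmin, ymin, _, xmax, ymax, _ = point_cloud_range
    vx, vy, _ = voxel_size
    # round() guards against floating-point results such as 431.9999999
    nx = round((xmax - xmin) / vx)
    ny = round((ymax - ymin) / vy)
    return nx, ny


# Default range and voxel size quoted in this thread
nx, ny = grid_size([0, -39.68, -3, 69.12, 39.68, 1], [0.16, 0.16, 4])
print(ny, nx)
```

With a custom range such as [0, -30.72, -3, 40.96, 30.72, 1] and the same voxel size, the same calculation gives a different grid, which is why values hard-coded for the default grid would no longer match.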
@GuillaumeAnoufa I now have this working with a fully custom model, if you still need support you can @ me :)
@rjwb1 hey, I'm having the same issues with setting up a custom model, would really appreciate some guidance :) This is my model and dataset config for reference:
################## MODEL CONFIG #####################
DATA_CONFIG:
    _BASE_CONFIG_: cfgs/dataset_configs/mydata_dataset_only_cone.yaml
    POINT_CLOUD_RANGE: [0, -30.72, -3, 40.96, 30.72, 1]
    DATA_PROCESSOR:
        - NAME: mask_points_and_boxes_outside_range
          REMOVE_OUTSIDE_BOXES: True
        - NAME: shuffle_points
          SHUFFLE_ENABLED: { 'train': True, 'test': False }
        - NAME: transform_points_to_voxels
          VOXEL_SIZE: [0.16, 0.16, 4]
          MAX_POINTS_PER_VOXEL: 100
          MAX_NUMBER_OF_VOXELS: {
            'train': 20000,
            'test': 60000 #16000
          }
    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['placeholder','gt_sampling']
        AUG_CONFIG_LIST:
            - NAME: gt_sampling
              USE_ROAD_PLANE: False
              DB_INFO_PATH:
                  - "data"
              PREPARE: {
                 filter_by_min_points: ['Cone:7'],
                 filter_by_difficulty: [-1],
              }
              SAMPLE_GROUPS: ['Cone:200']
              NUM_POINT_FEATURES: 5
              DATABASE_WITH_FAKELIDAR: False
              REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
              LIMIT_WHOLE_SCENE: False
            - NAME: random_world_flip
              ALONG_AXIS_LIST: ['x']
            - NAME: random_world_rotation
              WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]
            - NAME: random_world_scaling
              WORLD_SCALE_RANGE: [0.95, 1.05]
            - NAME: random_world_frustum_dropout
              INTENSITY_RANGE: [ 0, 0.2 ]
              DIRECTION: [ 'top' ]
            - NAME: random_local_frustum_dropout
              INTENSITY_RANGE: [ 0, 0.2 ]
              DIRECTION: [ 'top' ]
MODEL:
    NAME: PointPillar
    VFE:
        NAME: PillarVFE
        WITH_DISTANCE: False
        USE_ABSLOTE_XYZ: True
        USE_NORM: True
        NUM_FILTERS: [64]
    MAP_TO_BEV:
        NAME: PointPillarScatter
        NUM_BEV_FEATURES: 64
    BACKBONE_2D:
        NAME: BaseBEVBackbone
        LAYER_NUMS: [3, 5, 5]
        LAYER_STRIDES: [2, 2, 2]
        NUM_FILTERS: [64, 128, 256]
        UPSAMPLE_STRIDES: [1, 2, 4]
        NUM_UPSAMPLE_FILTERS: [128, 128, 128]
    DENSE_HEAD:
        NAME: AnchorHeadSingle
        CLASS_AGNOSTIC: False
        USE_DIRECTION_CLASSIFIER: True
        DIR_OFFSET: 0.78539
        DIR_LIMIT_OFFSET: 0.0
        NUM_DIR_BINS: 2
        ANCHOR_GENERATOR_CONFIG: [
            {
                'class_name': 'Cone',
                'anchor_sizes': [ [ 0.3, 0.3, 0.6 ] ],
                'anchor_rotations': [ 0, 1.57 ],
                'anchor_bottom_heights': [ -0.7 ],
                'align_center': False,
                'feature_map_stride': 2,
                'matched_threshold': 0.6,
                'unmatched_threshold': 0.4
            }
        ]
        TARGET_ASSIGNER_CONFIG:
            NAME: AxisAlignedTargetAssigner
            POS_FRACTION: -1.0
            SAMPLE_SIZE: 512
            NORM_BY_NUM_EXAMPLES: False
            MATCH_HEIGHT: False
            BOX_CODER: ResidualCoder
        LOSS_CONFIG:
            LOSS_WEIGHTS: {
                'cls_weight': 1.0,
                'loc_weight': 2.0,
                'dir_weight': 0.2,
                'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
            }
    POST_PROCESSING:
        RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
        SCORE_THRESH: 0.3
        OUTPUT_RAW_SCORE: False
        EVAL_METRIC: kitti
        NMS_CONFIG:
            MULTI_CLASSES_NMS: False
            NMS_TYPE: nms_gpu
            NMS_THRESH: 0.01
            NMS_PRE_MAXSIZE: 300
            NMS_POST_MAXSIZE: 100
OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 3
    NUM_EPOCHS: 80
    OPTIMIZER: adam_onecycle
    LR: 0.003
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9
    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001
    LR_WARMUP: False
    WARMUP_EPOCH: 1
########################## DATASET CONFIG ########################
FILTER_MIN_POINTS_IN_GT: 1
POINT_CLOUD_RANGE: [0, -30.72, -3, 40.96, 30.72, 1] # xmin, ymin, zmin, xmax, ymax, zmax
DATA_SPLIT: { 'train': train, 'test': val }
INFO_PATH: { 'train': [mydata_infos_train.pkl], 'test': [mydata_infos_val.pkl], }
TRAINING_CATEGORIES: { 'Cone': 'Cone', }
FOV_POINTS_ONLY: False
DATA_AUGMENTOR:
    DISABLE_AUG_LIST: ['placeholder','gt_sampling']
    AUG_CONFIG_LIST:
        - NAME: gt_sampling
          USE_ROAD_PLANE: False
          DB_INFO_PATH:
              - mydata_dbinfos_train.pkl
          PREPARE: {
             filter_by_min_points: ['Cone:20'],
             filter_by_difficulty: [-1],
          }
          SAMPLE_GROUPS: ['Cone:200']
          NUM_POINT_FEATURES: 5
          DATABASE_WITH_FAKELIDAR: False
          REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
          LIMIT_WHOLE_SCENE: True
        - NAME: random_world_flip
          ALONG_AXIS_LIST: ['x', 'y']
        - NAME: random_world_rotation
          WORLD_ROT_ANGLE: [-3.14159265, 3.114159265]
        - NAME: random_world_scaling
          WORLD_SCALE_RANGE: [0.95, 1.05]
POINT_FEATURE_ENCODING: {
    encoding_type: absolute_coordinates_encoding,
    used_feature_list: ['x', 'y', 'z', 'intensity'],
    src_feature_list: ['x', 'y', 'z', 'intensity', 'timestamp'],
}
DATA_PROCESSOR:
    - NAME: mask_points_and_boxes_outside_range
      REMOVE_OUTSIDE_BOXES: True
    - NAME: shuffle_points
      SHUFFLE_ENABLED: { 'train': True, 'test': False }
    - NAME: transform_points_to_voxels
      VOXEL_SIZE: [0.16, 0.16, 4]
      MAX_POINTS_PER_VOXEL: 5
      MAX_NUMBER_OF_VOXELS: { 'train': 16000, 'test': 40000 }
GRAD_NORM_CLIP: 10
@mazm0002 hi there, does the model train successfully and work in PyTorch? What stage of the process are you having trouble with?
@rjwb1 Yeah, so I can train successfully and get the outputs I expect. Then I use the ONNX exporter tool to convert the model to ONNX and run it with the demo, feeding it custom test data (which works fine in PyTorch). The TensorRT engine generates fine, but when it actually runs detections, they take a long time to process and there are far too many bounding boxes, most of them incorrect. I think the issue is probably in the ONNX conversion, and I was wondering if you could let me know what you had to change in the tool to get it working for 1 class and a custom data/model config. Thanks a lot for the help!
@mazm0002 Hi, I too experienced this and it was due to some hard coded parameters inside the exporter. I also used this useful tool to inspect my generated onnx file to ensure it was similar to the default one:
https://github.com/lutzroeder/netron
Can you show me what values you have here or is it default?
I think maybe with your model it should look like this?
    op_attrs["dense_shape"] = np.array([384,256])
    return self.layer(name="PPScatter_0", op="PPScatterPlugin", inputs=inputs, outputs=outputs, attrs=op_attrs)

def loop_node(graph, current_node, loop_time=0):
    for i in range(loop_time):
        next_node = [node for node in graph.nodes if len(node.inputs) != 0 and len(current_node.outputs) != 0 and node.inputs[0] == current_node.outputs[0]][0]
        current_node = next_node
    return next_node

def simplify_postprocess(onnx_model):
    print("Use onnx_graphsurgeon to adjust postprocessing part in the onnx...")
    graph = gs.import_onnx(onnx_model)
    cls_preds = gs.Variable(name="cls_preds", dtype=np.float32, shape=(1, 192, 128, 2))
    box_preds = gs.Variable(name="box_preds", dtype=np.float32, shape=(1, 192, 128, 18))
    dir_cls_preds = gs.Variable(name="dir_cls_preds", dtype=np.float32, shape=(1, 192, 128, 4))
The size of the scatter plugin array should be equal to the dimensions of the voxel grid
I will open a PR to parameterise these values properly :)
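As a rough illustration of how the hard-coded values quoted above could be derived instead of typed in, here is a minimal sketch. It assumes an AnchorHeadSingle-style layout with one anchor size per class, two rotations per anchor, 7 box parameters and 2 direction bins, as in the config posted in this thread. The function `head_shapes` and its parameter names are hypothetical and are not the exporter's API.

```python
def head_shapes(nx, ny, num_classes, feature_map_stride=2,
                rotations_per_class=2, box_code_size=7, num_dir_bins=2):
    """Derive the scatter and head output shapes from the pillar grid.

    Hypothetical helper. Assumes one anchor size per class, so the number
    of anchors per feature-map cell is num_classes * rotations_per_class.
    """
    anchors = num_classes * rotations_per_class
    h, w = ny // feature_map_stride, nx // feature_map_stride
    return {
        "dense_shape": (ny, nx),
        "cls_preds": (1, h, w, anchors * num_classes),
        "box_preds": (1, h, w, anchors * box_code_size),
        "dir_cls_preds": (1, h, w, anchors * num_dir_bins),
    }


# One class on a 256 x 384 pillar grid (x by y)
for name, shape in head_shapes(256, 384, num_classes=1).items():
    print(name, shape)
```

Under these assumptions the one-class `box_preds` comes out with 14 channels rather than 18, so that particular value is worth double-checking against the exported graph (for example in Netron) before relying on it.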
@mazm0002 can you try exporting with the changes I have made in #77
As you are using additional pointcloud attributes (5 instead of 4) this may require further parameters
Hello @rjwb1, thanks for your input! I added your changes, but unfortunately they did not solve my problem. The shape of my model did not change with your PR because I was already using the default grid size.
My config only has a few changes from the default config:
POINT_CLOUD_RANGE: [0, -39.68, -3, 69.12, 39.68, 1] -> POINT_CLOUD_RANGE: [0, -39.68, -1, 69.12, 39.68, 7]
VOXEL_SIZE: [0.16, 0.16, 4] -> VOXEL_SIZE: [0.16, 0.16, 8]
The biggest change is that I am using a single class instead of 3.
My exported model shape seems accurate, but I am still experiencing very long post-processing times.
Below, a picture of the output shape of the exported model: (rest of the model is exactly the same as the example one)
Hmmm, I am also using a single class... would you mind sending a copy of your cfg file and I will see if I can reproduce this
Sure: pointpillar2.txt. I changed the _BASE_CONFIG_ to the default one. I don't think the _BASE_CONFIG_ matters here, since everything is redefined in the actual config.
@GuillaumeAnoufa looks almost identical to mine. Strange... I guess I also have my score thresh set to 0.4 and my nms thresh to 0.1 in my Params.h. This could reduce post processing latency?
@rjwb1 It doesn't seem to change anything.
I tried exporting the default "pointpillar_7728.pth" model with the default config, only reducing the number of classes from 3 to 1, and I experience the same issue on the default data. Changing the number of classes from 3 to 1 seems to be what causes the bug in my code.
load file: ../data/data_velo/000001.bin
find points num: 18630
find pillar_num: 6815
TIME: generateVoxels: 0.03072 ms.
TIME: generateFeatures: 0.045824 ms.
TIME: doinfer: 15.7839 ms.
TIME: doPostprocessCuda: 64484.9 ms.
TIME: pointpillar: 64500.8 ms.
Bndbox objs: 4646
Saved prediction in: ../eval/kitti/object/pred_velo/000001.txt
Changing the number of classes in the config file results in an abnormally high number of predicted bounding boxes.
@rjwb1 If you try exporting the default model with this config file (which is the default one but with a single class): pointpillar_1class.txt, and run inference on the default velodyne data, do you experience slow post-processing? I know this exported model should not work anyway, since the model was trained for 3 classes, but I would like to know whether it is reproducible. Thanks a lot for your help :)
I forgot to copy the generated param.h and recompile after changing the model... Post processing time is back to normal, sorry for the inconvenience :sob:
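Since the fix here was simply copying the regenerated header and recompiling, a small pre-build check can catch the same mistake. This is only a sketch: the function name and both paths in the usage comment are hypothetical and need adjusting to wherever the exporter writes the header and wherever the build reads it.

```python
import filecmp
from pathlib import Path


def header_is_stale(generated: Path, compiled_against: Path) -> bool:
    """Return True when the header used by the build differs from the
    one the exporter just generated (or when either file is missing)."""
    if not generated.exists() or not compiled_against.exists():
        return True
    # shallow=False compares file contents, not just size and mtime
    return not filecmp.cmp(generated, compiled_against, shallow=False)


# Example usage with hypothetical paths:
# if header_is_stale(Path("tool/params.h"), Path("include/params.h")):
#     print("Header is out of date: copy it and recompile before running the demo.")
```

A content comparison is used rather than timestamps, because copying a file can change its modification time without changing what the build actually sees.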
@GuillaumeAnoufa no worries, glad you found the solution 👍🏼
@mazm0002 can you try exporting with the changes I have made in #77
Hi, thanks for your work. I changed the files according to your PR #77, moved params.h, and recompiled, but inference is still very slow in 'doPostprocessCuda'. My model has 4 classes and tests correctly in OpenPCDet. Could you give me some ideas? I would appreciate it very much!
Could you tell me how you solved your problem? I have the same issue: I found that it generates more than 1 million boxes before NMS, so the post-processing is very slow. I changed my code following @rjwb1, but it does not work.
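To give a feel for why the number of boxes entering NMS matters so much, here is a back-of-the-envelope sketch. It assumes a greedy NMS that, in the worst case, compares every pair of candidate boxes; whether the CUDA post-processing in this repo behaves exactly like that is an assumption. The function name is hypothetical, and the box counts are taken from figures mentioned in this thread.

```python
def nms_pair_count(num_boxes):
    """Worst-case number of pairwise overlap checks for greedy NMS."""
    return num_boxes * (num_boxes - 1) // 2


# 300 is the NMS_PRE_MAXSIZE from the posted config, 4646 is the
# "Bndbox objs" count from the slow run, 1_000_000 is the count reported above.
for n in (300, 4646, 1_000_000):
    print(f"{n:>9} boxes -> up to {nms_pair_count(n):,} overlap checks")
```

The quadratic growth suggests that the useful question is why so many boxes pass the score threshold in the first place (for example, a class count or output shape that does not match the compiled parameters), rather than how to make NMS itself faster.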
I can export my custom model to ONNX, but the results seem incorrect. Can you give me some advice?
@rjwb1 Hello, thanks very much for your guidance. I changed the parameters just like you did, but the problem went from slow post-processing to a CUDA error: illegal memory access. I am also trying to use my own model, which detects only one class, and I have also added ROS. I sincerely hope you can tell me how to solve the problem; it has been bothering me for a few days.
System: Ubuntu 20.04
Latest version of OpenPCDet
GPU has cuda devices: 1
----device id: 0 info----
GPU : NVIDIA GeForce RTX 2080 with Max-Q Design
Capbility: 7.5
Global memory: 7982MB
Const memory: 64KB
SM in a block: 48KB
warp size: 32
threads in a block: 1024
block dim: (1024,1024,64)
grid dim: (2147483647,65535,65535)
Hello,
I exported my PointPillars weights trained on custom data. The only parameter change compared to the example model is that it uses 1 class instead of 3. I had to change a few things in tools/simplifier_onnx.py for the exporter to work with a class count other than 3:
Code changes to work with 1 class
I changed the signature of simplify_postprocess(onnx_model) to simplify_postprocess(onnx_model, num_classes) and changed 3 other lines.
The exporter works, but when testing the demo with this model:
---- RUN TIME ----
load file: ../data/data_velo/000001.bin
find points num: 18630
find pillar_num: 6815
TIME: generateVoxels: 0.038048 ms.
TIME: generateFeatures: 0.053024 ms.
TIME: doinfer: 30.2525 ms.
TIME: doPostprocessCuda: 57528.1 ms.
TIME: pointpillar: 57558.6 ms.
Bndbox objs: 4158
Saved prediction in: ../eval/kitti/object/pred_velo/000001.txt
This model works perfectly fine in PyTorch.
As you can see, the post-processing step takes a long time and outputs thousands of bounding boxes. Issue #43 describes a similar problem, seemingly solved by an update, but I am already using the latest version of this repo.
Do you have an idea what could cause this issue?
I can upload my .pth file or my onnx file if you want to try and reproduce this.
Best regards,