NVIDIA-AI-IOT / CUDA-PointPillars

A project demonstrating how to use CUDA-PointPillars to process point cloud data from LiDAR.
Apache License 2.0

Exporter Custom Models Fix #77

Open rjwb1 opened 1 year ago

rjwb1 commented 1 year ago

Correctly applies params from the model cfg to the onnx exporter

GuillaumeAnoufa commented 1 year ago

You reversed VOXEL_SIZE_X and VOXEL_SIZE_Y in the definition of simplify_preprocess.

Defined in simplifier_onnx.py as:

def simplify_preprocess(onnx_model, VOXEL_SIZE_Y, VOXEL_SIZE_X, MAX_POINTS_PER_VOXEL):

Called in exporter.py with:

onnx_final = simplify_preprocess(onnx_simp, VOXEL_SIZE_X, VOXEL_SIZE_Y, MAX_POINTS_PER_VOXEL)

rjwb1 commented 1 year ago

@GuillaumeAnoufa I will update this; I was rushing and didn't notice since my point cloud is square.

rjwb1 commented 1 year ago

@GuillaumeAnoufa I have fixed this. I reversed it twice, so it actually should not have affected the final model, but better to have correctly named variables/args...

Allamrahul commented 1 year ago

Hi, I have used this commit to successfully export my model to ONNX format. However, when I run inference with TensorRT, I see different results compared to evaluating the trained .pth file. More about my issue can be found in https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars/issues/82. Please let me know if I am missing something.

rjwb1 commented 1 year ago

Hi @Allamrahul, have you verified that the point cloud information is being loaded correctly and in the right order?

Allamrahul commented 1 year ago

Could you elaborate if possible? What do you mean by the right order? I was able to use my custom data, train the model to detect a single object, and validate the results with demo.py: the boxes look right on the eval set and the results look really good. After that I tried to export, but realized everything in the export script was hardcoded for 3 classes. I then referred to your PR and made those changes, which unblocked me and allowed me to export the model. I then moved the generated params.h to the include folder and the .onnx file to the model folder, and followed the instructions in https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars under "Compile and Run".

If the point cloud information were not being loaded correctly, I think my results on the eval set would have been terrible. I compared exporter.py (the file responsible for exporting to ONNX) with demo.py (the script that runs eval and lets me visualize predictions on my eval set): both process the data in the same manner.
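A quick way to sanity-check the loading question above (a minimal sketch; the file name and the 4-feature layout are assumptions):

import numpy as np

pts = np.load("sample.npy")   # hypothetical eval frame
print(pts.dtype, pts.shape)   # expect float32 and (N, 4): x, y, z, intensity
print(pts[:5])                # compare against the first points the C++ pipeline reads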

I am using the following command for export (posted as a screenshot):
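An illustrative equivalent, based on the argparse flags defined in the exporter.py posted below (all paths hypothetical):

python exporter.py --cfg_file cfgs/custom_models/pointpillar.yaml --ckpt checkpoint_epoch_80.pth --data_path demo_data --ext .npy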

I have also changed line 157 in main.py to let me predict on .npy files instead of .bin files.

If you need further information to guide in the right direction, please let me know.

Allamrahul commented 1 year ago

exporter.py file


import glob
import onnx
import torch
import argparse
import numpy as np

from pathlib import Path
from onnxsim import simplify
from pcdet.utils import common_utils
from pcdet.models import build_network
from pcdet.datasets import DatasetTemplate
from pcdet.config import cfg, cfg_from_yaml_file

from exporter_paramters import export_paramters as export_paramters
from simplifier_onnx import simplify_preprocess, simplify_postprocess

class DemoDataset(DatasetTemplate):
    def __init__(self, dataset_cfg, class_names, training=True, root_path=None, logger=None, ext='.bin'):
        """
        Args:
            root_path:
            dataset_cfg:
            class_names:
            training:
            logger:
        """
        super().__init__(
            dataset_cfg=dataset_cfg, class_names=class_names, training=training, root_path=root_path, logger=logger
        )
        self.root_path = root_path
        self.ext = ext
        data_file_list = glob.glob(str(root_path / f'*{self.ext}')) if self.root_path.is_dir() else [self.root_path]

        data_file_list.sort()
        self.sample_file_list = data_file_list

    def __len__(self):
        return len(self.sample_file_list)

    def __getitem__(self, index):
        if self.ext == '.bin':
            points = np.fromfile(self.sample_file_list[index], dtype=np.float32).reshape(-1, 4)
        elif self.ext == '.npy':
            points = np.load(self.sample_file_list[index])
        else:
            raise NotImplementedError

        input_dict = {
            'points': points,
            'frame_id': index,
        }

        data_dict = self.prepare_data(data_dict=input_dict)
        return data_dict

def parse_config():
    parser = argparse.ArgumentParser(description='arg parser')
    parser.add_argument('--cfg_file', type=str, default='cfgs/kitti_models/pointpillar.yaml',
                        help='specify the config for demo')
    parser.add_argument('--data_path', type=str, default='demo_data',
                        help='specify the point cloud data file or directory')
    parser.add_argument('--ckpt', type=str, default=None, help='specify the pretrained model')
    parser.add_argument('--ext', type=str, default='.bin', help='specify the extension of your point cloud data file')

    args = parser.parse_args()

    cfg_from_yaml_file(args.cfg_file, cfg)

    return args, cfg

def main():
    args, cfg = parse_config()
    export_paramters(cfg)
    logger = common_utils.create_logger()
    logger.info('------ Convert OpenPCDet model for TensorRT ------')
    demo_dataset = DemoDataset(
        dataset_cfg=cfg.DATA_CONFIG, class_names=cfg.CLASS_NAMES, training=False,
        root_path=Path(args.data_path), ext=args.ext, logger=logger
    )

    model = build_network(model_cfg=cfg.MODEL, num_class=len(cfg.CLASS_NAMES), dataset=demo_dataset)
    model.load_params_from_file(filename=args.ckpt, logger=logger, to_cpu=True)
    model.cuda()
    model.eval()
    np.set_printoptions(threshold=np.inf)
    with torch.no_grad():

      MAX_VOXELS = 10000
      NUMBER_OF_CLASSES = len(cfg.CLASS_NAMES)
      MAX_POINTS_PER_VOXEL = None

      DATA_PROCESSOR = cfg.DATA_CONFIG.DATA_PROCESSOR
      POINT_CLOUD_RANGE = cfg.DATA_CONFIG.POINT_CLOUD_RANGE
      for i in DATA_PROCESSOR:
          if i['NAME'] == "transform_points_to_voxels":
              MAX_POINTS_PER_VOXEL = i['MAX_POINTS_PER_VOXEL']
              VOXEL_SIZES = i['VOXEL_SIZE']
              break

      if MAX_POINTS_PER_VOXEL is None:
          logger.info('Could Not Parse Config... Exiting')
          import sys
          sys.exit()

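      # NOTE: despite the names, the two values below are the BEV grid dimensions
      # in voxels (point cloud range extent divided by voxel size), not metric
      # voxel sizes; they feed the PPScatter dense_shape downstream.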
      VOXEL_SIZE_X = abs(POINT_CLOUD_RANGE[0] - POINT_CLOUD_RANGE[3]) / VOXEL_SIZES[0]
      VOXEL_SIZE_Y = abs(POINT_CLOUD_RANGE[1] - POINT_CLOUD_RANGE[4]) / VOXEL_SIZES[1]

      FEATURE_SIZE_X = VOXEL_SIZE_X / 2  # Is this number of bins?
      FEATURE_SIZE_Y = VOXEL_SIZE_Y / 2

      dummy_voxels = torch.zeros(
          (MAX_VOXELS, MAX_POINTS_PER_VOXEL, 4),
          dtype=torch.float32,
          device='cuda:0')

      dummy_voxel_idxs = torch.zeros(
          (MAX_VOXELS, 4),
          dtype=torch.int32,
          device='cuda:0')

      dummy_voxel_num = torch.zeros(
          (1),
          dtype=torch.int32,
          device='cuda:0')

      dummy_input = dict()
      dummy_input['voxels'] = dummy_voxels
      dummy_input['voxel_num_points'] = dummy_voxel_num
      dummy_input['voxel_coords'] = dummy_voxel_idxs
      dummy_input['batch_size'] = torch.tensor(1)

      torch.onnx.export(model,       # model being run
          dummy_input,               # model input (or a tuple for multiple inputs)
          "./pointpillar_raw.onnx",  # where to save the model (can be a file or file-like object)
          export_params=True,        # store the trained parameter weights inside the model file
          opset_version=11,          # the ONNX version to export the model to
          do_constant_folding=True,  # whether to execute constant folding for optimization
          keep_initializers_as_inputs=True,
          input_names = ['voxels', 'voxel_num', 'voxel_idxs'],   # the model's input names
          output_names = ['cls_preds', 'box_preds', 'dir_cls_preds'], # the model's output names
          )

      onnx_raw = onnx.load("./pointpillar_raw.onnx")  # load onnx model
      onnx_trim_post = simplify_postprocess(onnx_raw, FEATURE_SIZE_X, FEATURE_SIZE_Y, NUMBER_OF_CLASSES)

      onnx_simp, check = simplify(onnx_trim_post)
      assert check, "Simplified ONNX model could not be validated"

      onnx_final = simplify_preprocess(onnx_simp, VOXEL_SIZE_X, VOXEL_SIZE_Y, MAX_POINTS_PER_VOXEL)
      onnx.save(onnx_final, "pointpillar.onnx")
      print('finished exporting onnx')

    logger.info('[PASS] ONNX EXPORTED.')

if __name__ == '__main__':
    main()
Allamrahul commented 1 year ago

simplifier_onnx.py

import onnx
import numpy as np
import onnx_graphsurgeon as gs

@gs.Graph.register()
def replace_with_clip(self, inputs, outputs,  voxel_array):
    for inp in inputs:
        inp.outputs.clear()

    for out in outputs:
        out.inputs.clear()

    op_attrs = dict()
    op_attrs["dense_shape"] =  voxel_array

    return self.layer(name="PPScatter_0", op="PPScatterPlugin", inputs=inputs, outputs=outputs, attrs=op_attrs)

def loop_node(graph, current_node, loop_time=0):
  for i in range(loop_time):
    next_node = [node for node in graph.nodes if len(node.inputs) != 0 and len(current_node.outputs) != 0 and node.inputs[0] == current_node.outputs[0]][0]
    current_node = next_node
  return next_node

def simplify_postprocess(onnx_model, FEATURE_SIZE_X, FEATURE_SIZE_Y, NUMBER_OF_CLASSES):
  print("Use onnx_graphsurgeon to adjust postprocessing part in the onnx...")
  graph = gs.import_onnx(onnx_model)

  cls_preds = gs.Variable(name="cls_preds", dtype=np.float32, shape=(1, int(FEATURE_SIZE_Y), int(FEATURE_SIZE_X), 2 * NUMBER_OF_CLASSES * NUMBER_OF_CLASSES))
  box_preds = gs.Variable(name="box_preds", dtype=np.float32, shape=(1, int(FEATURE_SIZE_Y), int(FEATURE_SIZE_X), 14 * NUMBER_OF_CLASSES))
  dir_cls_preds = gs.Variable(name="dir_cls_preds", dtype=np.float32, shape=(1, int(FEATURE_SIZE_Y), int(FEATURE_SIZE_X), 4 * NUMBER_OF_CLASSES))

  tmap = graph.tensors()
  new_inputs = [tmap["voxels"], tmap["voxel_idxs"], tmap["voxel_num"]]
  new_outputs = [cls_preds, box_preds, dir_cls_preds]

  for inp in graph.inputs:
    if inp not in new_inputs:
      inp.outputs.clear()

  for out in graph.outputs:
    out.inputs.clear()

  first_ConvTranspose_node = [node for node in graph.nodes if node.op == "ConvTranspose"][0]
  concat_node = loop_node(graph, first_ConvTranspose_node, 3)
  assert concat_node.op == "Concat"

  first_node_after_concat = [node for node in graph.nodes if len(node.inputs) != 0 and len(concat_node.outputs) != 0 and node.inputs[0] == concat_node.outputs[0]]

  for i in range(3):
    transpose_node = loop_node(graph, first_node_after_concat[i], 1)
    assert transpose_node.op == "Transpose"
    transpose_node.outputs = [new_outputs[i]]

  graph.inputs = new_inputs
  graph.outputs = new_outputs
  graph.cleanup().toposort()

  return gs.export_onnx(graph)

def simplify_preprocess(onnx_model, VOXEL_SIZE_X, VOXEL_SIZE_Y, MAX_POINTS_PER_VOXEL):
  print("Use onnx_graphsurgeon to modify onnx...")
  graph = gs.import_onnx(onnx_model)

  tmap = graph.tensors()
  MAX_VOXELS = tmap["voxels"].shape[0]

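  # dense_shape attribute consumed by the PPScatterPlugin; note that later
  # comments in this thread conclude the order should be
  # [int(VOXEL_SIZE_Y), int(VOXEL_SIZE_X)] to match the official model.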
  VOXEL_ARRAY = np.array([int(VOXEL_SIZE_X), int(VOXEL_SIZE_Y)])

  input_new = gs.Variable(name="voxels", dtype=np.float32, shape=(MAX_VOXELS, MAX_POINTS_PER_VOXEL, 10))

  X = gs.Variable(name="voxel_idxs", dtype=np.int32, shape=(MAX_VOXELS, 4))

  Y = gs.Variable(name="voxel_num", dtype=np.int32, shape=(1,))

  first_node_after_pillarscatter = [node for node in graph.nodes if node.op == "Conv"][0]

  first_node_pillarvfe = [node for node in graph.nodes if node.op == "MatMul"][0]

  next_node = current_node = first_node_pillarvfe
  for i in range(6):
    next_node = [node for node in graph.nodes if node.inputs[0] == current_node.outputs[0]][0]
    if i == 5:              # ReduceMax
      current_node.attrs['keepdims'] = [0]
      break
    current_node = next_node

  last_node_pillarvfe = current_node

  graph.inputs.append(Y)
  inputs = [last_node_pillarvfe.outputs[0], X, Y]
  outputs = [first_node_after_pillarscatter.inputs[0]]
  graph.replace_with_clip(inputs, outputs,  VOXEL_ARRAY)

  graph.cleanup().toposort()

  graph.inputs = [first_node_pillarvfe.inputs[0] , X, Y]
  graph.outputs = [tmap["cls_preds"], tmap["box_preds"], tmap["dir_cls_preds"]]

  graph.cleanup()

  graph.inputs = [input_new, X, Y]
  first_add = [node for node in graph.nodes if node.op == "MatMul"][0]
  first_add.inputs[0] = input_new

  graph.cleanup().toposort()

  return gs.export_onnx(graph)

if __name__ == '__main__':
    mode_file = "pointpillar-native-sim.onnx"
    simplify_preprocess(onnx.load(mode_file))
Allamrahul commented 1 year ago

Hi @Allamrahul have you verified the pointcloud information is being loaded correctly and in the right order?

By this, do you mean how main.py is loading the .npy file? The script is meant for .bin files, but it should work for .npy files as well. Please let me know if I am missing something.

Allamrahul commented 1 year ago

@GuillaumeAnoufa I have fixed this. I reversed it twice so it actually should not of affected the final model but better to have correctly named variables/args...

Hi, I used this commit, but when I compared my results from the .pth file vs. TensorRT inference, my predictions matched in box sizes, z dimension, and confidence, but not in the X and Y coordinates. I tweaked the code the following way: in exporter.py I kept this line unchanged:

onnx_final = simplify_preprocess(onnx_simp, VOXEL_SIZE_X, VOXEL_SIZE_Y, MAX_POINTS_PER_VOXEL)

But in simplifier_onnx.py I swapped the parameter order:

def simplify_preprocess(onnx_model, VOXEL_SIZE_Y, VOXEL_SIZE_X, MAX_POINTS_PER_VOXEL):

and kept VOXEL_ARRAY = np.array([int(VOXEL_SIZE_X), int(VOXEL_SIZE_Y)]).

This at least gives me the same results across eval with the .pth file and TensorRT inference with the .onnx file. I'm not sure why it works. That said, I am getting slightly fewer predictions when I run inference with TensorRT, and I'm not sure why. I would really like some help understanding whether what I am doing is right.
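The net effect of that combination, spelled out (a sketch based on the two files as posted, not an authoritative reading):

# exporter.py passes (grid_X, grid_Y); the swapped definition binds them as
# VOXEL_SIZE_Y = grid_X and VOXEL_SIZE_X = grid_Y, so:
VOXEL_ARRAY = np.array([int(VOXEL_SIZE_X), int(VOXEL_SIZE_Y)])  # == [grid_Y, grid_X]
# i.e. the dense_shape ends up in (Y, X) order.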

rjwb1 commented 1 year ago

Hi, this is the same as my original commit before @GuillaumeAnoufa suggested changing it. I guess I was right all along, as I had inspected the model in Netron. I'll revert the commit suggested by @GuillaumeAnoufa.

Allamrahul commented 1 year ago

Hi, I tried your 1st commit but that's not working. Here is the analysis:

1st iteration:

Call: X, Y
Fn def: Y, X
VOXEL_ARRAY: Y, X

Conclusion: the call's X maps to VOXEL_ARRAY[0]; the call's Y maps to VOXEL_ARRAY[1].

2nd iteration (the commit suggested by @GuillaumeAnoufa):

Call: X, Y
Fn def: X, Y
VOXEL_ARRAY: X, Y

Conclusion: the call's X maps to VOXEL_ARRAY[0]; the call's Y maps to VOXEL_ARRAY[1].

What works for me:

Call: X, Y
Fn def: Y, X
VOXEL_ARRAY: X, Y

Conclusion: the call's X maps to VOXEL_ARRAY[1]; the call's Y maps to VOXEL_ARRAY[0].

I have just retried iterations 1 and 2 again, and they don't solve the issue because they are inherently doing the same thing. The mapping only gets reversed the way I suggested. Could you confirm this?
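A toy reproduction of the three wirings, to make the bindings explicit (function names and the 432/496 example values are hypothetical):

import numpy as np

grid_X, grid_Y = 432, 496   # example values; the call site always passes (grid_X, grid_Y)

def v1(VOXEL_SIZE_Y, VOXEL_SIZE_X):     # 1st iteration: def order (Y, X), array [Y, X]
    return np.array([int(VOXEL_SIZE_Y), int(VOXEL_SIZE_X)])

def v2(VOXEL_SIZE_X, VOXEL_SIZE_Y):     # 2nd iteration: def order (X, Y), array [X, Y]
    return np.array([int(VOXEL_SIZE_X), int(VOXEL_SIZE_Y)])

def v3(VOXEL_SIZE_Y, VOXEL_SIZE_X):     # working combo: def order (Y, X), array [X, Y]
    return np.array([int(VOXEL_SIZE_X), int(VOXEL_SIZE_Y)])

print(v1(grid_X, grid_Y))   # [432 496] -- call's X lands in slot 0
print(v2(grid_X, grid_Y))   # [432 496] -- identical to v1
print(v3(grid_X, grid_Y))   # [496 432] -- reversed: call's X lands in slot 1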

rjwb1 commented 1 year ago

That seems right; in my implementation my voxel grid shape is (600, 600), so I would not notice this issue. I will fix this as soon as I can.

Allamrahul commented 1 year ago

One more question: the boxes I get during TensorRT inference are just a subset of the boxes I get during the evaluation phase with the .pth file. For example, for a given .npy file, if I get 4 bounding boxes during eval, I get 1, 2, or 3 during TensorRT inference, and the number changes every time I run it. Is there any way to get all the detections during TensorRT inference?

rjwb1 commented 1 year ago

@Allamrahul are you using the same score and NMS thresholds? I would start by adjusting these in Params.h. I haven't directly compared my PyTorch results to the TensorRT ones, but they seem the same for me.

rjwb1 commented 1 year ago

I just removed that entirely, as I require very fast performance. I also implemented a better way of loading params, from a YAML file that exporter.py generates, if you'd be interested.

For guidance: in my Params.h I find that a score threshold of 0.3-0.4 and an NMS threshold of 0.01 work well.

Allamrahul commented 1 year ago

Will check that. Additionally, when I enable FP16, I get hundreds of bounding boxes (anywhere from 5 to 350) during TensorRT inference. When I disable FP16, recompile, and run, the number of detections is back to normal.

Let me know the right way of doing this and whether I am missing something here.

rjwb1 commented 1 year ago

This worked for me. Obviously FP16 can incur an accuracy penalty.

Allamrahul commented 1 year ago

Could you specify what worked for you? It's not clear from your comment. Thanks. Also, my score threshold is currently 0.25 and my NMS threshold 0.01 in params.h; I am just using the params.h that exporter.py generates during ONNX model generation.

rjwb1 commented 1 year ago

I mean FP16 worked normally for me when commenting out the lines you suggested above.

rjwb1 commented 1 year ago

Perhaps try a score_thresh of 0.4.

Allamrahul commented 1 year ago

By normally, do you mean you too are getting hundreds of detections? Sorry, I don't have much experience in deployment, and this is the first time I am dealing with FP16.

rjwb1 commented 1 year ago

No worries. I meant that I did not observe hundreds of detections with FP16, but my confidence threshold is set to 0.3. Perhaps look at the detections you are getting; if you are receiving lots of low scores, increase the threshold.

Allamrahul commented 1 year ago

Got it, let me check that.

Allamrahul commented 1 year ago

Also, one more thing: I am using .npy files since I am using a custom dataset. I observed that there is a 32-byte offset when I load the same .npy file via Python/NumPy vs. when I load it through C++. Could this be a factor?
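One likely explanation: .npy files begin with a header (a magic string plus dtype/shape metadata), so reading one as a raw float buffer in C++ shifts every value. A minimal workaround sketch (file names hypothetical), writing a headerless .bin like the KITTI files the C++ demo reads:

import numpy as np

points = np.load("frame_000.npy")                   # header-aware load in Python
points.astype(np.float32).tofile("frame_000.bin")   # raw x, y, z, intensity rows, no header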

rjwb1 commented 1 year ago

I am using ROS, so I do not have to load any files and can't fully recommend a solution. However, I do write the binary files I use for training with OpenPCDet. The only thing I could recommend is trying to make the dtype of the numpy array you are using np.float16, although I seem to get good results in my implementation without explicitly using float16 when I convert from the ROS msg.

Allamrahul commented 1 year ago

@rjwb1, could you point me to the exact TensorRT inference files you are using at the moment? As mentioned before, my FP16 numbers are out of whack: 300 detections in some cases and 5 in others, when I am expecting between 3 and 5 for every point cloud. I would like to cross-reference the exact commit (or group of commits) you are using for inference, just to make sure I am not missing anything important. After analyzing the results, I found that the model is overconfident on some examples, giving confidence values of 90-100% on many detections, while on other examples it gives the right output.

Allamrahul commented 1 year ago

@rjwb1, in regards to https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars/issues/85: I see that MAX_VOXELS is hardcoded to 10000 in the export script exporter.py. But when I examine pointpillar.yaml, I see this: MAX_NUMBER_OF_VOXELS: { 'train': 16000, 'test': 40000 }. So shouldn't MAX_VOXELS in the export script be 40000?

I tried this out: when I set MAX_VOXELS to 40000 and exported the ONNX file, the multiple false positives I was getting during FP16 TensorRT inference went away. Can anyone confirm that what I did makes sense?
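A sketch of deriving the value from the config instead of hardcoding it, reusing the DATA_PROCESSOR loop pattern already in the posted exporter.py (choosing the 'test' value is my assumption):

MAX_VOXELS = 10000  # fallback if the config entry is missing
for proc in cfg.DATA_CONFIG.DATA_PROCESSOR:
    if proc['NAME'] == 'transform_points_to_voxels':
        MAX_VOXELS = proc['MAX_NUMBER_OF_VOXELS']['test']
        break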

big773 commented 1 year ago

Correctly applies params from the model cfg to the onnx exporter

Hi, I can export my custom ONNX model, but the results seem incorrect. Can you give me some help?

HSqure commented 1 year ago

Hello, thank you for your work on the custom model conversion. I found that the dense_shape of PPScatter_0 in the model converted with your code is reversed compared to the official ONNX model.

But currently everything is working fine after swapping the positions of VOXEL_SIZE_Y and VOXEL_SIZE_X in the section below.

In simplifier_onnx.py, line 83, before the modification:

VOXEL_ARRAY = np.array([int(VOXEL_SIZE_X), int(VOXEL_SIZE_Y)])

After the modification:

VOXEL_ARRAY = np.array([int(VOXEL_SIZE_Y), int(VOXEL_SIZE_X)])

Hope this helps to solve the problem!
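To check which order an exported model actually carries, one can inspect the plugin attribute directly (a sketch using onnx_graphsurgeon; the printed values are the KITTI defaults and will differ for custom grids):

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("pointpillar.onnx"))
scatter = [node for node in graph.nodes if node.op == "PPScatterPlugin"][0]
print(scatter.attrs["dense_shape"])  # e.g. [496, 432] for the KITTI defaults, i.e. (grid_Y, grid_X)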

zzt007 commented 1 year ago

@Allamrahul hello, have you solved the TensorRT mismatch problem? I am also running into it; could you tell me how you solved it? Thanks for your guidance.

Acuno41 commented 10 months ago

Hello everyone and thanks @rjwb1 for the amazing updates,

I successfully trained my custom data with 3 classes, like KITTI (vehicle, pedestrian, cyclist), in OpenPCDet, and the results looked fine on the Python side. Then I converted the model with exporter.py and also ran it successfully in my C++ code.

Then I trained the same custom data with 12 classes (I separated the vehicle class into bus, van, truck, etc.); the results also looked fine on the Python side, but after exporting the model with exporter.py, the results on the C++ side were completely random, with lots of large false detections.

Has anyone encountered a problem like this before, or trained with different class counts? Could there be a class-dependent parameter in exporter.py?

I would be glad if anyone can help. Thank you.

Acuno41 commented 10 months ago

I found out that the MAX_POINTS_PER_VOXEL parameter in the pointpillar.yaml file is the problem. When I change it from the default 32 to something different, it causes the problem I described above. I am looking for a solution.

zzt007 commented 10 months ago

Hi, great job. So you mean the cause is the changed MAX_POINTS_PER_VOXEL? If you set it to 32, do the C++ side results match your Python side results? This has been bothering me a lot.


Acuno41 commented 10 months ago

hi @zzt007,

The point cloud in my dataset is very dense at close range, so I set the MAX_POINTS_PER_VOXEL parameter to 128. But after I trained my data with that parameter and exported with these functions, the bounding box results were completely random on the C++ side. Then I retrained with the default MAX_POINTS_PER_VOXEL: 32, and after a couple of epochs the model started to detect objects with correct bounding box sizes.

I am still in the early stages of training and optimizing parameters, but as soon as I get proper results I will compare them.

rjwb1 commented 10 months ago

Hi guys, I had to make some small changes since I work in a different private repository, so I haven't fully tested everything. For my application I use a single class, though I have tried with multiple. I also use a custom voxel size and count (in X, Y, and Z), and this works for me. I'm not at my computer right now, but when I return I'd be happy to help 👍🏼

rjwb1 commented 10 months ago

Just to confirm: are you correctly copying the Params.h header over? In my version I generate a config file that does not need to be rebuilt, but I haven't done that here.

rjwb1 commented 10 months ago

@Acuno41 I have discovered that MAX_POINTS_PER_VOXEL is also hardcoded in kernel.h. Did you change it there?

https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars/blob/092affc36c72d7b8f7530685d4c0f538d987a94b/include/kernel.h#L28-L33

zzt007 commented 10 months ago

That sounds promising; I am sincerely looking forward to your results and reply. Thanks.


Acuno41 commented 10 months ago

Hi @rjwb1, thanks for the response,

Just to confirm you're correctly copying the Params.h header over? In my version I generate a config file that does not need to be rebuilt but I haven't done this here

Yes, I correctly copied params.h to the C++ side and checked that it loaded correctly in the C++ code.

@Acuno41 I have discovered that the MAX_POINTS_PER_VOXEL is also hard-coded in the kernel.h. Did you change it here?

https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars/blob/092affc36c72d7b8f7530685d4c0f538d987a94b/include/kernel.h#L28-L33

Also, I updated kernel.h a little to remove the hardcoded params.h-dependent values; kernel.h looks like this in my code:

const int THREADS_FOR_VOXEL = 256;    // threads per block
const int POINTS_PER_VOXEL = Params::max_num_points_per_pillar;   // depends on params.h
const int WARP_SIZE = 32;             // one warp (32 threads) per pillar
const int WARPS_PER_BLOCK = 4;        // four warps per block
const int FEATURES_SIZE = 10;         // feature map count, depends on params.h
const int PILLARS_PER_BLOCK = 64;     // one thread per pillar; a block has PILLARS_PER_BLOCK threads
const int PILLAR_FEATURE_SIZE = Params::num_feature_scatter;      // features per pillar, depends on params.h

And I changed max_num_points_per_pillar and num_feature_scatter to static const in params.h.

Considering that MAX_POINTS_PER_VOXEL is used in the preprocessing part, I suspect something there might be causing the problem while preparing the data fed to the model.

soo4826 commented 10 months ago

Hi @rjwb1

I also followed your forked repository and this PR #77, but it does not show the same results as my PyTorch (*.pth) inference.

Here's my overall procedure:

1. Train my custom model with a custom dataset
   - INPUT_RANGE: [-80, -80, -10, 80, 80, 10] (square!)
   - VOXEL_SIZE: [0.4, 0.4, 20]
2. Convert my custom model *.pth into *.onnx with exporter.py
3. Change include/param.h (apply the newly generated param.h from step 2)
4. Modify the hardcoded value in kernel.h (POINTS_PER_VOXEL)
5. Build and infer (visualized with Open3D)

[screenshot: PyTorch+ROS inference result]

[screenshot: CUDA-PointPillars result]

Can you give me some advice?

Also, have you wrapped this package into ROS?

zzt007 commented 10 months ago

I also hit the same problem with the CUDA version, and I need to work with ROS too. If you have any ways or ideas to solve it, please contact me. Thanks a lot.
