Haiyang-W / DSVT

[CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets"
https://arxiv.org/abs/2301.06051
Apache License 2.0

TensorRT deployment question #28

Closed · dinvincible98 closed 1 year ago

dinvincible98 commented 1 year ago

Hi @Haiyang-W ,

Thanks for sharing the unrefined TRT deployment script with me! I have a question regarding the lines below:

batch_dict = torch.load("input data file(after vfe)", map_location="cuda")
points = batch_dict["points"]
inputs = points

with torch.no_grad():
    ptranshierarchy3d = model.backbone_3d
    # plain version, just one stage
    ptransblocks_list = ptranshierarchy3d.stage_0
    layer_norms_list = ptranshierarchy3d.residual_norm_stage_0

    pillar_features, voxel_coords = model.vfe(inputs)
    voxel_features = model.backbone_3d(pillar_features, voxel_coords)

    voxel_info = ptranshierarchy3d.input_layer(pillar_features, voxel_coords)
    set_voxel_inds_list = [[voxel_info[f'set_voxel_inds_stage{s}_shift{i}'] for i in range(2)] for s in range(1)]
    set_voxel_masks_list = [[voxel_info[f'set_voxel_mask_stage{s}_shift{i}'] for i in range(2)] for s in range(1)]
    pos_embed_list = [[[voxel_info[f'pos_embed_stage{s}_block{b}_shift{i}'] for i in range(2)] for b in range(4)] for s in range(1)]

    allptransblockstrt_inputs = (
        pillar_features,
        set_voxel_inds_list[0][0],
        set_voxel_inds_list[0][1],
        set_voxel_masks_list[0][0],
        set_voxel_masks_list[0][1],
        torch.stack([torch.stack(v, dim=0) for v in pos_embed_list[0]], dim=0),
    )

What is the "input data file (after vfe)" in the first line? How can I create or derive this file?

chenshi3 commented 1 year ago

You can run the test code and save the batch_dict in the first iteration.

dinvincible98 commented 1 year ago

Can you elaborate on this? I ran test.py and saved a batch_dict. The input batch_dict['points'] has 6 elements (batch_idx, x, y, z, intensity, elongation). I dug into the DynamicPillarVFE template and found that its forward function takes a batch_dict as input and extracts "points" from it again. If I pass a batch_dict in directly, it reports an error: ValueError: too many values to unpack (expected 2). Btw, I saved the batch_dict like this:

def eval_one_epoch(cfg, args, model, dataloader, epoch_id, logger, dist_test=False, result_dir=None):
    result_dir.mkdir(parents=True, exist_ok=True)

    final_output_dir = result_dir / 'final_result' / 'data'
    if args.save_to_file:
        final_output_dir.mkdir(parents=True, exist_ok=True)

    metric = {
        'gt_num': 0,
    }
    for cur_thresh in cfg.MODEL.POST_PROCESSING.RECALL_THRESH_LIST:
        metric['recall_roi_%s' % str(cur_thresh)] = 0
        metric['recall_rcnn_%s' % str(cur_thresh)] = 0

    dataset = dataloader.dataset
    class_names = dataset.class_names
    det_annos = []

    if getattr(args, 'infer_time', False):
        start_iter = int(len(dataloader) * 0.1)
        infer_time_meter = common_utils.AverageMeter()

    logger.info('*************** EPOCH %s EVALUATION *****************' % epoch_id)
    if dist_test:
        num_gpus = torch.cuda.device_count()
        local_rank = cfg.LOCAL_RANK % num_gpus
        model = torch.nn.parallel.DistributedDataParallel(
                model,
                device_ids=[local_rank],
                broadcast_buffers=False
        )
    model.eval()

    if cfg.LOCAL_RANK == 0:
        progress_bar = tqdm.tqdm(total=len(dataloader), leave=True, desc='eval', dynamic_ncols=True)
    start_time = time.time()
    for i, batch_dict in enumerate(dataloader):
        load_data_to_gpu(batch_dict)

        if getattr(args, 'infer_time', False):
            start_time = time.time()

        with torch.no_grad():
            pred_dicts, ret_dict = model(batch_dict)

        disp_dict = {}

        if getattr(args, 'infer_time', False):
            inference_time = time.time() - start_time
            infer_time_meter.update(inference_time * 1000)
            # use ms to measure inference time
            disp_dict['infer_time'] = f'{infer_time_meter.val:.2f}({infer_time_meter.avg:.2f})'

        statistics_info(cfg, ret_dict, metric, disp_dict)
        annos = dataset.generate_prediction_dicts(
            batch_dict, pred_dicts, class_names,
            output_path=final_output_dir if args.save_to_file else None
        )
        det_annos += annos
        if cfg.LOCAL_RANK == 0:
            progress_bar.set_postfix(disp_dict)
            progress_bar.update()

        if i == 0:
            torch.save(batch_dict, "/home/mingqing.yuan/DSVT/vfe_model.pth")

"""
Did I save the wrong batch dict? Can you provide some guidance? Thanks

chenshi3 commented 1 year ago

You should save the batch_dict after the load_data_to_gpu function is called.
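
For reference, a minimal sketch of where that save could sit inside the eval loop of test.py (not from the thread; the output path is just an example):

for i, batch_dict in enumerate(dataloader):
    load_data_to_gpu(batch_dict)  # moves every tensor in batch_dict onto the GPU
    if i == 0:
        # save the GPU-resident batch_dict from the first iteration for the deployment script
        torch.save(batch_dict, "saved_batch_dict.pth")  # example path
    with torch.no_grad():
        pred_dicts, ret_dict = model(batch_dict)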

dinvincible98 commented 1 year ago

I saved the batch_dict after load_data_to_gpu but I still got this error:

Traceback (most recent call last):
  File "torch_onnx.py", line 170
    pillar_features, voxel_coords = model.vfe(inputs)
  File "/home/mingqing.yuan/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mingqing.yuan/DSVT/pcdet/models/backbones_3d/vfe/dynamic_pillar_vfe.py", line 186, in forward
    points = batch_dict['points']  # (batch_idx, x, y, z, i, e)
IndexError: too many indices for tensor of dimension 2

chenshi3 commented 1 year ago

Sorry for that, I think I know the problem.

batch_dict = torch.load("input data file(after vfe)", map_location="cuda")
points = batch_dict["points"]
inputs = points

should be

batch_dict = torch.load("saved batch_dict", map_location="cuda")
inputs = batch_dict

zizhengu commented 1 year ago

ERROR: When I try to use the ONNX model to create a session via ort_session = ort.InferenceSession(onnx_path, providers=['CUDAExecutionProvider']), I encounter the error: onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization. I used the ONNX model exported through your code, and someone else encountered this error too. Would you please take a look? Thanks!

My env: CUDA 11.4, onnxruntime-gpu 1.14.1, opset_version 11, Python 3.7

chenshi3 commented 1 year ago

The deployment code is unrefined, so there may be some misalignments. I suggest taking a look at the torch.onnx documentation (https://pytorch.org/docs/stable/onnx.html?highlight=onnx#module-torch.onnx) to aid in this process. Unfortunately, my availability is limited due to other obligations, but we will release the refined deployment code as soon as possible. However, if you require it urgently, we can provide a TensorRT engine that can be executed directly by TensorRT.
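
For reference, a minimal sketch of the kind of torch.onnx.export call those docs describe, built on the input tuple prepared in the first comment. The wrapper class AllDSVTBlocksTRT, the output path, the opset, and the dynamic-axis names are illustrative assumptions, not the repo's actual deployment code:

import torch

# hypothetical wrapper module that runs the stacked DSVT blocks on the prepared tensors
trt_model = AllDSVTBlocksTRT(ptransblocks_list, layer_norms_list).eval().cuda()

torch.onnx.export(
    trt_model,
    allptransblockstrt_inputs,        # the input tuple built in the snippet above
    "dsvt_blocks.onnx",               # example output path
    opset_version=14,                 # example; use an opset that supports all ops in the blocks
    input_names=["src", "set_voxel_inds_shift0", "set_voxel_inds_shift1",
                 "set_voxel_masks_shift0", "set_voxel_masks_shift1", "pos_embed"],
    output_names=["output"],
    # the voxel count changes per frame, so the corresponding axes must be dynamic;
    # adjust the axis indices to match the actual layouts of the set/mask/pos tensors
    dynamic_axes={"src": {0: "voxel_num"}, "output": {0: "voxel_num"}},
)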

dinvincible98 commented 1 year ago

Sorry for that, I think I know the problem.

batch_dict = torch.load("input data file(after vfe)", map_location="cuda")
points = batch_dict["points"]
inputs = points

should be

batch_dict = torch.load("saved batch_dict", map_location="cuda")
inputs = batch_dict

After changing inputs to batch_dict, I still get this error:

""" Traceback (most recent call last): File "torch_onnx.py", line 170, in pillar_features, voxel_coords = model.vfe(batch_dict) ValueError: too many values to unpack (expected 2) """

dinvincible98 commented 1 year ago

Hi,

I've successfully converted an engine file. Do you have a sample inference script to run the model?

chenshi3 commented 1 year ago

You need to load the TRT engine in place of the original model and wrap the inputs before feeding them into the engine.
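
For reference, a minimal sketch of that pattern using the TensorRT Python API with torch CUDA tensors as bindings. The engine path, output shape, and binding order are illustrative assumptions and must match how the engine was actually exported and built:

import tensorrt as trt
import torch

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(engine_path):
    # deserialize a prebuilt engine file from disk
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

engine = load_engine("dsvt_blocks.engine")  # example path
context = engine.create_execution_context()

# inputs prepared exactly as in the export snippet, in the same order as the ONNX input names
pos_embed_tensor = torch.stack([torch.stack(v, dim=0) for v in pos_embed_list[0]], dim=0)
inputs = [pillar_features, set_voxel_inds_list[0][0], set_voxel_inds_list[0][1],
          set_voxel_masks_list[0][0], set_voxel_masks_list[0][1], pos_embed_tensor]
inputs = [t.contiguous() for t in inputs]        # TensorRT needs contiguous device memory
output = torch.empty_like(pillar_features)       # assumed output shape: same as the input features

# for a dynamic-shape engine, set the actual input shapes first
# (assuming bindings 0..N-1 are the inputs in export order, followed by the output)
for i, t in enumerate(inputs):
    context.set_binding_shape(i, tuple(t.shape))

bindings = [t.data_ptr() for t in inputs] + [output.data_ptr()]
context.execute_v2(bindings)                     # output now holds the transformer features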

chenshi3 commented 1 year ago

Hi,

I've successfully converted an engine file. Do you have a sample inference script to run the model?

We provide an example of loading the trt_engine at inference time.