Closed: linssswww closed this issue 5 years ago
You can potentially use PyTorch for your data management entirely. This is an example of using PyTorch Tensors for both the input and output buffers of the engine (as opposed to pycuda). https://github.com/NVIDIA-AI-IOT/torch2trt/blob/master/torch2trt/torch2trt.py#L206
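A minimal sketch of that torch2trt-style approach: keep all buffers as PyTorch CUDA tensors and pass their raw device pointers as the engine bindings. Only `collect_bindings` below is generic; the usage comment assumes a TensorRT `IExecutionContext` named `context` and GPU-resident tensors, and the shapes are hypothetical.

```python
def collect_bindings(inputs, outputs):
    """Build the flat list of device pointers that execute_v2 expects,
    inputs first, then outputs, in binding order."""
    return [t.data_ptr() for t in inputs] + [t.data_ptr() for t in outputs]

# Usage (hypothetical shapes; `context` is a tensorrt IExecutionContext):
#   import torch
#   inputs  = [torch.ones((1, 3, 224, 224), device="cuda")]
#   outputs = [torch.empty((1, 1000), device="cuda")]
#   context.execute_v2(collect_bindings(inputs, outputs))
```

Because the outputs are already `torch.Tensor` objects on the GPU, no `memcpy_dtoh` is needed after inference.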
@narendasan thanks for your help, my problem has been solved. But there is a new problem: when I use TensorRT to speed up the feature-map extractor, it only gets about 10% faster. Is that normal?
You might want to try using a reduced operating precision (FP16 or INT8) to further improve performance.
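A hedged sketch of how reduced precision is typically enabled on a TensorRT builder config. `trt.BuilderFlag.FP16` / `INT8` and `config.set_flag` are standard TensorRT Python API; the injectable `flag_enum` parameter is my addition so the logic can be exercised without TensorRT installed (INT8 additionally requires a calibrator or explicit dynamic ranges, which is elided here).

```python
try:
    import tensorrt as trt
except ImportError:  # allow the sketch to load without TensorRT
    trt = None

def configure_precision(config, fp16=False, int8=False, flag_enum=None):
    """Set FP16/INT8 flags on a TensorRT builder config."""
    if flag_enum is None:
        flag_enum = trt.BuilderFlag
    if fp16:
        config.set_flag(flag_enum.FP16)
    if int8:
        config.set_flag(flag_enum.INT8)
    return config
```

With a real builder this would be used as `config = configure_precision(builder.create_builder_config(), fp16=True)` before building the engine.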
hi @narendasan and @linssswww, you can look at jetbot/tensorrt_model.py; it is a simpler way to do this. Hope it can help you.
hi @linssswww, why is it that after I convert an FPN from PyTorch to TensorRT, the TensorRT version is slower than the PyTorch one? The issue is here: https://github.com/NVIDIA/TensorRT/issues/458
@zimenglan-sysu-512 @linssswww I tried implementing inference with PyTorch tensors as bindings, however I'm running into an issue: https://github.com/NVIDIA/TensorRT/issues/303#issuecomment-652187126
Basically I'm getting '../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)' on a simple test case.
Any ideas why this might be happening? The error only occurs when I use PyTorch tensor bindings (if I use cuda.mem_alloc instead, it doesn't happen). I'm trying to get GPU torch tensors as output from my TensorRT engine.
Thanks!
Hi @prathik-naidu ,
> i tried implementing inference with pytorch tensors as bindings however i'm running into an issue: #303 (comment)
> Basically getting '../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)' on a simple test case
I believe this is a PyTorch issue: https://github.com/pytorch/pytorch/issues/32983
Thanks @rmccorm4, I hadn't seen that issue before. Was there any solution to it? I'm not sure a cuDNN version mismatch explains it; that doesn't seem to be my problem, since I'm on 7.6 (looking at my /usr/include/cudnn.h) and PyTorch and TensorRT both match that.
@rmccorm4 I kept investigating this issue and figured out how to get this working (the CUDNN_STATUS_MAPPING_ERROR is resolved):
```python
self.engine = self._load_engine()
self.context = self.engine.create_execution_context()

# Allocate the input tensors on the GPU BEFORE importing pycuda.autoinit,
# so that PyTorch initializes its CUDA context first.
inputs = [torch.ones((1, 3, 256, 416), device="cuda:0")]

import pycuda.autoinit

outputs = [torch.zeros((1, 3, 8, 13), device="cuda:0"),
           torch.zeros((1, 3, 16, 26), device="cuda:0"),
           torch.zeros((1, 3, 32, 52), device="cuda:0"),
           torch.zeros((1, 6552, 6), device="cuda:0")]

# Bindings are the raw device pointers of the tensors, inputs then outputs.
bindings = [_input.data_ptr() for _input in inputs] + [_output.data_ptr() for _output in outputs]
self.context.execute_v2(bindings)
```
The key, as shown in the code above, is to allocate the input tensor on the GPU BEFORE setting up a new CUDA context (`import pycuda.autoinit`) and allocating the output tensors. This works as desired and the original error no longer appears, but I'm still trying to understand what's going on behind the scenes. It seems like torch has its own CUDA context, and in the original code the inputs and outputs ended up associated with inconsistent contexts? Not too sure about this though.
You can potentially use PyTorch for your data management entirely. This is an example of using PyTorch Tensors for both the input and output buffers of the engine (as opposed to pycuda). https://github.com/NVIDIA-AI-IOT/torch2trt/blob/master/torch2trt/torch2trt.py#L206
I have the same issue, could you please describe it in detail for me? I can't find the answer in the link. Thanks!
> @narendasan thanks for your help, my problem has been solved, but there is a new problem which I speed up the feature map, it just speed up 10%. it is normal ?
Could you please provide your solution? I meet the same question.
I want to speed up the feature-extractor part of Faster R-CNN with FPN, whose feature maps are large. I get the output of TensorRT as a cuda.mem_alloc object, but I need a PyTorch tensor. When I convert the mem_alloc object to a PyTorch tensor, it spends too much time on the memcpy from GPU to CPU. How can I convert a cuda.mem_alloc object to a PyTorch tensor without copying?
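One way to avoid the copy entirely, in line with the earlier suggestions in this thread, is to skip `cuda.mem_alloc` for the outputs: preallocate the outputs as PyTorch CUDA tensors and pass their `data_ptr()` values as bindings, so TensorRT writes directly into PyTorch-managed memory. A minimal sketch; the `empty` parameter defaults to `torch.empty` and is injectable only so the logic can be checked without a GPU, and the shapes in the usage comment are hypothetical.

```python
def alloc_torch_outputs(shapes, device="cuda:0", empty=None):
    """Preallocate engine output buffers as PyTorch CUDA tensors so the
    results stay on the GPU (no mem_alloc, no device-to-host memcpy)."""
    if empty is None:
        import torch
        empty = lambda shape: torch.empty(shape, device=device)
    return [empty(shape) for shape in shapes]

# Usage (hypothetical FPN feature-map shape):
#   outputs  = alloc_torch_outputs([(1, 256, 200, 304)])
#   bindings = [input_tensor.data_ptr()] + [o.data_ptr() for o in outputs]
#   context.execute_v2(bindings)
#   # outputs[0] is already a CUDA torch.Tensor; keep it on the GPU
#   # for the downstream detection heads.
```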
my code: