NVIDIA-AI-IOT / trt_pose

Real-time pose estimation accelerated with NVIDIA TensorRT
MIT License

TRT inference gives no result. #11

Open lxy5513 opened 4 years ago

lxy5513 commented 4 years ago

Hello, jaybdub.
Thanks for your code.

I tried to use the TRT model on a 1080 Ti GPU and test the estimation result, but after counts, objects, peaks = parse_objects(cmap, paf), counts is [0].

At the same time, I tested the original model resnet18_baseline_att_224x224_A_epoch_249.pth that you provided, and the result is not bad. Would you mind giving some advice about this?

lxy5513 commented 4 years ago

By the way, every time the script that runs the TRT model finishes, a segmentation fault occurs.

lxy5513 commented 4 years ago

This is my TRT model code:

import json
import trt_pose.coco
import ipdb; pdb=ipdb.set_trace
import trt_pose.models
import torch
import cv2
import torch2trt
import torchvision.transforms as transforms
import PIL.Image
from tqdm import tqdm
from trt_pose.draw_objects import DrawObjects
from trt_pose.parse_objects import ParseObjects
device = torch.device('cuda')

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)
topology = trt_pose.coco.coco_category_to_topology(human_pose)
parse_objects = ParseObjects(topology)
draw_objects = DrawObjects(topology)

def load_model():
    OPTIMIZED_MODEL = 'resnet18_baseline_att_224x224_A_epoch_249_trt.pth'
    from torch2trt import TRTModule
    model_trt = TRTModule()
    model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))
    #  return model_trt.eval()
    return model_trt

def test_inference(model):
    WIDTH = 224
    HEIGHT = 224
    data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()
    import time

    # Synchronize before starting the timer so pending GPU work is not counted
    torch.cuda.current_stream().synchronize()
    t0 = time.time()
    for i in range(50):
        y = model(data)
    torch.cuda.current_stream().synchronize()
    t1 = time.time()
    print("inference speed is {:0.2f} FPS".format(50.0 / (t1 - t0)))

def preprocess(image):
    mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()
    std = torch.Tensor([0.229, 0.224, 0.225]).cuda()
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = PIL.Image.fromarray(image)
    image = transforms.functional.to_tensor(image).to(device)
    image.sub_(mean[:, None, None]).div_(std[:, None, None])
    return image[None, ...]

def img_resize(image, max_length=640):
    H, W = image.shape[:2]
    if max(W, H) > max_length: #shrink
        interpolation = cv2.INTER_AREA
    else:
        interpolation = cv2.INTER_LINEAR

    if W>H:
        W_resize = max_length
        H_resize = int(H * max_length / W)
    else:
        H_resize = max_length
        W_resize = int(W * max_length / H)
    image = cv2.resize(image, (W_resize, H_resize), interpolation=interpolation)
    return image, W_resize, H_resize

def img_demo(path):
    torch.cuda.current_stream().synchronize()
    model = load_model()
    image = cv2.imread(path)
    data = preprocess(image)
    cmap, paf = model(data)
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    torch.cuda.current_stream().synchronize()
    counts, objects, peaks = parse_objects(cmap, paf) # cmap_threshold=0.15, link_threshold=0.15)
    draw_objects(image, counts, objects, peaks)
    cv2.imshow("torch pose estimation", image)
    cv2.waitKey(100)

def video_infrence(video_name):
    cap = cv2.VideoCapture(video_name) 
    video_length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    model = load_model()
    for i in tqdm(range(video_length)):
        _, image = cap.read()
        image = img_resize(image, 640)[0]
        data = preprocess(image)
        cmap, paf = model(data)
        cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
        counts, objects, peaks = parse_objects(cmap, paf)#, cmap_threshold=0.15, link_threshold=0.15)
        draw_objects(image, counts, objects, peaks)
        cv2.imshow("torch pose estimation", image)
        cv2.waitKey(1)

if __name__ == "__main__":
    path = "/home/xyliu/cvToolBox/data/test.png"
    img_demo(path)
    #  video_infrence("/home/xyliu/cvToolBox/data/test.mp4")
    video_infrence("/home/xyliu/cvToolBox/data/football.mp4")

lxy5513 commented 4 years ago

# original model
def load_model():
    num_parts = len(human_pose['keypoints'])
    num_links = len(human_pose['skeleton'])
    model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()
    MODEL_WEIGHTS = 'resnet18_baseline_att_224x224_A_epoch_249.pth'
    model.load_state_dict(torch.load(MODEL_WEIGHTS))
    return model.eval()

jaybdub commented 4 years ago

Hi lxy5513,

Thanks for reaching out!

Do you know where the segfault occurs?

Also, do you mind sharing the code that you used to generate the OPTIMIZED_MODEL?

Best, John

lxy5513 commented 4 years ago

Thanks for the response. The code I used to generate the OPTIMIZED_MODEL is the one you provided:

import json
import trt_pose.coco
with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)
topology = trt_pose.coco.coco_category_to_topology(human_pose)
import trt_pose.models

num_parts = len(human_pose['keypoints'])
num_links = len(human_pose['skeleton'])

model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()
import torch
MODEL_WEIGHTS = 'resnet18_baseline_att_224x224_A_epoch_249.pth'
model.load_state_dict(torch.load(MODEL_WEIGHTS))

WIDTH = 224
HEIGHT = 224

data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()
import torch2trt
model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)

OPTIMIZED_MODEL = 'resnet18_baseline_att_224x224_A_epoch_249_trt.pth'
torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)

Regarding your question "Do you know where the segfault occurs?":

The segfault occurs when the program/script finishes running.
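
One way to narrow down where such a crash happens is Python's standard faulthandler module, which writes the Python traceback when the process receives a fatal signal such as SIGSEGV. This is a minimal sketch, not something used in this thread: the log file name segfault_trace.log is a hypothetical choice, and a crash inside native code during interpreter shutdown may still produce little useful output.

```python
import faulthandler

# Hypothetical log file name. The file object must stay open for as long as
# the fault handler is enabled, so keep a module-level reference to it.
crash_log = open("segfault_trace.log", "w")

# Dump the Python traceback of all threads into the log file if the
# process receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable(file=crash_log, all_threads=True)

# ... import cv2 / torch / torch2trt and run inference after this point ...
```

The same handler can be enabled without touching the script by running it as python -X faulthandler followed by the script name, in which case the traceback goes to stderr.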

lxy5513 commented 4 years ago

After adding image = cv2.resize(image, (224, 224), interpolation=interpolation), the TRT model works normally.

So after converting the torch model to TensorRT, must the input shape at inference match the shape of the data used for conversion, as below?

data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()
model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)
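
The behavior described above is consistent with the engine having been built from a single 1x3x224x224 example input. A minimal sketch of a guard that fails loudly on a mismatched input instead of silently returning zero detections follows. The names check_input_shape and EXPECTED_SHAPE are hypothetical, not part of torch2trt or trt_pose, and the guard only assumes the input exposes a .shape attribute, as a torch tensor does.

```python
# Shape of the example input passed to torch2trt at conversion time
# (taken from the conversion script in this thread).
EXPECTED_SHAPE = (1, 3, 224, 224)


def check_input_shape(data, expected=EXPECTED_SHAPE):
    """Return `data` unchanged, or raise ValueError if its shape differs
    from the shape the TensorRT engine was built with."""
    actual = tuple(data.shape)
    if actual != tuple(expected):
        raise ValueError(
            f"input shape {actual} does not match the shape used "
            f"for conversion {tuple(expected)}"
        )
    return data
```

With the script above it could be used as cmap, paf = model(check_input_shape(preprocess(image))), so a frame that was resized to 640 pixels on its longer side raises an error before it reaches the engine.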

anuar12 commented 4 years ago

FYI, when I tried torch2trt() on my 2080 Ti it didn't give any keypoint output. However on Jetson it worked.

jaybdub commented 4 years ago

@anuar12 Hmm... Do you mind sharing

Best, John

anuar12 commented 4 years ago

PyTorch 1.3.0, TensorRT 6.0.1.5, Torchvision 0.4.1, CUDA 10.0, 2080 Ti. I tried it quickly; I can try it again on Monday. I had different versions on Xavier (PyTorch 1.1, TensorRT 5.1.6, Torchvision 0.4.1).
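
For comparing environments like the two listed above, a small helper can collect the installed versions in one place. This is a sketch under assumptions: installed_versions is a hypothetical helper, and the distribution names in the default list are guesses that may not match how a given machine installed the packages (for example, a TensorRT installed through system packages may not be visible to Python package metadata and would show up as not found).

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "tensorrt", "torch2trt")):
    """Map each distribution name to its installed version string,
    or to None when no metadata for that name is found."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not found'}")
```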

jis-mon commented 4 years ago

@lxy5513 Resizing the image with PIL might fix the issue of zero detections: image = image.resize((WIDTH, HEIGHT), resample=PIL.Image.BILINEAR)

undefined-references commented 4 years ago

@lxy5513 I had the segmentation fault issue too. Moving import cv2 to the first line fixed it!