ZikangZhou / QCNet

[CVPR 2023] Query-Centric Trajectory Prediction
https://openaccess.thecvf.com/content/CVPR2023/papers/Zhou_Query-Centric_Trajectory_Prediction_CVPR_2023_paper.pdf
Apache License 2.0

Inference time (ms) and GFLOPs of QCNet and QCNeXt #9

Closed by shahaamirbader 1 year ago

shahaamirbader commented 1 year ago

Hi, can you share the inference time (ms) and GFLOPs of the QCNet and QCNeXt projects? Something very similar to the metrics reported for other frameworks in the paper titled:

ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals

image

ZikangZhou commented 1 year ago

It is impossible to compare these numbers rigorously, as the inference latency and GFLOPs depend on the complexity of the specific traffic scene. But you can test the latency yourself on a specific scenario.
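
To make the scene dependence concrete, per-scenario latencies can be grouped by a complexity proxy before averaging. The following is only a minimal sketch in plain Python: the helper name `latency_by_agent_count`, the choice of agent count as the proxy, and the sample numbers are all illustrative assumptions, not measurements of QCNet or QCNeXt.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def latency_by_agent_count(
        samples: Iterable[Tuple[int, float]],
        bucket_size: int = 20) -> Dict[Tuple[int, int], float]:
    """Average latency (ms) per agent-count bucket.

    `samples` holds (num_agents, latency_ms) pairs, one per scenario.
    Hypothetical helper: agent count is only one possible complexity proxy.
    """
    if bucket_size <= 0:
        raise ValueError('bucket_size must be positive')
    buckets = defaultdict(list)
    for num_agents, latency_ms in samples:
        # Map the agent count to the inclusive range [low, low + bucket_size - 1].
        low = (num_agents // bucket_size) * bucket_size
        buckets[(low, low + bucket_size - 1)].append(latency_ms)
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Placeholder numbers purely for illustration, NOT measured on any model.
example = [(5, 10.0), (12, 14.0), (25, 30.0), (38, 34.0), (61, 80.0)]
for (low, high), avg in latency_by_agent_count(example).items():
    print('{:>3d}-{:<3d} agents: {:.1f} ms'.format(low, high, avg))
```

Reporting one number per bucket, rather than a single global average, shows how much of a latency gap between two models comes from the scenes they were tested on.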

shahaamirbader commented 1 year ago

I agree, this can be very subjective. I would love to test it myself, but I have no idea how to do so or where to inject the code. Nonetheless, I think ProphNet follows the standards below for testing these latencies. Just sharing for info.

image

ZikangZhou commented 1 year ago

You can try something like this:

from typing import Optional, Union

import numpy as np
import torch
import torch.nn as nn
from torch_geometric.data import Dataset
from torch_geometric.loader import DataLoader
from tqdm import tqdm

def inference_benchmark(
        model: nn.Module,
        dataset: Dataset,
        device: Optional[Union[torch.device, str]],
        warmup_steps: int = 10,
        batch_size: int = 1,
        shuffle: bool = False,
        num_workers: int = 0,
        pin_memory: bool = True,
        persistent_workers: bool = True) -> None:
    """Print the mean and std of the per-batch forward-pass time in milliseconds.

    Timing relies on CUDA events, so this assumes the model runs on a CUDA device.
    """
    model.to(device)
    model.eval()
    # persistent_workers is only valid when worker processes are actually used.
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers,
                            pin_memory=pin_memory, persistent_workers=persistent_workers and num_workers > 0)
    # CUDA events record timestamps on the GPU stream.
    start, end = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)

    times = []
    with torch.no_grad():
        for i, batch in enumerate(tqdm(dataloader)):
            batch = batch.to(device)
            if i == 0:
                # Warm up on the first batch only, so one-off startup costs are not timed.
                for _ in range(warmup_steps):
                    model(batch)
            start.record()
            model(batch)
            end.record()
            # Wait for the forward pass to finish before reading the elapsed time.
            torch.cuda.synchronize()
            times.append(start.elapsed_time(end))  # milliseconds
    mean_time = np.mean(times)
    std_time = np.std(times)
    print('Average inference time (ms): {:.3f} +- {:.3f}'.format(mean_time, std_time))
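
The same warm-up-then-measure pattern can be tried without a GPU. Below is a minimal sketch using only the Python standard library; `time_callable` and the toy workload are hypothetical names introduced for illustration. It measures wall-clock time, so for GPU code the callable would need to synchronize the device itself before returning, otherwise only the kernel launch is timed.

```python
import statistics
import time
from typing import Callable, Dict


def time_callable(fn: Callable[[], object],
                  warmup_steps: int = 10,
                  repeats: int = 100) -> Dict[str, float]:
    """Wall-clock latency statistics (ms) for a zero-argument callable."""
    if repeats < 1:
        raise ValueError('repeats must be at least 1')
    # Untimed warm-up calls, mirroring warmup_steps in the snippet above.
    for _ in range(warmup_steps):
        fn()
    times_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        'mean_ms': statistics.fmean(times_ms),
        # Population std, which matches the default behaviour of np.std.
        'std_ms': statistics.pstdev(times_ms),
        'median_ms': statistics.median(times_ms),
        'max_ms': max(times_ms),
    }


# Toy CPU workload standing in for a model forward pass.
stats = time_callable(lambda: sum(i * i for i in range(10_000)))
print('Average time (ms): {:.3f} +- {:.3f}'.format(stats['mean_ms'], stats['std_ms']))
```

Reporting the median and maximum next to the mean helps spot outliers caused by data loading or background load, which a mean and standard deviation alone can hide.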