Hi, we only support an inference batch size of 1 for now.
If you want to do batch inference, you need to pad the input videos to the same length for collate, which means modifying this function.
I recommend you use the current API, but if you would still like to support it, I can help if you have more specific questions.
@OceanPang I do not understand what you mean by padding the input videos. I actually have a set of images extracted from the video. I am using the data loader like this:
data_loader = build_dataloader(
    dataset,
    samples_per_gpu=batch_size,
    workers_per_gpu=cfg.data.workers_per_gpu,
    dist=False,
    shuffle=False)
But the result that I get doesn't help me identify tracks and detections for every image in a batch. Maybe if you can help me understand what I should modify in your code to get the expected result, that would be great. Thanks
It's really cumbersome. For example, say you have 2 videos with lengths of 100 and 150 frames respectively. When doing batch inference, each GPU should always have a consistent number of videos. So you need to pad the 100-frame video to 150 frames.
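To illustrate the padding idea, here is a minimal sketch in plain Python. It is not the project's collate function: `pad_videos` and `pad_frame` are hypothetical names, and a video is modelled simply as a list of frames. The returned mask marks which positions hold real frames, so results on padded frames can be discarded afterwards.

```python
def pad_videos(videos, pad_frame=None):
    """Pad each video (a list of frames) to the length of the longest one.

    Returns the padded videos and a parallel mask where True marks a real
    frame and False marks padding. Hypothetical helper for illustration.
    """
    max_len = max(len(video) for video in videos)
    padded, mask = [], []
    for video in videos:
        n_pad = max_len - len(video)
        padded.append(list(video) + [pad_frame] * n_pad)
        mask.append([True] * len(video) + [False] * n_pad)
    return padded, mask


# Two toy "videos" of 100 and 150 frames; frames are just integers here.
videos = [list(range(100)), list(range(150))]
padded, mask = pad_videos(videos)
print([len(v) for v in padded])   # [150, 150]
print([sum(m) for m in mask])     # [100, 150]
```

With real data, the padding frame would have to be something the model can consume (for example a blank image of the same size), which is an assumption this sketch leaves open.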
Hi. I managed to make your code work for inference with 1 image per GPU (so, batch-size of 1) with our custom dataset. Here is the code I am using:
I tried to make it work with a batch size of 2, but I noticed there is no way to know which detection/track inside the result value belongs to which image in the batch of images passed to the model as input.
Here is my understanding of what
result = model(return_loss=False, rescale=True, **data)
represents. result is a dictionary that contains two keys: bbox_results and track_results. For both keys, the value is a list whose size is equal to 482 (the exact number of classes in the TAO dataset). So, as seen above in the code, I get this list and I iterate through each member of the list. For bbox_results, each member is a numpy array with shape num_of_detections x 5, where num_of_detections is for the whole batch (?) and 5 because each detection is represented by the bbox coordinates and a confidence value. For track_results, the shape is num_of_tracks x 6 because of the one extra value for the track id.

If my above understanding of the structure of results is correct, then it seems to me there is no way to assign a track or detection to a specific image in a batch. Is there a way to do that in the code sample I posted above? Thanks.
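To make the structure I am describing concrete, here is a small sketch that walks a per-class list and tags every row with its class index. It assumes the layout is as I described above (one entry per class, each entry a collection of rows); `flatten_per_class` is a hypothetical helper, not part of the library, and the toy values use 3 classes instead of 482. It works on nested lists as well as numpy arrays, since iterating over a 2D array yields its rows.

```python
def flatten_per_class(per_class_results):
    """Turn a per-class list of rows into (class_index, row) pairs.

    Hypothetical helper: per_class_results[i] holds the rows for class i.
    """
    flat = []
    for class_index, rows in enumerate(per_class_results):
        for row in rows:
            flat.append((class_index, tuple(float(x) for x in row)))
    return flat


# Toy stand-in for result['bbox_results']: 4 bbox coordinates + confidence.
toy_bbox_results = [
    [],                                       # class 0: no detections
    [[10, 20, 50, 60, 0.9]],                  # class 1: one detection
    [[0, 0, 5, 5, 0.4], [1, 1, 6, 6, 0.7]],   # class 2: two detections
]

flat = flatten_per_class(toy_bbox_results)
print(len(flat))   # 3
print(flat[0])     # (1, (10.0, 20.0, 50.0, 60.0, 0.9))
```

Note that each flattened row carries a class index but nothing that identifies the source image, which is exactly the problem I am running into with a batch size of 2.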