SJTU-LuHe / TransVOD

This repository contains the code for the paper "End-to-End Video Object Detection with Spatial-Temporal Transformers".
Apache License 2.0
203 stars, 28 forks

Running videos for viewable object detection #17

Open zoelevin opened 1 year ago

zoelevin commented 1 year ago

I was wondering how we can run object detection on our own videos and get video output with bounding boxes and labels, like the results shown in Figure 9 of the linked paper. I saw that mmtracking has demo scripts, but I couldn't figure out how to use TransVOD in a similar way to get the results I need. Thank you.

itbergl commented 1 year ago

This may not be a perfect solution, but you can run the evaluation script and get the predictions from the targets and results variables.

In evaluate(), append the res variable to a list inside the evaluation loop and return that list along with the other outputs:

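def evaluate(model, criterion, postprocessors, data_loader, base_ds, device, output_dir):
    # ... existing setup code ...
    extracted_results = list()  # collects one {image_id: prediction} dict per batch
    for samples, targets in metric_logger.log_every(data_loader, 10, header):
        # ... existing forward pass and post-processing that produce `results` ...
        res = {target['image_id'].item(): output for target, output in zip(targets, results)}
        extracted_results.append(res)  # each value holds 'scores', 'labels', 'boxes'
        # ... existing evaluator update ...
    # ... existing stats computation ...
    return stats, coco_evaluator, extracted_results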

Then unpack the extra returned item where evaluate() is called, and you can use pickle to save the results.
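A minimal sketch of the saving step, assuming extracted_results is the list of per-batch {image_id: {'scores', 'labels', 'boxes'}} dicts returned by the modified evaluate(). The helper names to_plain, save_results and load_results are hypothetical. Tensors are moved to the CPU and converted to plain lists before pickling, so loading the file later does not depend on a GPU being available.

```python
import pickle


def to_plain(value):
    """Convert a tensor (or anything with .tolist()) into plain Python lists."""
    if hasattr(value, "detach"):  # torch tensors: drop autograd info, move to CPU
        value = value.detach().cpu()
    if hasattr(value, "tolist"):  # tensors and numpy arrays
        return value.tolist()
    return value  # already a plain Python object


def save_results(extracted_results, path):
    """Merge the per-batch {image_id: prediction} dicts into one dict and pickle it."""
    merged = {}
    for batch in extracted_results:
        for image_id, pred in batch.items():
            merged[image_id] = {key: to_plain(val) for key, val in pred.items()}
    with open(path, "wb") as f:
        pickle.dump(merged, f)
    return merged


def load_results(path):
    """Load the merged {image_id: prediction} dict written by save_results()."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Usage, at the call site: stats, coco_evaluator, extracted_results = evaluate(...), followed by save_results(extracted_results, "predictions.pkl"). Merging the batches into a single dict keyed by image_id makes the later lookup from image ID to frame straightforward.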

To actually view the output as a video, you could use OpenCV to draw the bounding boxes on each frame with its rectangle function and then combine the frames into a video.
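A minimal sketch of that idea, assuming each prediction has already been converted to plain lists (not CUDA tensors), that boxes are absolute (x1, y1, x2, y2) pixel coordinates as DETR-style post-processors produce, and that all frames share one resolution. The function names, the 0.3 default score threshold, the 'mp4v' codec, and the optional class_names dict are illustrative choices, not part of TransVOD. OpenCV is imported inside render_video so the box-filtering helper works on its own.

```python
def select_boxes(pred, score_thresh=0.3):
    """Return [(x1, y1, x2, y2, label, score)] for detections at or above score_thresh.

    `pred` holds plain-list 'scores', 'labels' and 'boxes'; boxes are assumed
    to be absolute (x1, y1, x2, y2) pixel coordinates.
    """
    kept = []
    for score, label, box in zip(pred["scores"], pred["labels"], pred["boxes"]):
        if score >= score_thresh:
            x1, y1, x2, y2 = (int(round(c)) for c in box)  # OpenCV needs integer points
            kept.append((x1, y1, x2, y2, int(label), float(score)))
    return kept


def render_video(frames, out_path, fps=25, score_thresh=0.3, class_names=None):
    """Draw detections on each frame and write the frames to a video file.

    `frames` is an ordered list of (image_path, prediction) pairs.
    `class_names` is an optional (hypothetical) {label_id: name} dict.
    """
    import cv2  # imported lazily so select_boxes() works without OpenCV

    writer = None
    for image_path, pred in frames:
        img = cv2.imread(image_path)
        if img is None:
            raise FileNotFoundError(image_path)
        if writer is None:  # size the video from the first frame
            height, width = img.shape[:2]
            fourcc = cv2.VideoWriter_fourcc(*"mp4v")
            writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
        for x1, y1, x2, y2, label, score in select_boxes(pred, score_thresh):
            name = class_names.get(label, str(label)) if class_names else str(label)
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(img, f"{name} {score:.2f}", (x1, max(y1 - 5, 0)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        writer.write(img)
    if writer is not None:
        writer.release()
```

The score threshold should be tuned to the range of scores your model actually produces; a threshold that is too high leaves frames with no boxes at all.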

adilsonmedronha commented 1 year ago

Thanks @itbergl. I did as you suggested; for example, these are the bounding boxes I got:

{654: {'scores': tensor([0.2183, 0.1489, 0.1324, 0.1290, 0.0712, 0.0674, 0.0617, 0.0446,
                         0.0436, 0.0353, 0.0341, 0.0338, 0.0338, 0.0326, 0.0317, 0.0314], device='cuda:0'),
       'labels': tensor([ 7, 17, 23, 19, 27, 27, 17,  7, 17,  7, 19, 23,  5,  2, 22, 17], device='cuda:0'),
       'boxes': tensor([[1.7216e+00, 4.1063e+01, 2.6564e+01, 1.0398e+02],
                        [4.5522e+02, 1.1981e+02, 5.5804e+02, 1.5068e+02],
                        [4.5522e+02, 1.1981e+02, 5.5804e+02, 1.5068e+02],
                        [4.3279e+01, 1.1582e+02, 8.4566e+01, 1.7838e+02]], device='cuda:0')}}

However, that dictionary does not include any information about the frame associated with image ID 654. How can I tell which frame each image_id (for example, 654) corresponds to, so I can match the bounding boxes to it?

itbergl commented 1 year ago

@adilsonmedronha, maybe try reading the annotation .json file. You can do something like this to get the frame information:

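import json

ann_file = '*.json'  # placeholder: path to the annotation file your dataset uses
with open(ann_file, 'r') as f:
    data = json.load(f)

# COCO-style annotation files store each frame's path under "file_name"
image_id_to_filename = {img["id"]: img["file_name"] for img in data["images"]}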

That should give you a dictionary mapping image IDs to filenames.
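To assemble an ordered frame sequence for one video, that mapping can be joined with the saved predictions. This sketch assumes predictions is a single {image_id: prediction} dict, that all frames of one video sit in the same directory, and that filenames are zero-padded frame numbers (ImageNet VID frames are named like 000000.JPEG) so a plain string sort gives temporal order. The names frames_for_video, image_root and video_dir are hypothetical.

```python
import os


def frames_for_video(image_id_to_filename, predictions, image_root, video_dir):
    """Return [(image_path, prediction)] for one video, in filename order.

    `video_dir` is the (hypothetical) directory of the video's frames,
    relative to `image_root`, as it appears in the annotation file.
    """
    pairs = []
    for image_id, pred in predictions.items():
        filename = image_id_to_filename.get(image_id)
        if filename is None:
            continue  # prediction for an image missing from this annotation file
        if os.path.dirname(filename) == video_dir:
            pairs.append((filename, pred))
    pairs.sort(key=lambda pair: pair[0])  # zero-padded names sort temporally
    return [(os.path.join(image_root, name), pred) for name, pred in pairs]
```

The returned (path, prediction) pairs are in the shape an OpenCV drawing loop like the one described earlier in this thread would need, one video at a time.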