agentmorris / MegaDetector

MegaDetector is an AI model that helps conservation folks spend less time doing boring things with camera trap images.
MIT License

Option to generate video JSON with all frames instead of only the best of that category #149

Open PetervanLunteren opened 1 week ago

PetervanLunteren commented 1 week ago

For the EcoAssist and Timelapse integration of video processing results, we need an option to get a video JSON file with detection results for all frames processed.

Now I do it inside EcoAssist like so:

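    import json
    import re
    from collections import defaultdict
    from pathlib import Path

    # Load the video recognition JSON file
    if video_json:
        with open(video_json, 'r') as video_file:
            video_data = json.load(video_file)

        # Group the frame-level detections under their parent video
        # (only when the file holds frame-level results, e.g. ".../frame000030.jpg")
        if video_data['images'] and re.search(r"frame\d{6}\.jpg", video_data['images'][0]['file']):
            aggregated_detections_per_video = defaultdict(list)
            for frame in video_data['images']:
                video = str(Path(frame['file']).parent)
                frame_number = int(Path(frame['file']).stem[5:])  # strip the "frame" prefix
                detections = frame.get('detections')
                if not detections:
                    continue  # skip frames without detections instead of raising IndexError
                # Keep only the first detection of each frame, tagged with its frame number
                detection = detections[0]
                detection['frame_number'] = frame_number
                aggregated_detections_per_video[video].append(detection)

            # Replace the per-frame entries with one entry per video
            new_video_data = []
            for video, detections in aggregated_detections_per_video.items():
                new_video_data.append({"file": video, "detections": detections})

            video_data['images'] = new_video_data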

That would result in something like this:

   "file": "vid1.mp4",
   "detections": [
    {
     "category": "1",
     "conf": 0.944,
     "bbox": [
      0.5789,
      0.5,
      0.3414,
      0.3249
     ],
     "classifications": [
      [
       "28",
       0.98437
      ]
     ],
     "frame_number": 0
    },
    {
     "category": "1",
     "conf": 0.94,
     "bbox": [
      0.6335,
      0.5152,
      0.2757,
      0.3111
     ],
     "classifications": [
      [
       "28",
       0.98423
      ]
     ],
     "frame_number": 30
    },
    ...

But converting this inside EcoAssist feels cumbersome and error-prone when MegaDetector is what removes the other frames from the video JSON in the first place. What are your thoughts on this?

agentmorris commented 3 days ago

@PetervanLunteren, does the --include_all_processed_frames option to process_video.py already do what you're looking for?

If you already have a .json file that came from somewhere else, and you want to use the same mechanics, you can use the frame_results_to_video_results function, which is what process_video uses to convert frame-level results to video-level results. It takes an options object that has an include_all_processed_frames option.
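For readers following along, the frame-to-video conversion discussed here can be sketched roughly as below. This is not MegaDetector's actual implementation: `frames_to_videos` is a hypothetical helper, the flag name is borrowed from the option mentioned above, and the default behavior (keeping only the highest-confidence detection per category) is an assumption based on the issue title. It also assumes forward-slash paths and frame files named like `frame000030.jpg`.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def frames_to_videos(frame_images, include_all_processed_frames=False):
    """Hypothetical helper: group frame-level entries by their parent video."""
    per_video = defaultdict(list)
    for frame in frame_images:
        path = PurePosixPath(frame["file"])
        video_detections = per_video[str(path.parent)]
        # Assumes frame files are named "frame" + a zero-padded number
        frame_number = int(path.stem[len("frame"):])
        for det in frame.get("detections") or []:
            # Copy each detection so the input data is not mutated
            video_detections.append({**det, "frame_number": frame_number})

    videos = []
    for video, detections in per_video.items():
        if not include_all_processed_frames:
            # Assumed default: keep the highest-confidence detection per category
            best = {}
            for det in detections:
                category = det["category"]
                if category not in best or det["conf"] > best[category]["conf"]:
                    best[category] = det
            detections = list(best.values())
        videos.append({"file": video, "detections": detections})
    return videos


# Frame-level entries modeled on the example earlier in this thread
example_frames = [
    {"file": "vid1.mp4/frame000000.jpg",
     "detections": [{"category": "1", "conf": 0.944,
                     "bbox": [0.5789, 0.5, 0.3414, 0.3249]}]},
    {"file": "vid1.mp4/frame000030.jpg",
     "detections": [{"category": "1", "conf": 0.94,
                     "bbox": [0.6335, 0.5152, 0.2757, 0.3111]}]},
]

all_frames = frames_to_videos(example_frames, include_all_processed_frames=True)
best_only = frames_to_videos(example_frames)
```

With the flag set, `all_frames` keeps both detections for `vid1.mp4`, each tagged with its `frame_number`; without it, `best_only` keeps just the frame 0 detection, since it has the higher confidence for category "1".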

Let me know if that already does what you're looking for, or if not, how the behavior you're looking for is different?

Either way, I think we're between 90% and 100% of the way there.