google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

When I use AutoFlip, how do I get metadata out of scene_cropping_calculator by editing the pbtxt? #2394

nelsontseng0704 closed this issue 3 years ago

nelsontseng0704 commented 3 years ago

For scene_cropping_calculator, how do I get metadata, e.g. the crop rectangle (x, y, w, h) coordinates and the timestamp of each frame before the images are cropped, by editing the pbtxt?

Also, how do I find all the available options for scene_cropping_calculator? That would help me understand the calculators better.

Many thanks,

sgowroji commented 3 years ago

Hi @nelsontseng0704, could you please elaborate on your use case from the above query? Thanks!

nelsontseng0704 commented 3 years ago

Hi GowrojiSunil,

Thanks for your fast reply and suggestion. Here are more details of my use case. By the way, how do I find all the available options for scene_cropping_calculator? It would help me understand those calculators better.

System information (Please provide as much relevant information as possible)

OS Platform and Distribution: iOS 11.3.1
MediaPipe version: 0.8.6
Bazel version: 3.7.2
Solution: AutoFlip
Programming Language: C++

Describe the expected behavior:

In SceneCroppingCalculator, the graph should save a CSV or txt file containing the crop rectangle (x, y, w, h) coordinates and the timestamp of each frame.

Standalone code you may have used to try to get what you need:

max_queue_size: -1

node {
  calculator: "OpenCvVideoDecoderCalculator"
  input_side_packet: "INPUT_FILE_PATH:input_video_path"
  output_stream: "VIDEO:video_raw"
  output_stream: "VIDEO_PRESTREAM:video_header"
  output_side_packet: "SAVED_AUDIO_PATH:audio_path"
}

node {
  calculator: "ScaleImageCalculator"
  input_stream: "FRAMES:video_raw"
  input_stream: "VIDEO_HEADER:video_header"
  output_stream: "FRAMES:video_frames_scaled"
  options: {
    [mediapipe.ScaleImageCalculatorOptions.ext]: {
      preserve_aspect_ratio: true
      output_format: SRGB
      target_width: 480
      algorithm: DEFAULT_WITHOUT_UPSCALE
    }
  }
}

node {
  calculator: "PacketThinnerCalculator"
  input_stream: "video_frames_scaled"
  output_stream: "video_frames_scaled_downsampled"
  options: {
    [mediapipe.PacketThinnerCalculatorOptions.ext]: {
      thinner_type: ASYNC
      period: 200000
    }
  }
}

node {
  calculator: "BorderDetectionCalculator"
  input_stream: "VIDEO:video_raw"
  output_stream: "DETECTED_BORDERS:borders"
}

node {
  calculator: "ShotBoundaryCalculator"
  input_stream: "VIDEO:video_frames_scaled"
  output_stream: "IS_SHOT_CHANGE:shot_change"
  options {
    [mediapipe.autoflip.ShotBoundaryCalculatorOptions.ext] {
      min_shot_span: 0.2
      min_motion: 0.3
      window_size: 15
      min_shot_measure: 10
      min_motion_with_shot_measure: 0.05
    }
  }
}

node {
  calculator: "AutoFlipFaceDetectionSubgraph"
  input_stream: "VIDEO:video_frames_scaled_downsampled"
  output_stream: "DETECTIONS:face_detections"
}
node {
  calculator: "FaceToRegionCalculator"
  input_stream: "VIDEO:video_frames_scaled_downsampled"
  input_stream: "FACES:face_detections"
  output_stream: "REGIONS:face_regions"
}

node {
  calculator: "AutoFlipObjectDetectionSubgraph"
  input_stream: "VIDEO:video_frames_scaled_downsampled"
  output_stream: "DETECTIONS:object_detections"
}
node {
  calculator: "LocalizationToRegionCalculator"
  input_stream: "DETECTIONS:object_detections"
  output_stream: "REGIONS:object_regions"
  options {
    [mediapipe.autoflip.LocalizationToRegionCalculatorOptions.ext] {
      output_all_signals: true
    }
  }
}

node {
  calculator: "SignalFusingCalculator"
  input_stream: "shot_change"
  input_stream: "face_regions"
  input_stream: "object_regions"
  output_stream: "salient_regions"
  options {
    [mediapipe.autoflip.SignalFusingCalculatorOptions.ext] {
      signal_settings {
        type { standard: FACE_CORE_LANDMARKS }
        min_score: 0.85
        max_score: 0.9
        is_required: false
      }
      signal_settings {
        type { standard: FACE_ALL_LANDMARKS }
        min_score: 0.8
        max_score: 0.85
        is_required: false
      }
      signal_settings {
        type { standard: FACE_FULL }
        min_score: 0.8
        max_score: 0.85
        is_required: false
      }
      signal_settings {
        type: { standard: HUMAN }
        min_score: 0.75
        max_score: 0.8
        is_required: false
      }
      signal_settings {
        type: { standard: PET }
        min_score: 0.7
        max_score: 0.75
        is_required: false
      }
      signal_settings {
        type: { standard: CAR }
        min_score: 0.7
        max_score: 0.75
        is_required: false
      }
      signal_settings {
        type: { standard: OBJECT }
        min_score: 0.1
        max_score: 0.2
        is_required: false
      }
    }
  }
}

node {
  calculator: "SceneCroppingCalculator"
  input_side_packet: "EXTERNAL_ASPECT_RATIO:aspect_ratio"
  input_stream: "VIDEO_FRAMES:video_raw"
  input_stream: "KEY_FRAMES:video_frames_scaled_downsampled"
  input_stream: "DETECTION_FEATURES:salient_regions"
  input_stream: "STATIC_FEATURES:borders"
  input_stream: "SHOT_BOUNDARIES:shot_change"
  # I don't know what options I can choose in SceneCroppingCalculator.
  output_stream: "CROPPED_FRAMES:cropped_frames"
  options: {
    [mediapipe.autoflip.SceneCroppingCalculatorOptions.ext]: {
      max_scene_size: 600
      key_frame_crop_options: {
        score_aggregation_type: CONSTANT
      }
      scene_camera_motion_analyzer_options: {
        motion_stabilization_threshold_percent: 0.01
        salient_point_bound: 0.499
      }
      padding_parameters: {
        blur_cv_size: 200
        overlay_opacity: 0.6
      }
      target_size_type: MAXIMIZE_TARGET_DIMENSION
    }
  }
}

nathanfrey-ovx commented 3 years ago

To render externally, you can use the message here:
https://github.com/google/mediapipe/blob/b899d17f185f6bcbf3f5947d3e134f8ce1e69407/mediapipe/examples/desktop/autoflip/calculators/scene_cropping_calculator.cc#L120

That external render frame message is described here:
https://github.com/google/mediapipe/blob/ecb5b5f44ab23ea620ef97a479407c699e424aa7/mediapipe/examples/desktop/autoflip/autoflip_messages.proto#L168

If you don't send the video packet into the SceneCroppingCalculator, it will save computing cost (since you are rendering externally).
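
For reference, here is a sketch of what the SceneCroppingCalculator node could look like with external rendering only. Stream names are taken from the graphs in this thread; whether VIDEO_FRAMES can be omitted entirely depends on the calculator contract in your MediaPipe version, so treat that as an assumption to verify:

node {
  calculator: "SceneCroppingCalculator"
  input_side_packet: "EXTERNAL_ASPECT_RATIO:aspect_ratio"
  # VIDEO_FRAMES omitted (assumption): skip internal rendering to save compute.
  input_stream: "KEY_FRAMES:video_frames_scaled_downsampled"
  input_stream: "DETECTION_FEATURES:salient_regions"
  input_stream: "STATIC_FEATURES:borders"
  input_stream: "SHOT_BOUNDARIES:shot_change"
  output_stream: "EXTERNAL_RENDERING_PER_FRAME:external_rendering_per_frame"
  # options omitted: defaults are fine for most applications (see below).
}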

The settings for the SceneCroppingCalculator are described in its options proto message (SceneCroppingCalculatorOptions, the same message referenced by the options block in your graph). However, the default settings should work for most applications. To control what the camera focuses on, consider adjusting the SignalFusingCalculator config.
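
For example (illustrative values only, adapted from the graph above; I'm assuming is_required marks a signal as must-keep), raising the FACE_FULL scores biases the crop toward faces:

node {
  calculator: "SignalFusingCalculator"
  input_stream: "shot_change"
  input_stream: "face_regions"
  input_stream: "object_regions"
  output_stream: "salient_regions"
  options {
    [mediapipe.autoflip.SignalFusingCalculatorOptions.ext] {
      signal_settings {
        type { standard: FACE_FULL }
        min_score: 0.9     # raised from 0.8 to weight faces more heavily
        max_score: 0.95
        is_required: true  # assumption: the crop must keep this signal in frame
      }
      # ...keep the remaining signal_settings from the graph above...
    }
  }
}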

nelsontseng0704 commented 3 years ago

Hi Nathan,

Appreciate your reply. I have a follow-up question: how do I print out the metadata and save it as a CSV file from SceneCroppingCalculator?

# Autoflip graph that renders the final cropped video and debugging videos.
# For use by developers who may be adding signals and adjusting weights.
max_queue_size: -1
#max_queue_size: 16

# VIDEO_PREP: Decodes an input video file into images and a video header.
node {
  calculator: "OpenCvVideoDecoderCalculator"
  input_side_packet: "INPUT_FILE_PATH:input_video_path"
  output_stream: "VIDEO:video_raw"
  output_stream: "VIDEO_PRESTREAM:video_header"
  output_side_packet: "SAVED_AUDIO_PATH:audio_path"
}

# VIDEO_PREP: Scale the input video before feature extraction.
node {
  calculator: "ScaleImageCalculator"
  input_stream: "FRAMES:video_raw"
  input_stream: "VIDEO_HEADER:video_header"
  output_stream: "FRAMES:video_frames_scaled"
  options: {
    [mediapipe.ScaleImageCalculatorOptions.ext]: {
      preserve_aspect_ratio: true
      output_format: SRGB
      target_width: 640
      algorithm: DEFAULT_WITHOUT_UPSCALE
    }
  }
}

# VIDEO_PREP: Create a low frame rate stream for feature extraction.
node {
  calculator: "PacketThinnerCalculator"
  input_stream: "video_frames_scaled"
  output_stream: "video_frames_scaled_downsampled"
  options: {
    [mediapipe.PacketThinnerCalculatorOptions.ext]: {
      thinner_type: ASYNC
      period: 200000
    }
  }
}

# DETECTION: find borders around the video and major background color.
node {
  calculator: "BorderDetectionCalculator"
  input_stream: "VIDEO:video_raw"
  output_stream: "DETECTED_BORDERS:borders"
}

# DETECTION: find shot/scene boundaries on the full frame rate stream.
node {
  calculator: "ShotBoundaryCalculator"
  input_stream: "VIDEO:video_frames_scaled"
  output_stream: "IS_SHOT_CHANGE:shot_change"
  options {
    [mediapipe.autoflip.ShotBoundaryCalculatorOptions.ext] {
      min_shot_span: 0.2
      min_motion: 0.3
      window_size: 15
      min_shot_measure: 10
      min_motion_with_shot_measure: 0.05
    }
  }
}

# DETECTION: find faces on the down sampled stream
node {
  calculator: "AutoFlipFaceDetectionSubgraph"
  input_stream: "VIDEO:video_frames_scaled_downsampled"
  output_stream: "DETECTIONS:face_detections"
}
node {
  calculator: "FaceToRegionCalculator"
  input_stream: "VIDEO:video_frames_scaled_downsampled"
  input_stream: "FACES:face_detections"
  output_stream: "REGIONS:face_regions"
}

# DETECTION: find objects on the down sampled stream
node {
  calculator: "AutoFlipObjectDetectionSubgraph"
  input_stream: "VIDEO:video_frames_scaled_downsampled"
  output_stream: "DETECTIONS:object_detections"
}
node {
  calculator: "LocalizationToRegionCalculator"
  input_stream: "DETECTIONS:object_detections"
  output_stream: "REGIONS:object_regions"
  options {
    [mediapipe.autoflip.LocalizationToRegionCalculatorOptions.ext] {
      output_all_signals: true
    }
  }
}

# SIGNAL FUSION: Combine detections (with weights) on each frame
node {
  calculator: "SignalFusingCalculator"
  input_stream: "shot_change"
  input_stream: "face_regions"
  input_stream: "object_regions"
  output_stream: "salient_regions"
  options {
    [mediapipe.autoflip.SignalFusingCalculatorOptions.ext] {
      signal_settings {
        type { standard: FACE_CORE_LANDMARKS }
        min_score: 0.85
        max_score: 0.9
        is_required: false
      }
      signal_settings {
        type { standard: FACE_ALL_LANDMARKS }
        min_score: 0.8
        max_score: 0.85
        is_required: false
      }
      signal_settings {
        type { standard: FACE_FULL }
        min_score: 0.8
        max_score: 0.85
        is_required: true
      }
      signal_settings {
        type: { standard: HUMAN }
        min_score: 0.7
        max_score: 0.8
        is_required: false
      }
      signal_settings {
        type: { standard: PET }
        min_score: 0.7
        max_score: 0.75
        is_required: false
      }
      signal_settings {
        type: { standard: CAR }
        min_score: 0.7
        max_score: 0.75
        is_required: false
      }
      signal_settings {
        type: { standard: OBJECT }
        min_score: 0.1
        max_score: 0.2
        is_required: false
      }
    }
  }
}

# CROPPING: make decisions about how to crop each frame.
node {
  calculator: "SceneCroppingCalculator"
  input_side_packet: "EXTERNAL_ASPECT_RATIO:aspect_ratio"
  input_stream: "VIDEO_FRAMES:video_raw"
  input_stream: "KEY_FRAMES:video_frames_scaled_downsampled"
  input_stream: "DETECTION_FEATURES:salient_regions"
  input_stream: "STATIC_FEATURES:borders"
  input_stream: "SHOT_BOUNDARIES:shot_change"
  output_stream: "EXTERNAL_RENDERING_PER_FRAME:external_rendering_per_frame"
  options: {
    [mediapipe.autoflip.SceneCroppingCalculatorOptions.ext]: {
      max_scene_size: 600
      key_frame_crop_options: {
        score_aggregation_type: CONSTANT
      }
      scene_camera_motion_analyzer_options: {
        motion_stabilization_threshold_percent: 0.5
        salient_point_bound: 0.499
      }
      padding_parameters: {
        blur_cv_size: 200
        overlay_opacity: 0.6
      }
      target_size_type: MAXIMIZE_TARGET_DIMENSION
    }
  }
}

nathanfrey-ovx commented 3 years ago

The render message is populated here: https://github.com/google/mediapipe/blob/b899d17f185f6bcbf3f5947d3e134f8ce1e69407/mediapipe/examples/desktop/autoflip/calculators/scene_cropping_calculator.cc#L644

After this call you can turn the proto into a string and save it to a file:

std::string render_frame_string = render_frame.DebugString();
opened_ofstream_file << render_frame_string;
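
Note that DebugString() gives you the proto in text format rather than CSV. If you specifically want CSV rows of (timestamp, x, y, w, h), a minimal sketch could look like the following. The helper name is hypothetical, and the field names assume the ExternalRenderFrame and Rect definitions in autoflip_messages.proto (crop_from_location and timestamp_us); double-check them against your checkout:

#include <fstream>

#include "mediapipe/examples/desktop/autoflip/autoflip_messages.pb.h"

// Hypothetical helper: appends one "timestamp_us,x,y,w,h" CSV row per frame.
// Field names assume the ExternalRenderFrame/Rect messages in
// autoflip_messages.proto.
void AppendRenderFrameCsv(
    const mediapipe::autoflip::ExternalRenderFrame& render_frame,
    std::ofstream& out) {
  const auto& crop = render_frame.crop_from_location();
  out << render_frame.timestamp_us() << "," << crop.x() << "," << crop.y()
      << "," << crop.width() << "," << crop.height() << "\n";
}

Call it right after the render message is populated, with an std::ofstream you open in the calculator's Open() and close in Close().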

You will also need to make sure you send that as an output stream in the config, even though you are not otherwise using it, so that the above code path is enabled:
https://github.com/google/mediapipe/blob/b899d17f185f6bcbf3f5947d3e134f8ce1e69407/mediapipe/examples/desktop/autoflip/autoflip_graph.pbtxt#L159

output_stream: "EXTERNAL_RENDERING_PER_FRAME:not_used"

google-ml-butler[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] commented 3 years ago

Closing as stale. Please reopen if you'd like to work on this further.