thesofakillers closed this issue 3 years ago.
Hi @thesofakillers, you are correct in thinking that autoflip can be adapted for real-time usage. There shouldn't be any big blockers to making this work. You should be able to look at examples that run directly from the camera input and modify the autoflip graph to use those calculators at the input and output instead of reading from and writing to a file.
Autoflip does introduce some latency (delay) because of scene buffering. The parameter that controls this is called max_scene_size, and reducing this number will reduce the latency. However, for inputs that are one long scene (like a webcam), users may notice a jump in camera position when each buffer is processed and flushed. For better real-time quality we would need changes that maintain the crop position across buffers. I will discuss making this update internally.
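For reference, max_scene_size lives in the SceneCroppingCalculator options of the stock autoflip_graph.pbtxt; a sketch of lowering it looks like this (the value 60 is illustrative, roughly 2 seconds of buffering for 30 fps input):

```
# In autoflip_graph.pbtxt, inside the SceneCroppingCalculator node:
options: {
  [mediapipe.autoflip.SceneCroppingCalculatorOptions.ext]: {
    # Default is 600; a smaller buffer flushes sooner and reduces latency,
    # at the cost of more frequent (and possibly visible) re-crops.
    max_scene_size: 60
  }
}
```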
If you create an autoflip graph to work on a real-time stream please share and I can help tune it for your application.
We are closing this issue for now due to lack of activity.
Hi @ndimension, @jiuqiant. Thank you for replying earlier, I've only recently gotten back to this.
I've been trying to get this to work by modifying AutoFlip's graph as suggested by @ndimension above. I'm basing my modifications on the object_detection_desktop_live graph, which works fine on my setup. The goal is to test it on my webcam and then move on to an ip video stream later.
However, when I try running my modified graph with
GLOG_logtostderr=1 bazel-bin/mediapipe/examples/desktop/autoflip/run_autoflip \
--calculator_graph_config_file=mediapipe/examples/desktop/autoflip/autoflip_graph.pbtxt \
--input_side_packets=aspect_ratio=1:2
The graph prints out as expected, but then mediapipe seems to hang on the following:
I20200525 19:10:24.033686 525520320 demo_run_graph_main.cc:69] Initialize the calculator graph.
I20200525 19:10:24.037720 525520320 demo_run_graph_main.cc:73] Initialize the camera or load the video.
I20200525 19:10:27.993374 525520320 demo_run_graph_main.cc:94] Start running the calculator graph.
I20200525 19:10:27.993629 525520320 demo_run_graph_main.cc:99] Start grabbing and processing frames.
The changes I made to get to this point can be found at my fork (on the live branch). In particular I:
- changed the graph's input_stream and output_stream
- set max_scene_size to 60 (keeping the default of 600 does not seem to resolve the issue)
- modified demo_run_graph_main.cc so that I can input and output video streams

I've tried figuring out why it appears to be hanging. It seems like the hanging begins here, which suggests that AutoFlip is not sending any result packets. I'm not really sure why. Perhaps I'm missing a calculator at the end of my graph.
I would greatly appreciate any sort of suggestions on how to unblock this. Thank you very much for your work and help thus far!
Could we reopen this issue?
I'm also experiencing this problem. I'm basically doing the same as @thesofakillers.
Here's the graph:
max_queue_size: -1
# Input image (ImageFrame)
input_stream: "input_video"
# Output image with rendered results (ImageFrame)
output_stream: "output_video"
# VIDEO_PREP: Scale the input video before feature extraction.
node {
calculator: "ScaleImageCalculator"
input_stream: "FRAMES:input_video"
#input_stream: "VIDEO_HEADER:video_header"
output_stream: "FRAMES:video_frames_scaled"
options: {
[mediapipe.ScaleImageCalculatorOptions.ext]: {
preserve_aspect_ratio: true
output_format: SRGB
target_width: 480
algorithm: DEFAULT_WITHOUT_UPSCALE
}
}
}
# VIDEO_PREP: Create a low frame rate stream for feature extraction.
node {
calculator: "PacketThinnerCalculator"
input_stream: "video_frames_scaled"
output_stream: "video_frames_scaled_downsampled"
options: {
[mediapipe.PacketThinnerCalculatorOptions.ext]: {
thinner_type: ASYNC
period: 200000
}
}
}
# DETECTION: find borders around the video and major background color.
node {
calculator: "BorderDetectionCalculator"
input_stream: "VIDEO:input_video"
output_stream: "DETECTED_BORDERS:borders"
}
# DETECTION: find shot/scene boundaries on the full frame rate stream.
node {
calculator: "ShotBoundaryCalculator"
input_stream: "VIDEO:video_frames_scaled"
output_stream: "IS_SHOT_CHANGE:shot_change"
options {
[mediapipe.autoflip.ShotBoundaryCalculatorOptions.ext] {
min_shot_span: 0.2
min_motion: 0.3
window_size: 15
min_shot_measure: 10
min_motion_with_shot_measure: 0.05
}
}
}
# DETECTION: find faces on the down sampled stream
node {
calculator: "AutoFlipFaceDetectionSubgraph"
input_stream: "VIDEO:video_frames_scaled_downsampled"
output_stream: "DETECTIONS:face_detections"
}
node {
calculator: "FaceToRegionCalculator"
input_stream: "VIDEO:video_frames_scaled_downsampled"
input_stream: "FACES:face_detections"
output_stream: "REGIONS:face_regions"
}
# DETECTION: find objects on the down sampled stream
node {
calculator: "AutoFlipObjectDetectionSubgraph"
input_stream: "VIDEO:video_frames_scaled_downsampled"
output_stream: "DETECTIONS:object_detections"
}
node {
calculator: "LocalizationToRegionCalculator"
input_stream: "DETECTIONS:object_detections"
output_stream: "REGIONS:object_regions"
options {
[mediapipe.autoflip.LocalizationToRegionCalculatorOptions.ext] {
output_all_signals: true
}
}
}
# SIGNAL FUSION: Combine detections (with weights) on each frame
node {
calculator: "SignalFusingCalculator"
input_stream: "shot_change"
input_stream: "face_regions"
input_stream: "object_regions"
output_stream: "salient_regions"
options {
[mediapipe.autoflip.SignalFusingCalculatorOptions.ext] {
signal_settings {
type { standard: FACE_CORE_LANDMARKS }
min_score: 0.85
max_score: 0.9
is_required: false
}
signal_settings {
type { standard: FACE_ALL_LANDMARKS }
min_score: 0.8
max_score: 0.85
is_required: false
}
signal_settings {
type { standard: FACE_FULL }
min_score: 0.8
max_score: 0.85
is_required: false
}
signal_settings {
type: { standard: HUMAN }
min_score: 0.75
max_score: 0.8
is_required: false
}
signal_settings {
type: { standard: PET }
min_score: 0.7
max_score: 0.75
is_required: false
}
signal_settings {
type: { standard: CAR }
min_score: 0.7
max_score: 0.75
is_required: false
}
signal_settings {
type: { standard: OBJECT }
min_score: 0.1
max_score: 0.2
is_required: false
}
}
}
}
# CROPPING: make decisions about how to crop each frame.
node {
calculator: "SceneCroppingCalculator"
input_side_packet: "EXTERNAL_ASPECT_RATIO:aspect_ratio"
input_stream: "VIDEO_FRAMES:input_video"
input_stream: "KEY_FRAMES:video_frames_scaled_downsampled"
input_stream: "DETECTION_FEATURES:salient_regions"
input_stream: "STATIC_FEATURES:borders"
input_stream: "SHOT_BOUNDARIES:shot_change"
output_stream: "CROPPED_FRAMES:output_video"
options: {
[mediapipe.autoflip.SceneCroppingCalculatorOptions.ext]: {
max_scene_size: 600
key_frame_crop_options: {
score_aggregation_type: CONSTANT
}
scene_camera_motion_analyzer_options: {
motion_stabilization_threshold_percent: 0.5
salient_point_bound: 0.499
}
padding_parameters: {
blur_cv_size: 200
overlay_opacity: 0.6
}
target_size_type: MAXIMIZE_TARGET_DIMENSION
#target_width: 720
#target_height: 1124
#target_size_type: USE_TARGET_DIMENSION
}
}
}
Here's the custom runner:
#include <cstdlib>
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/port/commandlineflags.h"
#include "mediapipe/framework/port/file_helpers.h"
#include "mediapipe/framework/port/opencv_highgui_inc.h"
#include "mediapipe/framework/port/opencv_imgproc_inc.h"
#include "mediapipe/framework/port/opencv_video_inc.h"
#include "mediapipe/framework/port/parse_text_proto.h"
#include "mediapipe/framework/port/status.h"
constexpr char kInputStream[] = "input_video";
constexpr char kOutputStream[] = "output_video";
constexpr char kWindowName[] = "MediaPipe";
DEFINE_string(
calculator_graph_config_file, "",
"Name of file containing text format CalculatorGraphConfig proto.");
DEFINE_string(input_video_path, "",
"Full path of video to load. "
"If not provided, attempt to use a webcam.");
DEFINE_string(output_video_path, "",
"Full path of where to save result (.mp4 only). "
"If not provided, show result in a window.");
DEFINE_string(input_side_packets, "",
"Comma-separated list of key=value pairs specifying side packets "
"for the CalculatorGraph. All values will be treated as the "
"string type even if they represent doubles, floats, etc.");
::mediapipe::Status RunMPPGraph() {
std::string calculator_graph_config_contents;
MP_RETURN_IF_ERROR(mediapipe::file::GetContents(
FLAGS_calculator_graph_config_file, &calculator_graph_config_contents));
LOG(INFO) << "Get calculator graph config contents: "
<< calculator_graph_config_contents;
mediapipe::CalculatorGraphConfig config =
mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>(
calculator_graph_config_contents);
std::map<std::string, ::mediapipe::Packet> input_side_packets;
if (!FLAGS_input_side_packets.empty()) {
std::vector<std::string> kv_pairs =
absl::StrSplit(FLAGS_input_side_packets, ',');
for (const std::string& kv_pair : kv_pairs) {
std::vector<std::string> name_and_value = absl::StrSplit(kv_pair, '=');
RET_CHECK(name_and_value.size() == 2);
RET_CHECK(
!::mediapipe::ContainsKey(input_side_packets, name_and_value[0]));
input_side_packets[name_and_value[0]] =
::mediapipe::MakePacket<std::string>(name_and_value[1]);
}
}
LOG(INFO) << "Initialize the calculator graph.";
::mediapipe::CalculatorGraph graph;
//'aspect_ratio'
MP_RETURN_IF_ERROR(graph.Initialize(config, input_side_packets));
//LOG(INFO) << "Initialize the calculator graph.";
//mediapipe::CalculatorGraph graph;
//MP_RETURN_IF_ERROR(graph.Initialize(config));
LOG(INFO) << "Initialize the camera or load the video.";
cv::VideoCapture capture;
const bool load_video = !FLAGS_input_video_path.empty();
if (load_video) {
capture.open(FLAGS_input_video_path);
} else {
capture.open(0);
}
RET_CHECK(capture.isOpened());
cv::VideoWriter writer;
const bool save_video = !FLAGS_output_video_path.empty();
if (!save_video) {
cv::namedWindow(kWindowName, /*flags=WINDOW_AUTOSIZE*/ 1);
#if (CV_MAJOR_VERSION >= 3) && (CV_MINOR_VERSION >= 2)
capture.set(cv::CAP_PROP_FRAME_WIDTH, 640);
capture.set(cv::CAP_PROP_FRAME_HEIGHT, 480);
capture.set(cv::CAP_PROP_FPS, 30);
#endif
}
LOG(INFO) << "Start running the calculator graph.";
ASSIGN_OR_RETURN(mediapipe::OutputStreamPoller poller,
graph.AddOutputStreamPoller(kOutputStream));
MP_RETURN_IF_ERROR(graph.StartRun({}));
LOG(INFO) << "Start grabbing and processing frames.";
bool grab_frames = true;
while (grab_frames) {
// Capture opencv camera or video frame.
cv::Mat camera_frame_raw;
capture >> camera_frame_raw;
if (camera_frame_raw.empty()) break; // End of video.
cv::Mat camera_frame;
cv::cvtColor(camera_frame_raw, camera_frame, cv::COLOR_BGR2RGB);
if (!load_video) {
cv::flip(camera_frame, camera_frame, /*flipcode=HORIZONTAL*/ 1);
}
// Wrap Mat into an ImageFrame.
auto input_frame = absl::make_unique<mediapipe::ImageFrame>(
mediapipe::ImageFormat::SRGB, camera_frame.cols, camera_frame.rows,
mediapipe::ImageFrame::kDefaultAlignmentBoundary);
cv::Mat input_frame_mat = mediapipe::formats::MatView(input_frame.get());
camera_frame.copyTo(input_frame_mat);
// Send image packet into the graph.
size_t frame_timestamp_us =
(double)cv::getTickCount() / (double)cv::getTickFrequency() * 1e6;
MP_RETURN_IF_ERROR(graph.AddPacketToInputStream(
kInputStream, mediapipe::Adopt(input_frame.release())
.At(mediapipe::Timestamp(frame_timestamp_us))));
// Get the graph result packet, or stop if that fails.
mediapipe::Packet packet;
LOG(INFO) << "Polling packet.";
if (!poller.Next(&packet)) {
LOG(INFO) << "No more packets.";
break;
}
LOG(INFO) << "Got packet.";
auto& output_frame = packet.Get<mediapipe::ImageFrame>();
LOG(INFO) << "Got output_frame.";
// Convert back to opencv for display or saving.
cv::Mat output_frame_mat = mediapipe::formats::MatView(&output_frame);
cv::cvtColor(output_frame_mat, output_frame_mat, cv::COLOR_RGB2BGR);
if (save_video) {
if (!writer.isOpened()) {
LOG(INFO) << "Prepare video writer.";
writer.open(FLAGS_output_video_path,
mediapipe::fourcc('a', 'v', 'c', '1'), // .mp4
capture.get(cv::CAP_PROP_FPS), output_frame_mat.size());
RET_CHECK(writer.isOpened());
}
writer.write(output_frame_mat);
} else {
cv::imshow(kWindowName, output_frame_mat);
// Press any key to exit.
const int pressed_key = cv::waitKey(5);
if (pressed_key >= 0 && pressed_key != 255) grab_frames = false;
}
}
LOG(INFO) << "Shutting down.";
if (writer.isOpened()) writer.release();
MP_RETURN_IF_ERROR(graph.CloseInputStream(kInputStream));
return graph.WaitUntilDone();
}
int main(int argc, char** argv) {
google::InitGoogleLogging(argv[0]);
gflags::ParseCommandLineFlags(&argc, &argv, true);
::mediapipe::Status run_status = RunMPPGraph();
if (!run_status.ok()) {
LOG(ERROR) << "Failed to run the graph: " << run_status.message();
return EXIT_FAILURE;
} else {
LOG(INFO) << "Success!";
}
return EXIT_SUCCESS;
}
I managed to get some more information logged by adding GLOG_v=10 GLOG_log_dir=/path/to/logs GLOG_logbuflevel=-1
to the command.
Here's the full command: GLOG_log_dir=/Users/ml/Downloads/mediapipe_glog GLOG_logbuflevel=-1 GLOG_v=10 bazel-bin/mediapipe/examples/desktop/autoflip/run_autoflip --calculator_graph_config_file=mediapipe/examples/desktop/autoflip/autoflip_graph.pbtxt --input_video_path=/path/to/input.mp4 --output_video_path=/path/to/output.mp4 --input_side_packets=aspect_ratio=1:1
(same error if I omit --output_video_path)
Here's the log file: run_autoflip.machine.local.ml.log.INFO.20200610-133527.73788.txt
The last lines of the log:
I20200610 13:35:28.078006 162881536 scheduler_queue.cc:99] Scheduler queue empty: 1, # of pending tasks: 1
I20200610 13:35:28.078014 160198656 packet.h:739] Using move constructor of mediapipe::Packet with timestamp: 236733709365 and type: mediapipe::autoflip::DetectionSet
I20200610 13:35:28.078035 160198656 calculator_node.cc:830] Called Calculator::Process() for node: [LocalizationToRegionCalculator, LocalizationToRegionCalculator with output stream: object_regions] timestamp: 236733709365
I20200610 13:35:28.078047 160198656 output_stream_manager.cc:167] Output stream: object_regions queue size: 1
I20200610 13:35:28.078054 160198656 output_stream_manager.cc:169] Output stream: object_regions next timestamp: 236733709366
I20200610 13:35:28.078061 160198656 packet.h:629] Using copy constructor of mediapipe::Packet with timestamp: 236733709365 and type: mediapipe::autoflip::DetectionSet
I20200610 13:35:28.078073 160198656 input_stream_manager.cc:155] Input stream:object_regions has added packet at time: 236733709365
I20200610 13:35:28.078083 160198656 packet.h:739] Using move constructor of mediapipe::Packet with timestamp: 236733709365 and type: mediapipe::autoflip::DetectionSet
I20200610 13:35:28.078091 160198656 input_stream_manager.cc:169] Input stream:object_regions becomes non-empty status:1 Size: 1
I20200610 13:35:28.078101 160198656 scheduler_queue.cc:280] Done running [LocalizationToRegionCalculator, LocalizationToRegionCalculator with output stream: object_regions]
I20200610 13:35:28.078110 160198656 scheduler_queue.cc:99] Scheduler queue empty: 1, # of pending tasks: 0
I20200610 13:35:28.078117 160198656 scheduler.cc:455] active queues: 0
It seems to stop after it's done running LocalizationToRegionCalculator, for some reason.
Maybe unrelated, but I also tried to get it running on iOS (based on other examples, but feeding in each frame myself), which gave me a completely different error using the same graph:
2020-06-10 14:47:17.542196+0200 App[74628:3446493] load -> aspectRatio: 1:1
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20200610 14:47:17.544652 175427008 packet.h:739] Using move constructor of mediapipe::Packet with timestamp: Timestamp::Unset() and type: std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >
I20200610 14:47:17.545430 175427008 packet.h:629] Using copy constructor of mediapipe::Packet with timestamp: Timestamp::Unset() and type: std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >
I20200610 14:47:17.545583 175427008 packet.h:629] Using copy constructor of mediapipe::Packet with timestamp: Timestamp::Unset() and type: std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >
I20200610 14:47:17.545800 175427008 packet.h:746] Using move assignment operator of mediapipe::Packet with timestamp: Timestamp::Unset() and type: std::__1::function<void (mediapipe::Packet const&)>
2020-06-10 14:47:17.711878+0200 App[74628:3446764] Starting graph...
I20200610 14:47:17.892958 191234048 validated_graph_config.cc:810] Taking calculator with index 0 in the original order
2020-06-10 14:47:17.895510+0200 App[74628:3446764] Failed to start graph: Error Domain=GoogleUtilStatusErrorDomain Code=2 "; Input Stream "output_video" for node with sorted index 0 does not have a corresponding output stream." UserInfo={NSLocalizedDescription=; Input Stream "output_video" for node with sorted index 0 does not have a corresponding output stream., GUSGoogleUtilStatusErrorKey=<GUSUtilStatusWrapper: 0x6000002d8a50; status = ; Input Stream "output_video" for node with sorted index 0 does not have a corresponding output stream.>}
Some help would be highly appreciated 🙏
Hi @ndimension, have you had a chance to take a look at this? Thanks!
Hi @ndimension, @thesofakillers
I am very interested in this topic. Do you have any updates on it? I would appreciate any information.
Also very interested in a real-time AutoFlip.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.
Any updates on this issue?
From this answer in #471 and from running AutoFlip myself, I see that in general AutoFlip processes an input video in roughly real-time, i.e. it takes about the duration of the video to process it.
As such, in terms of processing time, re-adapting AutoFlip for usage on video streams (for example, usage on a webcam) rather than a pre-existing video should be feasible.
That being said, does the AutoFlip architecture even allow such a task? If I'm not mistaken, there are several steps, such as motion smoothing (i.e. jitter removal) or shot boundary detection, that require sets of frames as input, while inference on video streams typically acts on a single-frame basis.
A workaround I'm considering is delaying the incoming stream by a number of frames (or even seconds), allowing chunks of the stream to be fed in as video so that the jitter removal, scene detection, and other multi-frame calculators can work appropriately. That being said, this may cause issues at chunk boundaries if, say, they occur mid-scene.
In your experience, is there a minimum length for an input video for AutoFlip to work? Do you have any other suggestions for re-adapting it for usage on video streams? Is it even possible with the given architecture?
Thank you!