google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

Autoflip TEXT signal #2075

Closed: engali-ob closed this issue 3 years ago

engali-ob commented 3 years ago

Hello, I am trying to use autoflip on some videos containing lower-third titles and captions with text. I am trying to preserve the text area by adding a TEXT entry to the signal settings in the graph, as follows:

signal_settings {
  type: { standard: TEXT }
  min_score: 0.9
  max_score: 0.95
  is_required: true
}

Autoflip does not seem to recognize text areas, and they are always cropped in the output. Can the "TEXT" signal type be used, or is it not implemented yet?

sgowroji commented 3 years ago

Hi @engali-ob, thank you for reaching out about this issue. Could you please describe your use case in full detail, including any error logs or screenshots from reproducing the problem? Thanks!

engali-ob commented 3 years ago

Hi @sgowroji

Thanks for your reply. Actually, I am not sure this can be described as an error, since no errors occur during processing. I am just checking whether text area detection is part of the detection models in autoflip. I added the TEXT signal to the graph settings as a test because I found it listed in the autoflip messages proto file:

https://github.com/google/mediapipe/blob/master/mediapipe/examples/desktop/autoflip/autoflip_messages.proto

// Stores the message type, including standard types (face, object) and custom
// types defined by a string id.
// Next tag: 3
message SignalType {
  enum StandardType {
    UNSET = 0;
    // Full face bounding boxed detected.
    FACE_FULL = 1;
    // Face landmarks for eyes, nose, chin only.
    FACE_CORE_LANDMARKS = 2;
    // All face landmarks (eyes, ears, nose, chin).
    FACE_ALL_LANDMARKS = 3;
    // A specific face landmark.
    FACE_LANDMARK = 4;
    HUMAN = 5;
    CAR = 6;
    PET = 7;
    OBJECT = 8;
    MOTION = 9;
    TEXT = 10;
    LOGO = 11;
    USER_HINT = 12;
  }
  oneof Signal {
    StandardType standard = 1;
    string custom = 2;
  }
}

But text areas do not seem to be recognized, and they are cropped in the output as usual. On further investigation, I could see that object detection is based on the ssdlite_object_detection.tflite model, which does not seem to include text areas among its detection boxes.

So is the "TEXT" signal type usable as is, or is it yet to be implemented with another tflite detection model?

Thanks

nathanfrey-ovx commented 3 years ago

Hi @engali-ob,

Last summer we had an intern develop a text detector for autoflip. Can you try to follow the instructions and examples here to see if that meets your needs?

https://github.com/googleinterns/intern-for-design/tree/master/autoflip_integrated_calculators

Please reach back out if you need more details.

-Nathan Frey

sgowroji commented 3 years ago

Hi @engali-ob, Have a look at the above comment. Thanks!

engali-ob commented 3 years ago

Hi @nathanfrey-google

Thanks for directing me there. I can see the interns put some nice effort into enhancing autoflip detections. For my case, I copied only the text_detection_calculator, adjusted the graph settings, and recompiled autoflip; the results are pretty convincing for my needs. The steps are below.

proto_library(
    name = "text_detection_calculator_proto",
    srcs = ["text_detection_calculator.proto"],
    deps = [
        "//mediapipe/examples/desktop/autoflip/quality:visual_scorer_proto",
        "//mediapipe/framework:calculator_proto",
    ],
)

mediapipe_cc_proto_library(
    name = "text_detection_calculator_cc_proto",
    srcs = ["text_detection_calculator.proto"],
    cc_deps = [
        "//mediapipe/examples/desktop/autoflip/quality:visual_scorer_cc_proto",
        "//mediapipe/framework:calculator_cc_proto",
    ],
    visibility = ["//mediapipe/examples:subpackages"],
    deps = [":text_detection_calculator_proto"],
)

cc_test(
    name = "text_detection_calculator_test",
    srcs = ["text_detection_calculator_test.cc"],
    linkstatic = 1,
    deps = [
        ":text_detection_calculator",
        ":text_detection_calculator_cc_proto",
        "//mediapipe/examples/desktop/autoflip:autoflip_messages_cc_proto",
        "//mediapipe/framework:calculator_framework",
        "//mediapipe/framework:calculator_runner",
        "//mediapipe/framework/formats:detection_cc_proto",
        "//mediapipe/framework/formats:image_frame",
        "//mediapipe/framework/formats:image_frame_opencv",
        "//mediapipe/framework/formats:location_data_cc_proto",
        "//mediapipe/framework/port:gtest_main",
        "//mediapipe/framework/port:parse_text_proto",
        "//mediapipe/framework/port:ret_check",
        "//mediapipe/framework/port:status",
        "//mediapipe/framework/port:opencv_highgui",
        "//mediapipe/framework/port:opencv_video",
        "@com_google_absl//absl/strings",
    ],
)

# DETECTION: find texts on the down sampled stream
node {
  calculator: "TextDetectionCalculator"
  input_stream: "VIDEO:video_frames_scaled_downsampled"
  output_stream: "REGIONS:text_regions"
  options {
    [mediapipe.autoflip.TextDetectionCalculatorOptions.ext] {
      model_path: "mediapipe/models/frozen_east_text_detection.pb"
      east_width: 160
      east_height: 160
    }
  }
}
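A note on `east_width` and `east_height` for anyone changing them: the EAST text detector is generally run with input dimensions that are multiples of 32, which 160 satisfies (5 × 32). The Python sketch below rounds an arbitrary size to the nearest such value. It is only an illustration; `nearest_multiple_of_32` is a hypothetical helper and not part of MediaPipe or the interns' calculator.

```python
def nearest_multiple_of_32(size):
    """Round a positive integer size to the nearest multiple of 32.

    Hypothetical helper for choosing east_width / east_height values;
    sizes below 16 are bumped up to the smallest usable value, 32.
    """
    return max(32, (size + 16) // 32 * 32)


if __name__ == "__main__":
    for requested in (160, 200, 300):
        print(requested, "->", nearest_multiple_of_32(requested))
```

Larger values should pick up smaller captions at the cost of slower detection, so it is probably worth trying a couple of sizes on a sample clip.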

In the signal fusion node, add the text stream as an input and add a TEXT signal setting:

# SIGNAL FUSION: Combine detections (with weights) on each frame
node {
  calculator: "SignalFusingCalculator"
  input_stream: "shot_change"
  input_stream: "face_regions"
  input_stream: "object_regions"
  input_stream: "text_regions"
  output_stream: "salient_regions"
  options {
    [mediapipe.autoflip.SignalFusingCalculatorOptions.ext] {
      signal_settings {
        type { standard: FACE_CORE_LANDMARKS }
        min_score: 0.85
        max_score: 0.9
        is_required: false
      }
      signal_settings {
        type { standard: FACE_ALL_LANDMARKS }
        min_score: 0.8
        max_score: 0.85
        is_required: false
      }
      signal_settings {
        type { standard: FACE_FULL }
        min_score: 0.8
        max_score: 0.85
        is_required: false
      }
      signal_settings {
        type: { standard: HUMAN }
        min_score: 0.75
        max_score: 0.8
        is_required: false
      }
      signal_settings {
        type: { standard: PET }
        min_score: 0.7
        max_score: 0.75
        is_required: false
      }
      signal_settings {
        type: { standard: CAR }
        min_score: 0.7
        max_score: 0.75
        is_required: false
      }
      signal_settings {
        type: { standard: OBJECT }
        min_score: 0.1
        max_score: 0.2
        is_required: false
      }
      signal_settings {
        type { standard: TEXT }
        min_score: 0.85
        max_score: 0.9
        is_required: true
      }
    }
  }
}
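For context on what `min_score` and `max_score` do in these settings: as far as I understand, the fusing step rescales each detection's raw score into the range configured for its signal type, so a TEXT region ends up between 0.85 and 0.9 regardless of how the detector scored it. The Python sketch below illustrates that idea, assuming a simple linear remap. `SETTINGS`, `remap_score`, and `rank_regions` are hypothetical names, not MediaPipe API, and the actual calculator may differ in details.

```python
# Hypothetical illustration of per-signal score remapping (not MediaPipe code).
# Each entry is (min_score, max_score), copied from the settings in this thread.
SETTINGS = {
    "OBJECT": (0.1, 0.2),
    "FACE_FULL": (0.8, 0.85),
    "TEXT": (0.85, 0.9),
}


def remap_score(signal_type, raw_score):
    """Linearly map a raw score in [0, 1] into the range set for its type."""
    min_score, max_score = SETTINGS[signal_type]
    clamped = max(0.0, min(1.0, raw_score))
    return min_score + clamped * (max_score - min_score)


def rank_regions(regions):
    """Sort (signal_type, raw_score) pairs by remapped score, highest first."""
    return sorted(regions, key=lambda r: remap_score(*r), reverse=True)


if __name__ == "__main__":
    detections = [("OBJECT", 1.0), ("FACE_FULL", 0.5), ("TEXT", 0.0)]
    for signal_type, raw in rank_regions(detections):
        print(signal_type, raw, remap_score(signal_type, raw))
```

Under this assumption, even a weak text detection outranks a confident generic object. It also shows why the setting alone had no effect at first: with no calculator emitting TEXT regions, there was nothing for the setting to apply to.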

Thanks a lot for the help. I may still try the interns' other autoflip detection calculators and give feedback there.
