BrettRD / ros-gst-bridge

a bidirectional ros to gstreamer bridge and utilities for dynamic pipelines

Add rostextsrc #20

Closed clydemcqueen closed 3 years ago

clydemcqueen commented 3 years ago

This is a very early version of rostextsrc. gst-inspect-1.0 and gst-launch-1.0 work, but pipelines don't work as expected.

Design sketch:

Design limitations:

Current problems:

Possible future work:

Thanks, /Clyde

clydemcqueen commented 3 years ago

GST timestamps are still a bit mysterious to me, but I think I understand how this can work:

Provide video-fps option (same name as subparse) to set the expected video frame rate.

Add a StringStamped message type, and store this in the queue, so we always have a ROS timestamp in the queue. We can still provide an option to subscribe to std_msgs::msg::String type; in this case rostextsrc_sub_cb will call node->now() to get the timestamp and create the required StringStamped message.
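For concreteness, such a StringStamped type might look like this (a hypothetical .msg sketch, not something that exists in the repo yet):

```
# StringStamped.msg (hypothetical sketch)
std_msgs/Header header   # header.stamp carries the ROS timestamp
string data              # the text payload
```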

rostextsrc_sub_cb adds new messages to the queue, but also discards old messages. A message is old if there is a more recent message where msg.stamp > gst_clock. In this way, the first message in the queue is always the best message to send downstream.
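A minimal sketch of that pruning rule (hypothetical names, plain Python rather than the element's C code; it assumes "old" means superseded by a newer message whose stamp has already passed the clock):

```python
from collections import deque

class Stamped:
    """Stand-in for the proposed StringStamped message."""
    def __init__(self, stamp, data):
        self.stamp, self.data = stamp, data

def prune_queue(queue, gst_clock):
    """Drop messages superseded by a newer message that is already due.

    One plausible reading of the rule above (an assumption, not the
    element's actual code): message i is stale if a later message j has
    j.stamp <= gst_clock, because j is now the better text to show.
    After pruning, queue[0] is the best candidate to send downstream.
    """
    while len(queue) >= 2 and queue[1].stamp <= gst_clock:
        queue.popleft()  # queue[1] is newer and already due, so queue[0] is stale
    return queue
```

For example, with messages stamped at t=1, t=2, and t=5 and the clock at t=3, the t=1 message is discarded, the t=2 message becomes the head, and the future t=5 message stays queued.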

In rostextsrc_create look at the queue, and handle the following conditions:

This should have these properties:

Thoughts?

BrettRD commented 3 years ago

I would like this source to be usable for TTS tasks, so I'd be hesitant to add video frame rate info to the API:

`rostextsrc ! festival ! wavparse ! audioconvert ! alsasink`

For video overlays, perhaps instead of adding video-fps, we might simply leverage subparse and other subtitle formats directly. I think we could target usage patterns like:

`rostextsrc subtitle="srt" ! subparse ! txt. ... ! textoverlay name=txt ! ...`

This would allow us to handle string validity durations with subtitle file conventions, and keep the element out of high-speed threads.

It would also allow rosbag string data to be converted to a subtitle file for anyone doing robot-assisted videography:

`rostextsrc subtitle="srt" ! filesink location=ros_string.srt`
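To make the SRT conversion concrete, here is a minimal sketch of the timestamp and cue formatting involved (hypothetical helper names, not code from this PR):

```python
def srt_timestamp(seconds):
    """Format a time offset in the SRT HH:MM:SS,mmm notation."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index, start, duration, text):
    """Build one SRT cue: index line, time range, text, blank separator."""
    return (f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(start + duration)}\n"
            f"{text}\n\n")
```

For example, `srt_cue(1, 2.5, 1.0, "hello")` produces a cue spanning `00:00:02,500 --> 00:00:03,500`; concatenating cues in stamp order yields a valid .srt stream for subparse or filesink.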

Some blatant feature creep: can we (much later) add a parameter for a pango-markup template for pretty text? Would a string template parameter allow us to express additional subtitle formats without additional code?

clydemcqueen commented 3 years ago

Interesting, I had not thought about those use cases.

I'm still fuzzy on how the timing information (start, duration) is generated, and how it is passed in a ROS message. I see several possible use cases:

  1. the ROS topic has messages of type String.msg with fragments of a srt or ssa file. rostextsrc basically acts like filesrc: the ROS messages are queued when they arrive, and are sent downstream upon request. If there is no message, rostextsrc stalls waiting for a message to arrive. The src pad is ANY. All of the timing data is contained in srt or ssa.

In this case some upstream system generates subtitles with timing information, encodes them as ssa or srt, and publishes them in a sequence of string messages.

  2. the ROS topic contains parsed subtitles with timing information. We could create a new message type like Subtitle.msg that has a string, a format specifier (pango-markup or utf8), a start time (in header.stamp), and a duration. As before, the ROS messages are queued as they arrive and sent downstream upon request. If there is no message, rostextsrc stalls waiting for a message to arrive. The src pad caps are `text/x-raw, format={pango-markup, utf8}`.

In this case some upstream system generates subtitles with timing information and publishes them in a sequence of subtitle messages.
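Such a Subtitle.msg might look like this (a hypothetical .msg sketch of the proposed type, not an existing interface):

```
# Subtitle.msg (hypothetical sketch)
std_msgs/Header header                # header.stamp = subtitle start time
string format                         # "pango-markup" or "utf8"
string text                           # the subtitle text
builtin_interfaces/Duration duration  # how long to display the text
```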

  3. the ROS message contains parsed subtitles in String.msg, and we infer the timing information somehow. As you point out, this depends on the application. We could still have some parameters to handle common use cases, but for now perhaps it is best to punt this to a future version.

I suppose the choice of (1) or (2) depends on whether or not you already have srt/ssa-encoded information. In my case I would prefer (2); it sounds like (1) is interesting to you. Both seem straightforward.

My knowledge of gstreamer (and audio, etc.) is still quite weak. Does my analysis make sense? Would you accept case (2) in the repo? I am also happy to work on (1); I am learning a lot about gstreamer in the process, which I find valuable.

Thanks, /Clyde

BrettRD commented 3 years ago

In a later refactor, I'd like to explore better polymorphism in terms of what messages the elements can accept and what pre/post-processing can be loaded into the node.

For now I think a subtitle message type or a repeat-rate property would be fine. I'd lean toward repeat rate and string messages, so TTS users can set repeat to zero and subtitle users can set it to the video framerate. (Excuse my indecision; I've not had time to play with it myself.)
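A sketch of how a repeat-rate property could schedule pushes (hypothetical helper, one reading of the idea above: rate in Hz, with 0 meaning push each string exactly once):

```python
def repeat_times(msg_time, next_msg_time, repeat_rate):
    """Timestamps at which one string would be (re)pushed downstream.

    repeat_rate == 0: push once (TTS-style).  Otherwise re-push the
    latest string every 1/repeat_rate seconds until the next message
    arrives (overlay-style, matched to the video framerate).
    Hypothetical helper, not code from this PR.
    """
    if repeat_rate <= 0:
        return [msg_time]
    times = []
    t = msg_time
    while t < next_msg_time:
        times.append(round(t, 9))
        t += 1.0 / repeat_rate
    return times
```

At a repeat rate of 4 Hz a string arriving at t=0 with the next message at t=1 is pushed at 0.0, 0.25, 0.5, and 0.75; at rate 0 it is pushed once, which suits the festival/TTS pipeline.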