BrettRD / ros-gst-bridge

a bidirectional ros to gstreamer bridge and utilities for dynamic pipelines

Add rostextsrc #20

Closed clydemcqueen closed 3 years ago

clydemcqueen commented 3 years ago

This is a very early version of rostextsrc. gst-inspect-1.0 and gst-launch-1.0 work, but pipelines don't work as expected.

Design sketch:

Design limitations:

Current problems:

Possible future work:

Thanks, /Clyde

clydemcqueen commented 3 years ago

GST timestamps are still a bit mysterious to me, but I think I understand how this can work:

Provide video-fps option (same name as subparse) to set the expected video frame rate.

Add a StringStamped message type, and store this in the queue, so we always have a ROS timestamp in the queue. We can still provide an option to subscribe to std_msgs::msg::String type; in this case rostextsrc_sub_cb will call node->now() to get the timestamp and create the required StringStamped message.
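For concreteness, such a StringStamped type might look like this (a hypothetical .msg sketch, not something that exists in the repo yet):

```
# StringStamped.msg (hypothetical sketch)
std_msgs/Header header   # header.stamp carries the ROS timestamp
string data              # the text payload
```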

rostextsrc_sub_cb adds new messages to the queue, but also discards old messages. A message is old if there is a more recent message where msg.stamp > gst_clock. In this way, the first message in the queue is always the best message to send downstream.
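A minimal sketch of that pruning rule (hypothetical names, plain Python rather than the element's C code; it assumes "old" means superseded by a newer message whose stamp has already passed the clock):

```python
from collections import deque

class Stamped:
    """Stand-in for the proposed StringStamped message."""
    def __init__(self, stamp, data):
        self.stamp, self.data = stamp, data

def prune_queue(queue, gst_clock):
    """Drop messages superseded by a newer message that is already due.

    One plausible reading of the rule above (an assumption, not the
    element's actual code): message i is stale if a later message j has
    j.stamp <= gst_clock, because j is now the better text to show.
    After pruning, queue[0] is the best candidate to send downstream.
    """
    while len(queue) >= 2 and queue[1].stamp <= gst_clock:
        queue.popleft()  # queue[1] is newer and already due, so queue[0] is stale
    return queue
```

For example, with messages stamped at t=1, t=2, and t=5 and the clock at t=3, the t=1 message is discarded, the t=2 message becomes the head, and the future t=5 message stays queued.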

In rostextsrc_create look at the queue, and handle the following conditions:

This should have these properties:

Thoughts?

BrettRD commented 3 years ago

I would like this source to be usable for TTS tasks, so I'd be hesitant to add video frame rate info to the API:

`rostextsrc ! festival ! wavparse ! audioconvert ! alsasink`

For video overlays, perhaps instead of adding video-fps, we might simply leverage subparse and other subtitle formats directly. I think we could target usage patterns like:

`rostextsrc subtitle="srt" ! subparse ! txt. ... ! textoverlay name=txt ! ...`

This would allow us to handle string validity durations with subtitle file conventions, and keep the element out of high-speed threads.

It would also allow rosbag string data to be converted to a subtitle file for anyone doing robot-assisted videography:

`rostextsrc subtitle="srt" ! filesink location=ros_string.srt`
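To make the SRT conversion concrete, here is a minimal sketch of the timestamp and cue formatting involved (hypothetical helper names, not code from this PR):

```python
def srt_timestamp(seconds):
    """Format a time offset in the SRT HH:MM:SS,mmm notation."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index, start, duration, text):
    """Build one SRT cue: index line, time range, text, blank separator."""
    return (f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(start + duration)}\n"
            f"{text}\n\n")
```

For example, `srt_cue(1, 2.5, 1.0, "hello")` produces a cue spanning `00:00:02,500 --> 00:00:03,500`; concatenating cues in stamp order yields a valid .srt stream for subparse or filesink.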

Some blatant feature creep: can we (much later) add a parameter for a pango-markup template for pretty text? Would a string template parameter allow us to express additional subtitle formats without additional code?

clydemcqueen commented 3 years ago

Interesting, I had not thought about those use cases.

I'm still fuzzy on how the timing information (start, duration) is generated, and how it is passed in a ROS message. I see several possible use cases:

  1. the ROS topic has messages of type String.msg with fragments of a srt or ssa file. rostextsrc basically acts like filesrc: the ROS messages are queued when they arrive, and are sent downstream upon request. If there is no message, rostextsrc stalls waiting for a message to arrive. The src pad is ANY. All of the timing data is contained in srt or ssa.

In this case some upstream system generates subtitles with timing information, encodes them as ssa or srt, and publishes them in a sequence of string messages.

  2. the ROS topic contains parsed subtitles with timing information. We could create a new message type like Subtitle.msg that has a string, a format specifier (pango-markup or utf8), a start time (in header.stamp), and a duration. As before, the ROS messages are queued as they arrive and sent downstream upon request. If there is no message, rostextsrc stalls waiting for a message to arrive. The src pad caps are `text/x-raw, format={pango-markup, utf8}`.

In this case some upstream system generates subtitles with timing information and publishes them in a sequence of subtitle messages.
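Such a Subtitle.msg might look like this (a hypothetical .msg sketch of the proposed type, not an existing interface):

```
# Subtitle.msg (hypothetical sketch)
std_msgs/Header header                # header.stamp = subtitle start time
string format                         # "pango-markup" or "utf8"
string text                           # the subtitle text
builtin_interfaces/Duration duration  # how long to display the text
```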

  3. the ROS message contains parsed subtitles in String.msg, and we infer the timing information somehow. As you point out, this depends on the application. We could still have some parameters to handle common use cases, but for now perhaps it is best to punt this to a future version.

I suppose the choice of (1) or (2) depends on whether or not you already have srt/ssa-encoded information. In my case I would prefer (2); it sounds like (1) is interesting to you. Both seem straightforward.

My knowledge of gstreamer (and audio, etc.) is still quite weak. Does my analysis make sense? Would you accept case (2) in the repo? I am also happy to work on (1); I am learning a lot about gstreamer in the process, which I find valuable.

Thanks, /Clyde

BrettRD commented 3 years ago

In a later refactor, I'd like to explore better polymorphism in terms of what messages the elements can accept and what pre/post-processing can be loaded into the node.

For now I think a subtitle message type or a repeat-rate property would be fine. I'd lean toward repeat rate and string messages, so TTS users can set repeat to zero and subtitle users can set it to the video framerate. (Excuse my indecision; I've not had time to play with it myself.)
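A sketch of how a repeat-rate property could schedule pushes (hypothetical helper, one reading of the idea above: rate in Hz, with 0 meaning push each string exactly once):

```python
def repeat_times(msg_time, next_msg_time, repeat_rate):
    """Timestamps at which one string would be (re)pushed downstream.

    repeat_rate == 0: push once (TTS-style).  Otherwise re-push the
    latest string every 1/repeat_rate seconds until the next message
    arrives (overlay-style, matched to the video framerate).
    Hypothetical helper, not code from this PR.
    """
    if repeat_rate <= 0:
        return [msg_time]
    times = []
    t = msg_time
    while t < next_msg_time:
        times.append(round(t, 9))
        t += 1.0 / repeat_rate
    return times
```

At a repeat rate of 4 Hz a string arriving at t=0 with the next message at t=1 is pushed at 0.0, 0.25, 0.5, and 0.75; at rate 0 it is pushed once, which suits the festival/TTS pipeline.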