5 adds rudimentary support. It only works with the ASS format and hasn't been extensively tested, but it works.
ASS support should be good enough for many purposes, but it is not perfect. ASS is common for anime fansubs, whereas SRT is more common for movie subtitle files.
5 takes the approach of displaying the subtitles as a newline-padded code block in a second attachment to the message. Syncing is done by displaying all subtitles that are "close enough" to the current time in the video (as determined by the FPS at which ffmpeg encoded the frames).
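As a rough illustration of that syncing idea, here is a minimal Python sketch. It is not the project's actual code. The names `Cue`, `parse_ass`, `active_cues`, and `render_block`, the `slack` tolerance of 0.25 seconds, and the padded height of 3 lines are all hypothetical. It also assumes the standard ASS `Dialogue:` field order (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text) rather than reading the file's `Format:` line.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # code block delimiter, built this way to keep the example readable


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_ass_time(stamp: str) -> float:
    """Convert an ASS timestamp (H:MM:SS.cc) to seconds."""
    hours, minutes, seconds = stamp.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_ass(source: str) -> list[Cue]:
    """Collect Dialogue events, assuming the standard 10-field layout."""
    cues = []
    for line in source.splitlines():
        if not line.startswith("Dialogue:"):
            continue
        # Text is the last field and may itself contain commas.
        fields = line[len("Dialogue:"):].split(",", 9)
        if len(fields) < 10:
            continue
        text = re.sub(r"\{[^}]*\}", "", fields[9])  # drop override tags like {\i1}
        text = text.replace("\\N", "\n").replace("\\n", "\n")
        cues.append(Cue(parse_ass_time(fields[1]), parse_ass_time(fields[2]), text.strip()))
    return cues


def active_cues(cues: list[Cue], frame_index: int, fps: float, slack: float = 0.25) -> list[Cue]:
    """Return cues that are 'close enough' to the time of the given frame."""
    now = frame_index / fps  # video time implied by the encoding FPS
    return [c for c in cues if c.start - slack <= now <= c.end + slack]


def render_block(cues: list[Cue], height: int = 3) -> str:
    """Render cues as a code block padded with blank lines to a fixed height."""
    lines = [line for c in cues for line in c.text.split("\n")][:height]
    lines += [""] * (height - len(lines))
    return FENCE + "\n" + "\n".join(lines) + "\n" + FENCE
```

Padding to a fixed height keeps the subtitle block the same size from frame to frame, so the message layout does not jump when a cue appears or disappears. The tolerance is a judgment call: a larger `slack` keeps short lines on screen longer, at the cost of overlapping neighbouring cues.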
Following a video is tricky. Following a video without sound is trickier. Following a video at 70 character width ASCII without sound is even harder.
Prerequisite research: