asticode / go-astisub

Manipulate subtitles in Go (.srt, .ssa/.ass, .stl, .ttml, .vtt (webvtt), teletext, etc.)
MIT License

Optional data after WebVTT file signature isn't respected #110

Open nakkamarra opened 1 month ago

nakkamarra commented 1 month ago

When I read a WebVTT file with the ReadFromWebVTT function, do some work, and then write the captions back out with WriteToWebVTT, the optional data after the WebVTT file signature is dropped.

Example:

    captions, err := astisub.ReadFromWebVTT(reader) // reader is the opened file.vtt (an io.Reader)
    if err != nil {
        panic(err)
    }
    // ... do some work on captions
    if err := captions.WriteToWebVTT(w); err != nil { // w is the output io.Writer
        panic(err)
    }

file.vtt:

    WEBVTT - Some optional comment here

    1
    00:00:00.500 --> 00:00:02.000
    The Web is always changing

    2
    00:00:02.500 --> 00:00:04.300
    and the way we access it is changing

output:

    WEBVTT

    1
    00:00:00.500 --> 00:00:02.000
    The Web is always changing

    2
    00:00:02.500 --> 00:00:04.300
    and the way we access it is changing

This isn't a huge deal, and it doesn't seem to cause any parsing issues. But I would expect the text to be preserved, since it's valid according to the spec:

A WebVTT file body consists of the following components, in the following order:

  1. An optional U+FEFF BYTE ORDER MARK (BOM) character.
  2. The string "WEBVTT".
  3. Optionally, either a U+0020 SPACE character or a U+0009 CHARACTER TABULATION (tab) character followed by any number of characters that are not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) characters.
  4. Exactly one WebVTT line terminator to terminate the line with the file magic and separate it from the rest of the body.
  5. Zero or more WebVTT metadata headers.
  6. One or more WebVTT line terminators to terminate the header block and separate the cues from the file header.
  7. Zero or more WebVTT cues and WebVTT comments separated from each other by one or more WebVTT line terminators.
  8. Zero or more WebVTT line terminators.

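To make the signature-line rules (steps 1 to 3 of the spec excerpt) concrete, here is a minimal standalone Go sketch. parseSignatureLine is a hypothetical helper, not part of astisub, and it assumes the first line of the file has already been split off without its line terminator.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseSignatureLine (hypothetical helper) validates a WebVTT signature
// line and returns the optional text that follows "WEBVTT", if any.
// The line terminator is assumed to have been stripped already.
func parseSignatureLine(line string) (string, error) {
	// Step 1: an optional U+FEFF byte order mark.
	line = strings.TrimPrefix(line, "\uFEFF")

	// Step 2: the literal string "WEBVTT".
	if !strings.HasPrefix(line, "WEBVTT") {
		return "", errors.New("missing WEBVTT signature")
	}
	rest := line[len("WEBVTT"):]
	if rest == "" {
		return "", nil // no optional data
	}

	// Step 3: optional data must start with a single space or tab...
	if rest[0] != ' ' && rest[0] != '\t' {
		return "", fmt.Errorf("invalid character %q after WEBVTT", rest[0])
	}
	// ...followed by any characters except LF and CR.
	if strings.ContainsAny(rest, "\r\n") {
		return "", errors.New("line terminator inside signature line")
	}
	// Drop the separator and keep everything after it verbatim.
	return rest[1:], nil
}

func main() {
	for _, l := range []string{
		"WEBVTT - Some optional comment here",
		"\uFEFFWEBVTT",
		"WEBVTTX",
	} {
		text, err := parseSignatureLine(l)
		fmt.Printf("%q %v\n", text, err)
	}
}
```

Note that the leading "- " in the issue's example is part of the optional text itself, so it is kept as-is; only the single space separator is dropped.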
asticode commented 4 weeks ago

You're right. For this to work, we would need to add a Comments []string attribute to the Subtitles struct, set it properly when reading the WebVTT file, and write it back out when writing the WebVTT file.

I'm not planning on adding this anytime soon, but I welcome PRs (for which I can obviously offer guidance) 👍

nakkamarra commented 4 weeks ago

Hey, thanks for the response @asticode! Sure, I'll give it a shot and put up a PR.