Closed coltonoscopy closed 7 years ago
Given that they VTT files are keyless data structures, would it not make more sense to try keep the schema consistent across all files of that type, that is, decide on wether to go ahead and use ID's on all VTT files or none now. The integer IDs holds little value at the moment as they always correlate with the count/order of the items and we don't use them for anything; if that helps influence a decision at all.
Also, if https://github.com/cs50/video50/issues/91 is ever actioned then JSON50 would handle the conversion when generating the episode JSON for each episode (this would drastically decrease the complexity of the application state and eradicate some quite expensive fetching/parsing operations).
We're going to phase out the above-timecode identifiers as unnecessary, since we don't use them for styling, and just having numbers isn't really useful anyway. So just want to ensure the parser doesn't break when our next batch of VTTs lacks identifiers. We'll likely go back and regenerate all existing VTTs soon too once the parser can accommodate.
The changes here https://github.com/cs50/video50/pull/121 means that the parsing function can handle the existing schema:
WEBVTT
1
00:00:00.000 --> 00:01:58.000
This is CS50.
2
00:01:58.000 --> 00:03:37.000
Course Overview
...
and the proposed schema (with no identifiers):
WEBVTT
00:00:00.000 --> 00:01:58.000
This is CS50.
00:01:58.000 --> 00:03:37.000
Course Overview
...
It does this by detecting the length of each entry after splitting by \n
. If the length is equal to 3 [id,time,title] then it uses the old parser, if it is anything but 3 it will use the new parser (expecting a length of two [time,title])
This doesn't sound safe, actually, since the payload of captions can technically contain \n
. Can we instead split on \n\n
and then, for each set of strings in the resulting array, check if the first such string is a time code (as defined by timecode --> timecode
) or not? If it's not, you can then assume that the first such string is a unique identifier.
Sure, that does sound safer. Implemented here https://github.com/cs50/video50/pull/124.
via @dmalan and @dlloyd09
In case we ditch in the future :)