One problem with keeping track of the running sentence text is that when we discard everything but the last 500 characters (to form overlapping blocks), we lose the timestamp offset information.
Here, I’ve changed the approach so that timestamp offset information is kept alongside the sentence text.
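As a rough illustration of the idea (not the code in this PR), here is a minimal sketch. It assumes the transcript arrives as segments that each carry a start offset in seconds. `Segment`, `Chunk`, `make_chunks` and the `chunk_size` default are hypothetical names and values. Instead of trimming a running string, it keeps whole segments in a buffer, so every piece of retained overlap text still knows where it came from.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    # Hypothetical transcript segment: start offset in seconds plus its text.
    start: float
    text: str


@dataclass
class Chunk:
    text: str
    start: float
    end: Optional[float]  # None when no segment follows the chunk


def make_chunks(segments: list[Segment], chunk_size: int = 2000, overlap: int = 500) -> list[Chunk]:
    """Group segments into overlapping chunks without losing timestamp offsets."""
    chunks: list[Chunk] = []
    buffer: list[Segment] = []

    def buffer_len() -> int:
        # Approximate length: ignores the spaces added when joining.
        return sum(len(s.text) for s in buffer)

    for seg in segments:
        if buffer and buffer_len() >= chunk_size:
            # A chunk ends where the next segment starts.
            chunks.append(Chunk(" ".join(s.text for s in buffer), buffer[0].start, seg.start))
            # Keep at most `overlap` characters, dropping whole segments from the
            # front so the survivors keep their own start offsets. A single
            # segment longer than `overlap` is dropped too (no overlap then).
            while buffer and buffer_len() > overlap:
                buffer.pop(0)
        buffer.append(seg)

    if buffer:
        # Final chunk: nothing follows it, so its end offset is unknown.
        chunks.append(Chunk(" ".join(s.text for s in buffer), buffer[0].start, None))
    return chunks
```

Dropping whole segments means the overlap is "up to 500 characters" rather than exactly 500. This trades a little precision for offsets that never have to be reconstructed.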
One potentially odd consequence is that the final chunk will not have an end offset, because we have no way of knowing it: the final chunk ends where the video ends, so we could only fill this in if we knew the video’s length. I’ve set it to `None` for now, but we’ll need to amend other code to deal with this.
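For the follow-up work, one possible shape for handling the `None` end offset downstream is sketched below. Both helpers, `resolve_end` and `format_range`, and the `video_duration` parameter are hypothetical: they assume the video length may or may not be available to the caller, and that offsets are in seconds.

```python
from typing import Optional


def resolve_end(end: Optional[float], video_duration: Optional[float] = None) -> Optional[float]:
    """Return a usable end offset for a chunk.

    Only the final chunk has end=None. If the caller knows the video's length
    (hypothetical `video_duration`, in seconds), that is where the final chunk
    ends; otherwise the end stays unknown.
    """
    if end is not None:
        return end
    return video_duration


def format_range(start: float, end: Optional[float]) -> str:
    """Render a chunk's time range as m:ss, leaving the range open if end is unknown."""

    def mmss(t: float) -> str:
        minutes, seconds = divmod(int(t), 60)
        return f"{minutes}:{seconds:02d}"

    # An open-ended range makes the missing offset explicit instead of failing.
    if end is None:
        return f"{mmss(start)}-end"
    return f"{mmss(start)}-{mmss(end)}"
```

For example, `format_range(65.0, resolve_end(None))` yields `"1:05-end"`. If the duration is known, `format_range(65.0, resolve_end(None, 130.0))` yields `"1:05-2:10"`.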
Fixes #86.