Closed: dcorney closed this issue 2 months ago.
I closed this prematurely: the "extra" copies of youtube.py and vertex.py also need addressing. See #33.
Overview
Currently, we fetch and store the offset (in seconds) of each piece of text. However, when we combine these pieces into chunks to pass to an LLM, the offsets are discarded.
Requirements
Track the offset of each chunk.
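One possible approach, sketched below under assumptions: each piece of text is a hypothetical `Segment` with a `text` and an `offset` in seconds, and chunking is done by a hypothetical `make_chunks` function that groups consecutive segments up to a character limit (`max_chars`, also an assumption). Each resulting `Chunk` keeps the offset of its first segment instead of discarding it. This is an illustration, not the project's actual chunking code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical: a piece of text and its offset (in seconds) from the start."""
    text: str
    offset: float


@dataclass
class Chunk:
    """Hypothetical: text passed to the LLM, plus the offset of its first segment."""
    text: str
    offset: float


def make_chunks(segments: List[Segment], max_chars: int = 1000) -> List[Chunk]:
    """Group consecutive segments into chunks of at most max_chars characters
    (a single segment longer than max_chars gets a chunk of its own),
    recording the offset of the first segment in each chunk."""
    chunks: List[Chunk] = []
    texts: List[str] = []
    start: Optional[float] = None
    length = 0
    for seg in segments:
        extra = len(seg.text) + (1 if texts else 0)  # +1 for the joining space
        # Close the current chunk if this segment would push it over the limit.
        if texts and length + extra > max_chars:
            chunks.append(Chunk(" ".join(texts), start))
            texts, start, length = [], None, 0
            extra = len(seg.text)
        # The first segment of a chunk determines the chunk's offset.
        if start is None:
            start = seg.offset
        texts.append(seg.text)
        length += extra
    if texts:
        chunks.append(Chunk(" ".join(texts), start))
    return chunks
```

With this shape, whatever sends chunks to the LLM can pass `chunk.offset` along with each chunk's results, so any claims extracted from a chunk inherit at least an approximate timestamp.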
Notes
If the chunks are long, a chunk's offset may be well before the point at which a given claim within that chunk is made.
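One way to narrow that gap, sketched here as an assumption rather than a design decision: keep every segment's offset inside the chunk, together with the character position where that segment starts in the chunk text. A claim can then be mapped to the offset of the segment containing it. The `Chunk` class, its `add`, `offset_at` and `offset_of` methods, and the idea of locating a claim by searching for a quoted substring are all hypothetical.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    """Hypothetical: chunk text plus, for each segment in it, the character
    position where the segment starts and its offset in seconds."""
    text: str = ""
    char_starts: List[int] = field(default_factory=list)
    offsets: List[float] = field(default_factory=list)

    def add(self, text: str, offset: float) -> None:
        # Join segments with a single space, recording where each one begins.
        if self.text:
            self.text += " "
        self.char_starts.append(len(self.text))
        self.offsets.append(offset)
        self.text += text

    def offset_at(self, char_index: int) -> float:
        """Offset of the segment containing the given character of self.text."""
        i = bisect.bisect_right(self.char_starts, char_index) - 1
        return self.offsets[max(i, 0)]

    def offset_of(self, quote: str) -> Optional[float]:
        """Offset of the segment where `quote` first appears, or None if absent."""
        pos = self.text.find(quote)
        return None if pos == -1 else self.offset_at(pos)
```

For example, after `add("The economy grew", 0.0)`, `add("by 3% last year.", 4.2)` and `add("Unemployment fell.", 9.8)`, calling `offset_of("Unemployment")` gives 9.8 rather than the chunk's starting offset of 0.0. This relies on the claim (or a quote from it) appearing verbatim in the chunk text; if the LLM paraphrases, some fuzzier matching would be needed, and the chunk's first offset remains the fallback.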