FullFact / health-misinfo-shared

Raphael health misinformation project, shared by Full Fact and Google
MIT License

Chunks of YouTube transcripts should include their timestamp offsets #56

Closed dcorney closed 2 months ago

dcorney commented 2 months ago

Overview

Currently, we fetch and store the offset (in seconds) alongside each piece of transcript text. But when we combine those pieces into chunks of text to pass to an LLM, we discard the offsets.

Requirements

Track the offset of each chunk.
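
A minimal sketch of one way to do this, not the project's actual code. It assumes each transcript piece is a dict with `text` and `start` (seconds) keys; `Segment`, `Chunk`, `form_chunks` and `max_chars` are hypothetical names used only for illustration.

```python
from typing import TypedDict


class Segment(TypedDict):
    text: str
    start: float  # offset in seconds from the start of the video (assumed shape)


class Chunk(TypedDict):
    text: str
    start_offset: float  # offset of the first segment in the chunk


def form_chunks(segments: list[Segment], max_chars: int = 1000) -> list[Chunk]:
    """Group consecutive segments into chunks, keeping each chunk's start offset."""
    chunks: list[Chunk] = []
    texts: list[str] = []
    start = 0.0
    length = 0
    for seg in segments:
        if not texts:
            # First segment of a new chunk: remember where the chunk begins.
            start = seg["start"]
        texts.append(seg["text"])
        length += len(seg["text"]) + 1  # +1 for the joining space
        if length >= max_chars:
            chunks.append({"text": " ".join(texts), "start_offset": start})
            texts, length = [], 0
    if texts:
        # Flush whatever is left as a final, shorter chunk.
        chunks.append({"text": " ".join(texts), "start_offset": start})
    return chunks
```

The chunk text can still be passed to the LLM unchanged; the offset travels alongside it rather than inside the prompt.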

Notes

If the chunks are long, a chunk's start offset may be well before the point in the video where a given claim within that chunk is actually made.
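
One possible mitigation, sketched under assumptions: keep the character position and offset of every segment inside the chunk, then map a claim back to the segment where it begins. This only works if the LLM returns the claim as a verbatim quote from the chunk, which is an assumption here. `join_with_positions` and `offset_of` are hypothetical helpers, and segments are assumed to be dicts with `text` and `start` keys.

```python
from bisect import bisect_right


def join_with_positions(segments):
    """Join segment texts and record where each segment starts in the joined string."""
    parts, positions, offsets = [], [], []
    cursor = 0
    for seg in segments:
        positions.append(cursor)      # character index where this segment begins
        offsets.append(seg["start"])  # video offset (seconds) of this segment
        parts.append(seg["text"])
        cursor += len(seg["text"]) + 1  # +1 for the joining space
    return " ".join(parts), positions, offsets


def offset_of(quote, text, positions, offsets):
    """Return the offset of the segment in which `quote` begins, or None if absent."""
    idx = text.find(quote)
    if idx < 0:
        return None  # quote is not verbatim; the caller could fall back to the chunk offset
    # Last segment whose starting character index is <= idx.
    return offsets[bisect_right(positions, idx) - 1]
```

If the quote is not found verbatim, falling back to the chunk's start offset still gives a usable, if coarser, timestamp.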

dcorney commented 2 months ago

I closed this prematurely: the "extra" copies of youtube.py and vertex.py need addressing too. See #33