FullFact / health-misinfo-shared

Raphael health misinformation project, shared by Full Fact and Google
MIT License

56 (take 2): chunks of YouTube transcripts should include their timestamp offsets #68

Closed. dcorney closed this 2 months ago.

dcorney commented 2 months ago

Fixes #56.

Stores the start and end offsets for each chunk of a transcript.
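A minimal Python sketch of the idea, for illustration only and not the code in this PR. It assumes transcript segments arrive as dicts with text, start and duration keys (assumed to match the raw output of the youtube-transcript-api package), and chunk_transcript and max_chars are hypothetical names. Each chunk starts where its first segment starts and ends where its last segment ends (start plus duration), using the offset_s / offset_end_s field names from this PR.

```python
from typing import TypedDict


class Segment(TypedDict):
    """One caption segment (shape assumed, not confirmed by this PR)."""
    text: str
    start: float     # seconds from the start of the video
    duration: float  # seconds


class Chunk(TypedDict):
    text: str
    offset_s: float      # start of the first segment in the chunk
    offset_end_s: float  # end of the last segment in the chunk


def chunk_transcript(segments: list[Segment], max_chars: int = 500) -> list[Chunk]:
    """Hypothetical helper: group consecutive segments into chunks of roughly
    max_chars characters, recording where each chunk starts and ends."""
    chunks: list[Chunk] = []
    current: list[Segment] = []

    def flush() -> None:
        # Close off the current group of segments as one chunk.
        if current:
            chunks.append({
                "text": " ".join(s["text"] for s in current),
                "offset_s": current[0]["start"],
                "offset_end_s": current[-1]["start"] + current[-1]["duration"],
            })
            current.clear()

    for seg in segments:
        # Approximate length of the joined text so far (+1 per joining space).
        length = sum(len(s["text"]) + 1 for s in current)
        if current and length + len(seg["text"]) > max_chars:
            flush()
        current.append(seg)
    flush()
    return chunks
```

Offsets stay as float seconds here; whether to keep seconds or switch to milliseconds is the open question below.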

DON'T MERGE YET! A corresponding change needs to be made to the backend app.py, as claims now have the fields offset_s and offset_end_s.

Do we want to define these in seconds or milliseconds? (Seconds seems fine-grained enough to me, but I'm open to other opinions.)

Either way, we need to update the database schema to store the end offset.
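Assuming a relational store, the schema change could be a single nullable column. The sketch below uses SQLite purely for illustration; the claims table and the offset_ms / offset_end_ms column names are hypothetical, since the PR doesn't name the actual database field. Existing rows get NULL, as their end offsets were never recorded.

```python
import sqlite3


def add_offset_end_column(conn: sqlite3.Connection) -> None:
    """Add a nullable end-offset column to a hypothetical claims table."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(claims)")}
    if "offset_end_ms" not in columns:  # makes the migration safe to re-run
        conn.execute("ALTER TABLE claims ADD COLUMN offset_end_ms INTEGER")
        conn.commit()
```

Checking for the column first means the migration can run on every app start without failing on an already-migrated database.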



andylolz commented 2 months ago

Quoting @dcorney: "Do we want to define these in seconds or milliseconds? (Seconds seems fine-grained enough to me but open to other opinions)."

I tend to agree – but switching to seconds would mean renaming the database field!

Presumably we’re storing milliseconds so that we can use an integer field? Either way, it’s fine to return seconds here and convert them to milliseconds for database storage.
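The conversion suggested here is a one-liner each way. A sketch with hypothetical function names: rounding to whole milliseconds drops sub-millisecond precision, which seems harmless for transcript offsets (Python's round() sends exact halves to the even neighbour, which also doesn't matter at this scale).

```python
def seconds_to_ms(offset_s: float) -> int:
    """Convert an offset in seconds to whole milliseconds for an integer column."""
    return round(offset_s * 1000)


def ms_to_seconds(offset_ms: int) -> float:
    """Convert a stored millisecond offset back to seconds."""
    return offset_ms / 1000
```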

andylolz commented 2 months ago

I realised you were right all along, @dcorney – the seconds-to-milliseconds change did introduce a breaking change here.

I’ve made a very small fix for it (which will presumably be replaced with something better soon) and I think this is now mergeable as is.