Closed dcorney closed 4 months ago
@dcorney this looks to be because `HEALTH_CLAIM_RPOMPT` doesn't ask for the offset as part of the output.
If that's resolved, then this line will need to be removed and we'll need to convert the value to seconds (rather than ms), at which point I think it will "just work".
I don't think we need to prompt the model to tell us the offset: we split each transcript into chunks and pass one at a time to the model. So as long as we keep track of the offset of each chunk and pass that through, we should have what we need already. Having said that, I'm not 100% sure that the `youtube_api.py` functions actually store the offsets as they should (in `get_captions()` and/or `form_chunks()`).
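As a rough illustration of keeping the offset alongside each chunk, here is a minimal sketch. It is not the code in `youtube_api.py`: the function name `form_chunks_with_offsets`, the `Chunk` dataclass, the `max_chars` limit, and the caption shape (a dict with `"text"` and `"start"` keys) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start_offset: float  # offset of the first caption in the chunk (hypothetical field)


def form_chunks_with_offsets(captions, max_chars=200):
    """Group captions into chunks, remembering where each chunk starts.

    `captions` is assumed to be a list of dicts with "text" and "start" keys;
    the real caption structure may differ.
    """
    chunks = []
    current_texts = []
    current_start = None
    current_len = 0
    for caption in captions:
        text = caption["text"]
        # Close the current chunk if adding this caption would exceed the limit.
        if current_texts and current_len + len(text) > max_chars:
            chunks.append(Chunk(" ".join(current_texts), current_start))
            current_texts, current_start, current_len = [], None, 0
        # The first caption in a chunk determines the chunk's offset.
        if current_start is None:
            current_start = caption["start"]
        current_texts.append(text)
        current_len += len(text)
    # Emit whatever is left over as the final chunk.
    if current_texts:
        chunks.append(Chunk(" ".join(current_texts), current_start))
    return chunks
```

With something like this, the offset travels with the chunk, so any claim the model finds in a chunk can inherit that chunk's `start_offset` without the prompt needing to ask for it.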
OK: so when `youtube_api.py` forms chunks, it discards the offsets. I'm going to raise a separate issue & patch that.
Note also that the file you link to above (@JamesMcMinn), https://github.com/FullFact/health-misinfo-shared/blob/main/src/raphael_backend_flask/vertex.py#L67, should be removed as per #33.
Describe the bug
Currently, all URLs of claims found within a video are identical. Instead, each claim should link to the point in the video where the claim is made.
To Reproduce
Steps to reproduce the behaviour:
&t=0
Expected behaviour
Each link URL should end with the timestamp associated with the claim. (Actually, we should probably link a few seconds earlier.)
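A minimal sketch of the expected behaviour, assuming the stored offset is in milliseconds (as suggested in the comments) and that links use the standard YouTube `watch?v=...&t=...` form. The function name `claim_url` and the 5-second lead-in are hypothetical choices for illustration, not the project's actual code.

```python
from urllib.parse import urlencode


def claim_url(video_id, offset_ms, lead_in_s=5):
    """Build a link that starts a few seconds before the claim.

    `offset_ms` is assumed to be in milliseconds; `lead_in_s` is a
    hypothetical default for how much earlier to start.
    """
    # Convert ms to whole seconds, step back a little, and never go below 0.
    seconds = max(0, int(offset_ms // 1000) - lead_in_s)
    return "https://www.youtube.com/watch?" + urlencode({"v": video_id, "t": seconds})
```

For example, `claim_url("abc123", 90000)` would link to second 85 of the video, while a claim made within the first few seconds would still link to `t=0`.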