Claim URLs should include timestamp

FullFact / health-misinfo-shared

Raphael health misinformation project, shared by Full Fact and Google

MIT License

Claim URLs should include timestamp #20

Closed dcorney closed 4 months ago

dcorney commented 5 months ago

Describe the bug

Currently, all URLs of claims found within a video are identical. Instead, each claim should link to the point in the video where the claim is made.

To Reproduce

Steps to reproduce the behaviour:

1. Go to raphael
2. Hover over (or click on) any of the 5 claims
3. Note that each URL ends with &t=0

Expected behaviour

Each link URL should end with the timestamp associated with the claim. (Actually, we should probably link to a few seconds earlier.)

JamesMcMinn commented 5 months ago

@dcorney this looks to be because HEALTH_CLAIM_PROMPT doesn't ask for the offset as part of the output.

If that's resolved, then this line will need to be removed and we'll need to convert the value to seconds (rather than ms), at which point I think it will "just work".
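For illustration, the conversion could look roughly like the sketch below. It assumes the stored offset is in milliseconds and that claims link to a standard YouTube watch URL; the function name `claim_url` and the 5-second lead-in are hypothetical and not taken from the repository.

```python
from urllib.parse import urlencode


def claim_url(video_id: str, offset_ms: float, lead_in_s: int = 5) -> str:
    """Build a watch URL that starts shortly before the claim is made.

    Assumption: offset_ms is the claim's offset in milliseconds.
    lead_in_s is a hypothetical default for "a few seconds earlier".
    """
    # Convert milliseconds to whole seconds, step back by the lead-in,
    # and clamp at zero so early claims do not get a negative timestamp.
    start_s = max(0, int(offset_ms // 1000) - lead_in_s)
    return "https://www.youtube.com/watch?" + urlencode({"v": video_id, "t": start_s})


if __name__ == "__main__":
    print(claim_url("abc123", 88_000))
```

Clamping at zero means a claim made in the first few seconds still links to the start of the video.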

dcorney commented 5 months ago

I don't think we need to prompt the model to tell us the offset: we split each transcript into chunks and pass one at a time to the model. So as long as we keep track of the offset of each chunk and pass that through, we should have what we need already. Having said that, I'm not 100% sure that the youtube_api.py functions actually store the offsets as they should (in `get_captions()` and/or `form_chunks()`).
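A minimal sketch of that idea, assuming each caption is a dict with a `start` time and a `text` field. The caption shape, the function name `chunk_captions` and the `max_chars` limit are all assumptions for illustration; this is not the actual code in youtube_api.py.

```python
from typing import TypedDict


class Caption(TypedDict):
    start: float  # assumed: offset of the caption from the start of the video
    text: str


class Chunk(TypedDict):
    start: float  # offset of the first caption in the chunk
    text: str


def chunk_captions(captions: list[Caption], max_chars: int = 200) -> list[Chunk]:
    """Group captions into chunks, keeping the offset of each chunk's first caption."""
    chunks: list[Chunk] = []
    texts: list[str] = []
    start: float | None = None
    length = 0
    for cap in captions:
        # Close the current chunk if adding this caption would exceed the limit.
        if texts and length + len(cap["text"]) > max_chars:
            chunks.append({"start": start, "text": " ".join(texts)})
            texts, start, length = [], None, 0
        if start is None:
            start = cap["start"]  # remember where this chunk begins
        texts.append(cap["text"])
        length += len(cap["text"]) + 1  # +1 for the joining space
    if texts:
        chunks.append({"start": start, "text": " ".join(texts)})
    return chunks
```

Each chunk then carries its own `start`, which can be attached to any claim the model finds in that chunk without asking the model for an offset.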

dcorney commented 5 months ago

OK: so when youtube_api.py forms chunks, it discards the offsets. I'm going to raise a separate issue & patch that. Note also, @JamesMcMinn, that the file you link to above (https://github.com/FullFact/health-misinfo-shared/blob/main/src/raphael_backend_flask/vertex.py#L67) should be removed as per #33.