Investigate why the model is returning such large blocks of text - is the prompt wrong? Are we mixing up the sentence text with the chunk text at some point? Do we need to do some post-processing to work out where in the chunk this particular claim was made?
^^ The prompt we’re currently using doesn’t ask for sentence text, so it’s not included in the response.
Because we don't have the sentence text, I put the chunk text into `raw_sentence_text` instead. That's not what that field is intended for, though; I think we intend to put the sentence text returned by the LLM there.
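For reference, if we did want a post-processing fallback (instead of, or as well as, changing the prompt), one option is to fuzzy-match each inferred claim against the sentences in its chunk. A minimal sketch, assuming chunks can be split on sentence-ending punctuation; the function name and split heuristic are hypothetical, not existing code:

```python
import re
from difflib import SequenceMatcher

def best_matching_sentence(chunk_text: str, claim: str) -> str:
    """Fallback: pick the chunk sentence most similar to the inferred claim."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", chunk_text)
    return max(
        sentences,
        key=lambda s: SequenceMatcher(None, claim.lower(), s.lower()).ratio(),
    )
```

Asking the LLM for the sentence directly is probably the cleaner fix, but something like this might still be useful if the model omits or rewrites the sentence.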
Overview
[NB: this is based on the current `dev` branch, which is soon to be merged into `main`.]

Currently, when the user hovers over an extracted claim, a tooltip pop-up shows an extract of the raw transcript. However, this is a "chunk", corresponding to about 1-2 minutes of the video. This can be quite long, making it hard to find the source of the claim.
Ideally:
Requirements
Update the prompt so that it returns the original sentence along with the inferred claim.
Ideally, the returned sentence should also be correctly punctuated and start with a capital letter, for readability.
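As a rough illustration of the prompt change (the wording below is a sketch, not the final prompt; `raw_sentence_text` just reuses the existing field name in the code):

```python
import json

# Illustrative prompt: ask for both the inferred claim and its source sentence.
EXTRACT_CLAIMS_PROMPT = """\
For each factual claim made in the transcript chunk below, return a JSON list of
objects with two fields:
  "claim":             the inferred claim, stated concisely
  "raw_sentence_text": the exact sentence from the transcript the claim was made in,
                       with punctuation and an initial capital letter added for readability

Transcript chunk:
{chunk_text}
"""

def parse_claims(llm_response: str) -> list[dict]:
    """Parse the model's JSON response, keeping only items with both fields present."""
    return [
        item
        for item in json.loads(llm_response)
        if item.get("claim") and item.get("raw_sentence_text")
    ]
```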
Notes and additional information
Relevant code locations:
- `process.py` / `extract_claims()` gets the `raw_sentence_text` returned by the LLM. This is stored in `inferred_claims`.
- `templates/video_analysis.html` defines `list-group-item`, which shows `claim['raw_sentence_text']` as the tooltip.

We might want to consider showing 2-3 sentences if just one doesn't provide enough context.
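On that last point, one way to show 2-3 sentences is to locate the returned sentence in the chunk and include its neighbours in the tooltip. A sketch, assuming `raw_sentence_text` appears (near-)verbatim in the chunk; the function below is hypothetical, not existing code:

```python
import re

def sentence_context(chunk_text: str, sentence: str, n_neighbours: int = 1) -> str:
    """Return the matched sentence plus up to n_neighbours sentences on each side."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", chunk_text)
    # Strip trailing punctuation the LLM may have added before matching.
    needle = sentence.strip().rstrip(".!?").lower()
    for i, s in enumerate(sentences):
        if needle and needle in s.lower():
            start = max(0, i - n_neighbours)
            end = min(len(sentences), i + n_neighbours + 1)
            return " ".join(sentences[start:end])
    # Couldn't locate the sentence in the chunk; fall back to showing it on its own.
    return sentence
```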