Closed by dcorney 4 months ago
This is especially important as we update the prompts/models etc. and may want to compare the same video before and after a change.
^^ Given this, rather than just v1, v2, v3, is it useful to keep track of some extra metadata, e.g.:
The issue here is that we’re keying based on the YouTube ID, rather than our own ID. There would need to be some schema changes in order to store different runs of the same video separately.
The schema currently looks like this:
erDiagram
    video_transcripts ||--o{ training_claims : claims
    video_transcripts ||--o{ inferred_claims : claims
    video_transcripts {
        text id PK
        text url
        text metadata
        text transcript
        text status
    }
    training_claims {
        integer id PK
        text video_id FK
        text claim
        text label
        integer offset_ms
    }
    inferred_claims {
        integer id PK
        text video_id FK
        text claim
        text label
        text model
        integer offset_ms
    }
We’d need to move the current video_transcripts id to another field (e.g. youtube_id), add a new auto-increment ID to video_transcripts, and then point at that from the other tables.
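To make that proposal concrete, here is a rough sketch of what the reworked tables might look like, using Python's built-in sqlite3 module. It assumes the store is SQLite (the text/integer column types suggest this, but the thread does not confirm it). The column name youtube_id comes from the suggestion above; the helpers add_run and claims_for_run, and the sample ID abc123, are made up for illustration. Only inferred_claims is shown; training_claims would need the same change to its video_id column.

```python
import sqlite3

# Hypothetical revised schema: each analysis run gets its own integer id,
# and the YouTube ID becomes an ordinary (non-unique) column.
SCHEMA = """
CREATE TABLE video_transcripts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- new per-run ID
    youtube_id TEXT NOT NULL,              -- previously stored in id
    url TEXT,
    metadata TEXT,
    transcript TEXT,
    status TEXT
);
CREATE TABLE inferred_claims (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES video_transcripts(id),
    claim TEXT,
    label TEXT,
    model TEXT,
    offset_ms INTEGER
);
"""


def add_run(conn, youtube_id, claims, model):
    """Store one analysis run and its claims; return the new run ID."""
    cur = conn.execute(
        "INSERT INTO video_transcripts (youtube_id) VALUES (?)",
        (youtube_id,),
    )
    run_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO inferred_claims (video_id, claim, model) VALUES (?, ?, ?)",
        [(run_id, claim, model) for claim in claims],
    )
    return run_id


def claims_for_run(conn, run_id):
    """Return the claims belonging to a single run, in insertion order."""
    rows = conn.execute(
        "SELECT claim FROM inferred_claims WHERE video_id = ? ORDER BY id",
        (run_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Analysing the same (made-up) YouTube ID twice now yields two separate runs.
first = add_run(conn, "abc123", ["claim A"], "model-1")
second = add_run(conn, "abc123", ["claim A", "claim B"], "model-2")
```

With this layout, each run's claims are looked up by the run ID rather than the YouTube ID, so a second analysis no longer adds to the first run's list.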
Describe the bug
Currently, if someone enters the ID of a video that has already been analysed and clicks analyse, the model extracts the same (or similar) claims again and appends them to that video's existing list of claims.
Instead, we should list the video twice (or more times) on the main page, each linking to its associated set of claims.
This is especially important as we update the prompts/models etc. and may want to compare the same video before and after a change.
To Reproduce
Steps to reproduce the behaviour:
Expected behaviour
If the same id occurs twice in the list of analysed videos, distinguish by a version number. E.g.
etc.
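One possible way to derive the version numbers, assuming each analysis run is stored separately and can be listed oldest first. The function name label_versions, the "v1"-style label format, and the sample IDs are illustrative assumptions, not something agreed in this issue.

```python
from collections import defaultdict


def label_versions(runs):
    """Attach a per-video version label to each run.

    runs: list of (run_id, youtube_id) tuples, ordered oldest first.
    Returns a list of (run_id, label) tuples in the same order.
    """
    seen = defaultdict(int)  # how many runs we have seen per YouTube ID
    labelled = []
    for run_id, youtube_id in runs:
        seen[youtube_id] += 1
        labelled.append((run_id, f"{youtube_id} v{seen[youtube_id]}"))
    return labelled


# Made-up run list: the video "abc123" has been analysed twice.
runs = [(1, "abc123"), (2, "xyz789"), (3, "abc123")]
labels = label_versions(runs)
```

This sketch labels every run, including videos analysed only once; showing the suffix only when an ID is repeated would be a small change.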
Additional context
For reference: in Live, if the same YouTube video is analysed twice, two versions are shown. This is what we want here.