CouncilDataProject / cdptools_v2

Tools you can use to interact with and run Council Data Project instances.

Automated help markers for transcripts #44

Closed evamaxfield closed 5 years ago

evamaxfield commented 5 years ago

Thanks to the timestamp additions to transcripts in [gh-32], portions of a transcript can now be "easily" identified as "missing content".

Example using: https://www.seattlechannel.org/FullCouncil/?videoid=x103419

Transcript beginning:

[
   {
      "sentence": "Good afternoon.",
      "start_time": 18.4,
      "end_time": 19.7
   },
   {
      "sentence": "Thanks for being here for me.",
      "start_time": 19.7,
      "end_time": 21.2
   },
   {
      "sentence": "Special city council meeting when I call it right thing as proof is presiding officer.",
      "start_time": 25.7,
      "end_time": 30.3
   }
   ...
]

There is a noticeable gap of about four and a half seconds between the end of the second sentence (21.2) and the start of the third (25.7). To me, this would be a good place for a "transcript annotation" that marks this portion as "missing context/content".
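A rough sketch of how such gaps might be detected from the timestamped format above. The function name `find_transcript_gaps`, the `min_gap` threshold of 2 seconds, and the shape of the returned records are all assumptions made for illustration; none of them exist in cdptools.

```python
from typing import Dict, List, Union

# One timestamped sentence, in the same shape as the transcript example above.
Sentence = Dict[str, Union[str, float]]


def find_transcript_gaps(
    sentences: List[Sentence],
    min_gap: float = 2.0,  # hypothetical threshold in seconds, not a cdptools default
) -> List[Dict[str, float]]:
    """Return the silent spans between consecutive sentences that last at least min_gap."""
    gaps = []
    # Pair each sentence with the one that follows it.
    for previous, current in zip(sentences, sentences[1:]):
        # Round to avoid floating point noise such as 4.499999999999996.
        duration = round(current["start_time"] - previous["end_time"], 3)
        if duration >= min_gap:
            gaps.append(
                {
                    "start_time": previous["end_time"],
                    "end_time": current["start_time"],
                    "duration": duration,
                }
            )
    return gaps


transcript = [
    {"sentence": "Good afternoon.", "start_time": 18.4, "end_time": 19.7},
    {"sentence": "Thanks for being here for me.", "start_time": 19.7, "end_time": 21.2},
    {
        "sentence": "Special city council meeting when I call it right thing as proof is presiding officer.",
        "start_time": 25.7,
        "end_time": 30.3,
    },
]

gaps = find_transcript_gaps(transcript)
print(gaps)
```

With the three sentences above, only the span from 21.2 to 25.7 is reported. Each returned record could then be turned into a "missing context/content" annotation; what threshold is appropriate would need tuning against real transcripts.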

This could be combined with [gh-43] as part of the "TranscriptAnnotationPipeline".