Add timestamped-speaker-turns to transcript format.
Resolves #100
Modified `SeattleEventScraper` to scrape for the closed caption file URI.
Added a new module, `WebVTTSRModel`, which implements `SRModel`: it takes the closed caption file URI and produces transcripts in three formats: raw, timestamped-sentences, and timestamped-speaker-turns.
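The three formats can be illustrated with a minimal, stdlib-only sketch. This is not the actual `WebVTTSRModel` implementation; the cue regex, field names, and the assumption that speakers come from WebVTT `<v Name>` voice tags are all illustrative.

```python
import re

# Illustrative sketch only: parse WebVTT cues and build the three
# transcript formats (raw, timestamped-sentences, timestamped-speaker-turns).
CUE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\n(.+?)(?:\n\n|\n?\Z)",
    re.DOTALL,
)
VOICE_RE = re.compile(r"<v ([^>]+)>")

def to_seconds(ts: str) -> float:
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def parse_cues(vtt_text: str):
    """Yield (start, end, speaker, text) for each cue."""
    for start, end, body in CUE_RE.findall(vtt_text):
        match = VOICE_RE.search(body)
        speaker = match.group(1) if match else None
        text = VOICE_RE.sub("", body).replace("\n", " ").strip()
        yield to_seconds(start), to_seconds(end), speaker, text

def transcript_formats(vtt_text: str):
    cues = list(parse_cues(vtt_text))
    raw = " ".join(t for _, _, _, t in cues)
    sentences = [{"start_time": s, "end_time": e, "text": t} for s, e, _, t in cues]
    turns = []
    for s, e, speaker, t in cues:
        if turns and turns[-1]["speaker"] == speaker:
            # Merge consecutive cues from the same speaker into one turn.
            turns[-1]["end_time"] = e
            turns[-1]["text"] += " " + t
        else:
            turns.append({"speaker": speaker, "start_time": s, "end_time": e, "text": t})
    return raw, sentences, turns

sample = """WEBVTT

00:00:00.000 --> 00:00:02.000
<v Chair>Call to order.

00:00:02.000 --> 00:00:04.000
<v Chair>Roll call, please.

00:00:04.000 --> 00:00:05.500
<v Clerk>Present.
"""

raw, sentences, turns = transcript_formats(sample)
```

Here the two consecutive `Chair` cues collapse into a single speaker turn, while the sentence-level output keeps all three cues separate.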
Modified `EventGatherPipeline` to use a mixture of speech recognition models: if an event has a closed caption file URI, `WebVTTSRModel` produces the transcripts; otherwise, `GoogleCloudSRModel` does.
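The selection rule amounts to a simple dispatch on the presence of a caption URI. A sketch of that rule; the function name and event fields are assumptions, not the pipeline's actual API:

```python
# Illustrative model-selection rule: prefer caption-based transcription
# when a caption URI exists, otherwise fall back to speech-to-text.
def choose_sr_model(event, webvtt_model, google_model):
    """Return the speech recognition model to use for an event."""
    if event.get("closed_caption_uri"):
        return webvtt_model
    return google_model

# Usage with placeholder stand-ins for the two models:
with_captions = {"closed_caption_uri": "https://example.com/captions.vtt"}
without_captions = {}
model_a = choose_sr_model(with_captions, "WebVTTSRModel", "GoogleCloudSRModel")
model_b = choose_sr_model(without_captions, "WebVTTSRModel", "GoogleCloudSRModel")
```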
Tests
Added a few fake files (`fake_caption.vtt`, `fake_timestamped_sentences.json`) to test `WebVTTSRModel`.
Tested `EventGatherPipeline`'s new mixture of speech recognition models, covering the cases where `WebVTTSRModel` fails and where it succeeds in producing transcripts.
Added an `example_transcript_speaker_turns.json` to be consistent with the other example transcript JSON files.
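The failure/success tests above can be sketched as follows. Everything here is an illustrative stand-in, including the assumption that a `WebVTTSRModel` failure makes the pipeline fall back to speech-to-text; the real test fixtures and pipeline interfaces are not reproduced.

```python
# Illustrative test sketch for the mixture of SR models: one case where
# the caption-based model succeeds, one where it fails and the pipeline
# falls back to the speech-to-text model. All names are hypothetical.

class WorkingCaptionModel:
    def transcribe(self, uri):
        return "caption transcript"

class FailingCaptionModel:
    def transcribe(self, uri):
        raise RuntimeError("malformed caption file")

class AudioModel:
    def transcribe(self, uri):
        return "audio transcript"

def run_pipeline(caption_uri, caption_model, audio_model):
    """Illustrative fallback: try captions first, then audio transcription."""
    if caption_uri:
        try:
            return caption_model.transcribe(caption_uri)
        except Exception:
            pass  # caption parsing failed; fall back to speech-to-text
    return audio_model.transcribe("audio.wav")

def test_caption_success():
    out = run_pipeline("fake_caption.vtt", WorkingCaptionModel(), AudioModel())
    assert out == "caption transcript"

def test_caption_failure_falls_back():
    out = run_pipeline("fake_caption.vtt", FailingCaptionModel(), AudioModel())
    assert out == "audio transcript"

test_caption_success()
test_caption_failure_falls_back()
```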