Evaluate Google Speech-to-Text
Tasks:
Create dataset of a subset of Seattle sessions. Only sessions with transcripts created from closed-caption conversion are included (min: ~1 hour, max: ~2 hours)
Write a function to compare a "ground truth" transcript against a "generated" transcript for WER and word selection / replacement (min: ~8 hours, max: ~16 hours) -- see the comparison sketch after the time summary
Generate speech-to-text versions of all sessions in the dataset (active time: ~4 hours) -- see the transcription sketch after the time summary
Dataset should now have: "session_id", "ground_truth_transcript", "speech_to_text_transcript"
Run the analysis function across the dataset (min: ~2 hours, max: ~8 hours) -- see the batch-analysis sketch after the time summary
Dataset should now have: "session_id", "ground_truth_transcript", "wer", "replacement_counts"
Analyze these results: what's the overall WER, and are there common trends in word replacement? (min: ~16 hours, max: ~40 hours)
Create a system for automated benchmarking of these results as part of continuous integration (min: ~16 hours, max: ~24 hours) -- see the CI test sketch after the time summary
Sum Task Time:
min: 1 + 8 + 4 + 2 + 16 + 16 = 47 hours
max: 2 + 16 + 4 + 8 + 40 + 24 = 94 hours
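
A minimal sketch of the comparison function, assuming both transcripts are plain strings: a word-level edit-distance alignment whose backtrace counts substitutions, so WER and replacement pairs fall out of the same pass. Function and variable names here are illustrative, not existing project code.

    from collections import Counter

    def compare_transcripts(ground_truth: str, generated: str):
        ref = ground_truth.lower().split()
        hyp = generated.lower().split()
        # dp[i][j] = edit distance between ref[:i] and hyp[:j]
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                               dp[i][j - 1] + 1,        # insertion
                               dp[i - 1][j - 1] + cost) # match / substitution
        # Backtrace to collect (reference word, generated word) replacements.
        replacements = Counter()
        i, j = len(ref), len(hyp)
        while i > 0 and j > 0:
            if ref[i - 1] == hyp[j - 1]:
                i, j = i - 1, j - 1
            elif dp[i][j] == dp[i - 1][j - 1] + 1:
                replacements[(ref[i - 1], hyp[j - 1])] += 1
                i, j = i - 1, j - 1
            elif dp[i][j] == dp[i - 1][j] + 1:
                i -= 1
            else:
                j -= 1
        wer = dp[len(ref)][len(hyp)] / max(len(ref), 1)
        return wer, replacements

An off-the-shelf package like jiwer could replace the hand-rolled alignment; the backtrace is only needed because we also want the replacement counts.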
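For the speech-to-text generation step, a sketch using the Google Cloud Speech-to-Text v1 Python client. The GCS bucket path, audio encoding, and sample rate are assumptions; session audio would have to be extracted and uploaded ahead of time.

    from google.cloud import speech

    def transcribe_session(gcs_uri: str) -> str:
        client = speech.SpeechClient()
        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code="en-US",
        )
        audio = speech.RecognitionAudio(uri=gcs_uri)
        # Long-running recognition is required for audio over ~1 minute.
        operation = client.long_running_recognize(config=config, audio=audio)
        response = operation.result(timeout=3600)
        return " ".join(r.alternatives[0].transcript for r in response.results)

e.g. transcribe_session("gs://example-bucket/session_123.wav") -- the bucket and filename here are hypothetical.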
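Running the analysis across the dataset is then one call per row; a sketch assuming the dataset lives in a pandas DataFrame with the columns listed above, reusing the compare_transcripts sketch from earlier.

    import pandas as pd

    def analyze_dataset(df: pd.DataFrame) -> pd.DataFrame:
        pairs = df.apply(
            lambda row: compare_transcripts(
                row["ground_truth_transcript"],
                row["speech_to_text_transcript"],
            ),
            axis=1,
        )
        df["wer"] = [wer for wer, _ in pairs]
        df["replacement_counts"] = [counts for _, counts in pairs]
        return df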
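For the CI piece, one lightweight option is a pytest-style regression test that fails the build when mean WER drifts above a threshold. The results path and threshold below are placeholders to be tuned once the analysis step produces real numbers.

    import pandas as pd

    WER_THRESHOLD = 0.25  # placeholder ceiling, not a measured target

    def test_overall_wer_within_threshold():
        # "benchmark_results.csv" is a hypothetical artifact written by the
        # analysis step; adjust to wherever CI stores the run output.
        results = pd.read_csv("benchmark_results.csv")
        mean_wer = results["wer"].mean()
        assert mean_wer <= WER_THRESHOLD, (
            f"Mean WER {mean_wer:.3f} exceeds threshold {WER_THRESHOLD}"
        )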
Web Scrapers
Tasks:
Create web scraper for Boston (min: ~4 hours, max: ~40 hours) -- Legistar
In general: min ~4 hours for any scraper; max ~40 hours for a Legistar scraper that is missing the video link; max ~80 hours for a non-Legistar scraper -- see the Legistar sketch below
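
A sketch of the Legistar path, using the public Legistar Web API (webapi.legistar.com); the "boston" client slug and the field names are assumptions to verify against the live API. Note the events payload generally does not carry the meeting video link, which is exactly what pushes a scraper toward the higher estimate.

    import requests

    def fetch_events(client: str = "boston", top: int = 100) -> list[dict]:
        # The Legistar Web API accepts OData query parameters such as $top.
        url = f"https://webapi.legistar.com/v1/{client}/events"
        resp = requests.get(url, params={"$top": top})
        resp.raise_for_status()
        return [
            {
                "body": ev.get("EventBodyName"),
                "date": ev.get("EventDate"),
                "detail_url": ev.get("EventInSiteURL"),
            }
            for ev in resp.json()
        ]

Recovering the video link typically means scraping each event's InSite detail page on top of the API call, which is where the extra hours go.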
Annotation and Speakerbox
Tasks:
THEN