civictechindex / BOP

A repo for tracking work regarding the Brigade Organizer's Playbook

Build pipeline to get otter.ai transcripts and google form spreadsheet into common spreadsheet #49

Closed. jkropko closed this issue 3 years ago

jkropko commented 4 years ago

Overview

The Google form generates spreadsheet output in which each column corresponds to a field in the form. We also want to capture the spoken conversation that occurs around each field. The challenge is to parse the otter.ai transcripts, separate out the text associated with each field in the Google form, and do this in a way that can be automated for all BOP interviews.
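One possible way to approach the separation step is to treat each form question as a marker: when a transcript line closely resembles a question's wording, the lines that follow are assigned to that field until the next question is detected. The sketch below only illustrates that idea. The field names, the sample lines, and the 0.75 similarity cutoff are all hypothetical, and it uses Python's standard `difflib` rather than whatever the eventual BOP script uses.

```python
from difflib import SequenceMatcher

def best_field(line, questions, cutoff=0.75):
    """Return the field whose question text most resembles `line`, or None."""
    best, best_score = None, cutoff
    for field, question in questions.items():
        score = SequenceMatcher(None, line.lower(), question.lower()).ratio()
        if score >= best_score:
            best, best_score = field, score
    return best

def split_by_field(lines, questions, cutoff=0.75):
    """Group transcript lines under the most recently detected form question."""
    sections = {field: [] for field in questions}
    current = None
    for line in lines:
        field = best_field(line, questions, cutoff)
        if field is not None:
            current = field  # a question was asked: switch to its section
        elif current is not None:
            sections[current].append(line)
    return {field: " ".join(text) for field, text in sections.items()}

# Hypothetical form fields and transcript lines, for illustration only.
questions = {
    "brigade_name": "What is the name of your brigade?",
    "meeting_frequency": "How often does your brigade meet?",
}
transcript = [
    "So what is the name of your brigade",
    "We are Code for Example City.",
    "And how often does your brigade meet",
    "Usually every other Tuesday.",
    "Sometimes we skip holidays.",
]
sections = split_by_field(transcript, questions)
```

Each key of `sections` could then become one column next to the matching form column. Interviewers who paraphrase a question heavily would fall below the cutoff, so some manual review would still be needed.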

Dependencies

Otter.ai, Google Forms, Google Sheets, Python/pandas

Action Items

Resources/Instructions

jkropko commented 4 years ago

Not sure I totally got the format of a new issue correct. Please feel free to edit!

MasSamH commented 3 years ago

Related to #48

ExperimentsInHonesty commented 3 years ago

Update from @jkropko in Slack (Jon Kropko, 11:23 AM): "Hi Bonnie, hi Thad, what a week! I won’t be able to make it to the meeting today, but I’ll send a PR with code for wrangling the transcripts by tomorrow."

ExperimentsInHonesty commented 3 years ago

@jkropko can you come to the meeting on Saturday?

MasSamH commented 3 years ago

@jkropko is making cracking progress with 'fuzzy name matching' of transcriptions. Working on:

Jon sharing update next week. #49 @ExperimentsInHonesty @thadk @raquelnish @MarianneAnthonette
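The thread does not spell out how the fuzzy name matching works. As a rough sketch of one common approach, assuming the goal is to link a name as it appears in a transcript to the closest name in the form spreadsheet, Python's standard `difflib.get_close_matches` can be used. The names and the 0.7 cutoff below are hypothetical.

```python
from difflib import get_close_matches

def match_name(name, candidates, cutoff=0.7):
    """Return the candidate closest to `name`, or None if nothing is close enough."""
    # Compare on a normalised form, but return the original spelling.
    lookup = {c.lower().strip(): c for c in candidates}
    hits = get_close_matches(name.lower().strip(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

# Hypothetical names: the first list as entered in the form spreadsheet,
# the second as they might appear in transcript titles.
form_names = ["Jordan Rivera", "Samira Okafor", "Lee Chen"]
transcript_names = ["Jordon Rivera", "samira okafor ", "Alex Unknown"]

matches = {name: match_name(name, form_names) for name in transcript_names}
```

Names with no sufficiently close candidate come back as `None`, which makes them easy to flag for manual matching.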

MasSamH commented 3 years ago

@MarianneAnthonette

Marianne estimates that Jonathan's script will cut transcription time by about 50%. The intention is therefore to quantify the time savings for all conducted interviews at the end of the project.

FYI @jkropko @thadk

MarianneAnthonette commented 3 years ago

@ExperimentsInHonesty

Marianne estimates that Jonathan's script will cut transcription time by about 50%. The intention is therefore to quantify the time savings for all conducted interviews at the end of the project.

Wouldn't this mean that there should be a field for both length of interview and how long it took the transcription to complete? Can't wrap my head around how just the length of interview will help us quantify time-savings.

MasSamH commented 3 years ago

We will total up all the interview lengths and then divide that by 2 (that is, multiply by 1/2), which gives our estimate of the time saved by using Jonathan and Thad’s script.
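As a worked example of that rule of thumb, the short sketch below sums a set of interview lengths and takes half of the total. The three interview lengths are made up, and the calculation assumes, as the comment above does, that total interview length is the baseline and that the saving is 50%.

```python
from datetime import timedelta

def estimated_time_saved(interview_lengths, saving_fraction=0.5):
    """Total the interview lengths and take the assumed fraction as time saved."""
    total = sum(interview_lengths, timedelta())
    return total * saving_fraction

# Hypothetical interview lengths, for illustration only.
lengths = [timedelta(minutes=45), timedelta(minutes=60), timedelta(minutes=75)]
saved = estimated_time_saved(lengths)
print(saved)  # 1:30:00 (half of the 3:00:00 total)
```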

MasSamH commented 3 years ago

@thadk Will add a field for interest in project updates

MasSamH commented 3 years ago

Pair programming between Jonathan and Thad went well. All OtterAI transcripts follow the naming convention of the audio files. Jonathan is continuing to build out the pipeline for all the interviews and is sweeping for bugs. Thad is adding keywords from the interviews for OtterAI to recognise (create a new issue for this). Target date for parsing and cleaning all interviews into the spreadsheet: Sat 5th.
Manual transcriptions are continuing in the meantime.

MasSamH commented 3 years ago

Marianne: Thad has given OtterAI access to Raquel, who is transcribing 5 interviews. Raquel and Marianne are to prioritize the multi-person interviews. Deep dive into the results of the transcriptions to get a feel for the analysis.

MarianneAnthonette commented 3 years ago

Expected delivery of draft version: 12/05

MasSamH commented 3 years ago

@jkropko

Update from Jonathan:

Requirements for Jonathan:

Marianne: update on some transcriptions that were missing, short, or that should or shouldn't be included, e.g. Michelle's. Michelle's is to be included in transcription but not in the final analysis. Suggestion: pick some non-brigade leads, e.g. project leads, who would give another POV (a side-car to the main research and analysis).

MasSamH commented 3 years ago

@thadk to provide an update Thursday on cleaning up the parsed transcription tab.

MasSamH commented 3 years ago

@jkropko Thad has finished cleaning the transcription data, which is now in the right destination (Transcription sheets) and awaiting review from Jon.

celeste-hub commented 3 years ago

Has cleaned up some of the transcription data. Fuzzy name matching now applied to 34 interviews. Some manual interventions where the transcription data is inaccurate, e.g. 'cricket' for 'brigade'. Ready for new interviews.

@jonathan reach out to OtterAI to see if there is a script we can install to detect common mistakes.

@Marianne to huddle with the UXRs (for the thank you) but also show them OtterAI performance to see if they can articulate better.
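Until an OtterAI-side fix is found, recurring mis-transcriptions like the one above could also be patched after export. The following is a minimal sketch of that idea, not an OtterAI feature: a small correction table applied with a whole-word regular expression. Only the 'cricket' to 'brigade' pair comes from this thread; the function name and sample sentence are hypothetical.

```python
import re

# Correction table; only the 'cricket' -> 'brigade' pair comes from the thread.
CORRECTIONS = {"cricket": "brigade"}

def fix_common_mistakes(text, corrections=CORRECTIONS):
    """Replace known mis-transcribed words, keeping a leading capital letter."""
    def swap(match):
        word = match.group(0)
        fixed = corrections[word.lower()]
        return fixed.capitalize() if word[0].isupper() else fixed

    # Match any listed word as a whole word, ignoring case.
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, corrections)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(swap, text)

cleaned = fix_common_mistakes("Our cricket meets weekly. Cricket leads rotate.")
print(cleaned)  # Our brigade meets weekly. Brigade leads rotate.
```

A blanket replacement like this would also rewrite any genuine mention of cricket, so the table should stay short and the output should still be spot-checked.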

MasSamH commented 3 years ago

@jkropko shared update:

@jkropko @thadk

@jkropko to reach out to OtterAI to see if there is a script we can install to detect common mistakes. Serena has created a new issue to capture the funny quotes for transcription training, to help UXRs improve their enunciation.

MasSamH commented 3 years ago

@thadk @jkropko It would be good to get an update on the items below. Thanks.

Raquel will move them over to the Prioritized blog post meeting.

Baleja commented 3 years ago

All the interviews are currently up to date. The scripts are almost integrated. The transcription verification process is defined. The pipeline for moving text from the otter.ai export to the Google transcript sheet is complete.
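For readers unfamiliar with this kind of pipeline, here is a rough sketch of the parsing half, not the project's actual script. It assumes a plain-text export in which each speaker turn starts with a header line of the form `<speaker>  <timestamp>` followed by the spoken text; that layout, the regular expression, and the sample text are all assumptions to adjust against a real export.

```python
import csv
import io
import re

# Assumed header shape: "<speaker>  <m:ss or h:mm:ss>"; adjust to the real export.
# A spoken line that happens to end in a time (e.g. "We met at 5:30") would be
# misread as a header, so real data needs a stricter check.
HEADER = re.compile(r"^(?P<speaker>.+?)\s+(?P<time>\d{1,2}:\d{2}(?::\d{2})?)\s*$")

def parse_export(text):
    """Turn the assumed export layout into one row per speaker turn."""
    rows, current = [], None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {"speaker": header["speaker"], "time": header["time"], "text": ""}
            rows.append(current)
        elif current is not None and line.strip():
            current["text"] = (current["text"] + " " + line.strip()).strip()
    return rows

def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["speaker", "time", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up sample in the assumed layout.
sample = """Interviewer  0:03
How did your brigade get started?

Organizer  0:10
A few of us met at a hackathon.
We kept meeting afterwards.
"""
rows = parse_export(sample)
csv_text = to_csv(rows)
```

The resulting CSV text is one simple hand-off format, since Google Sheets can import CSV files.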

Remaining action items

thadk commented 3 years ago

Moved to #122 for any meta information about these scripts