Closed by jkropko 3 years ago
Not sure I totally got the format of a new issue correct. Please feel free to edit!
Update from @jkropko in Slack (Jon Kropko, 11:23 AM): "Hi Bonnie, hi Thad, what a week! I won’t be able to make it to the meeting today but I’ll send a PR with code for wrangling the transcripts by tomorrow."
@jkropko can you come to the meeting on Saturday?
@jkropko is making cracking progress with 'fuzzy name matching' of transcriptions. Working on:
Jon sharing update next week. #49 @ExperimentsInHonesty @thadk @raquelnish @MarianneAnthonette
@MarianneAnthonette
Marianne estimates that Jonathan's script will cut transcription time by about 50%. The intention is therefore to quantify the time savings for all conducted interviews at the end of the project.
FYI @jkropko @thadk
@ExperimentsInHonesty
Wouldn't this mean that there should be a field for both length of interview and how long it took the transcription to complete? Can't wrap my head around how just the length of interview will help us quantify time-savings.
We will total up all the interview lengths and then divide that by 2 (i.e. multiply by 1/2); that gives our estimate of the time saved by using Jonathan and Thad’s script.
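A minimal sketch of that arithmetic, for illustration only. The interview lengths below are made-up placeholders, and the 50% figure is Marianne's estimate from above rather than a measured value.

```python
def estimated_time_saved(interview_minutes, saving_fraction=0.5):
    """Estimate minutes saved: total interview length times the assumed saving fraction."""
    total = sum(interview_minutes)
    return total * saving_fraction


# Hypothetical interview lengths in minutes (not real project data).
lengths = [45, 60, 30]
saved = estimated_time_saved(lengths)
print(f"Total interview time: {sum(lengths)} min, estimated saving: {saved} min")
```

If actual transcription times are recorded later, `saving_fraction` could be replaced with a measured ratio instead of the assumed 0.5.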
@thadk Will add a field for interest in project updates
Pair programming went well between Jonathan and Thad.
All OtterAI files follow the naming convention of the audio files.
Jonathan continuing to build out pipeline for all the interviews.
Jonathan sweeping for bugs.
Thad adding keywords from interviews into OtterAI for it to recognise (create new issue)
Target date for parsing and cleaning all interviews into the spreadsheet: Sat 5th.
Manual transcriptions continuing in the meantime.
Marianne - Thad has assigned OtterAI access to Raquel, who is transcribing 5 interviews. Raquel and Marianne to prioritize the multi-person interviews. Deep dive into results of transcriptions to get a feel for the analysis.
Expected delivery of draft version: 12/05
@jkropko
Update from Jonathan:
Requirements for Jonathan:
Marianne gave an update on transcriptions that were missing, short, or whose inclusion was in question, e.g. Michelle's. Michelle's is to be included in transcription but not in the final analysis. Suggestion to pick some non-brigade leads, e.g. project leads, who would give another POV (a side-car to the main research and analysis).
@thadk to provide an update Thursday on cleaning up the parsed transcription tab.
@jkropko Thad finished cleaning the transcription data, which is now in the right destination (Transcription sheets) awaiting review from Jon.
has cleaned up some of the transcription data
Fuzzy name matching now applied to 34 interviews
Some manual interventions where transcription data is inaccurate, e.g. 'cricket' for 'brigade'
Ready for new interviews
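For anyone new to the thread, here is a rough sketch of what fuzzy name matching can look like. It is not Jonathan's actual script: it uses Python's standard-library `difflib`, and the roster names and the `match_name` helper are hypothetical.

```python
import difflib


def match_name(raw_name, known_names, cutoff=0.6):
    """Return the closest known name to raw_name, or None if nothing is close enough."""
    # Compare case-insensitively, but return the name as spelled in the roster.
    lowered = {name.lower(): name for name in known_names}
    hits = difflib.get_close_matches(raw_name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


# Hypothetical roster of interviewee names (not real project data).
roster = ["Alex Rivera", "Sam Chen"]
print(match_name("Alex Riviera", roster))
print(match_name("Zzz", roster))
```

Names that fall below the `cutoff` come back as `None`, which leaves them for the kind of manual intervention mentioned above.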
@jonathan to reach out to OtterAI to see if there is a script we can install to detect common mistakes.
@Marianne to huddle with the UXRs (for the thank you) but also show them OtterAI performance to see if they can articulate better.
@jkropko shared update:
@jkropko @thadk
@jkropko to reach out to OtterAI to see if there is a script we can install to detect common mistakes. Serena has created a new issue to capture the funny quotes for transcription training, to help UXRs improve their enunciation.
@thadk @jkropko It would be good to get an update on the items below. Thanks.
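Until we hear back from OtterAI, a small post-processing step on our side could catch known mistakes. This is only a sketch: the correction table is hypothetical apart from the 'cricket' for 'brigade' example noted earlier in this thread, and `fix_common_mistakes` is an illustrative helper, not an OtterAI feature.

```python
import re

# Hypothetical correction table; 'cricket' -> 'brigade' is the one example from this thread.
CORRECTIONS = {"cricket": "brigade"}


def fix_common_mistakes(text, corrections=CORRECTIONS):
    """Replace known mistranscribed words (whole words only), keeping initial capitalisation."""
    if not corrections:
        return text

    def swap(match):
        word = match.group(0)
        fixed = corrections[word.lower()]
        return fixed.capitalize() if word[0].isupper() else fixed

    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, corrections)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(swap, text)


print(fix_common_mistakes("Our cricket meets weekly. Cricket leads help."))
```

A blanket replacement like this would also change any genuine use of the word, so flagged lines would probably still need a quick manual check.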
Raquel will move them over to the Prioritized blog post meeting.
All the interviews are currently up to date. Scripts are almost integrated. Transcription verification process is defined. Pipeline for moving text from the otter.ai export to the Google transcript sheet is complete.
remaining action items
Moved to #122 for any meta information about these scripts
Overview
The Google form generates spreadsheet output in which each column corresponds to a field in the form. We also want to capture the spoken conversation that occurs around each field. The challenge is to parse the otter.ai transcripts, separate out the text associated with each field in the Google form, and do this in a way that can be automated for all BOP interviews.
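One possible way to approach the separation step is sketched below. It assumes the transcript has already been read into a list of utterance strings and that each form field has a known question prompt. The field names, the prompts, the `split_by_field` helper, and the 0.75 threshold are all hypothetical, and the matching uses the standard-library `difflib` rather than whatever the project scripts actually do.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def split_by_field(utterances, field_prompts, threshold=0.75):
    """Group utterances under the form field whose prompt was most recently asked."""
    segments = {field: [] for field in field_prompts}
    current = None
    for text in utterances:
        # Find the prompt that this utterance most resembles.
        best_field, best_score = None, 0.0
        for field, prompt in field_prompts.items():
            score = similarity(text, prompt)
            if score > best_score:
                best_field, best_score = field, score
        # A close match means the interviewer has moved on to a new field.
        if best_score >= threshold:
            current = best_field
        # Text spoken before the first recognised prompt is skipped.
        if current is not None:
            segments[current].append(text)
    return segments


# Hypothetical prompts and transcript lines (not real interview data).
prompts = {
    "brigade_name": "What is the name of your brigade?",
    "team_size": "How many active members does your brigade have?",
}
transcript = [
    "Thanks for joining us today.",
    "What's the name of your brigade?",
    "We are Code for Example.",
    "How many active members does your brigade have?",
    "Around twenty people.",
]
for field, lines in split_by_field(transcript, prompts).items():
    print(field, "->", lines)
```

Each resulting list could then be joined and written to the column for that field. Interviews where the interviewer paraphrases a question heavily would likely need a lower threshold or manual review.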
Dependencies
Otter.ai, Google Forms, Google Sheets, Python/pandas
Action Items
Resources/Instructions