jkropko commented 4 years ago

Overview

The Google Form generates spreadsheet output in which each column corresponds to a field in the form. We also want to capture the spoken conversation that occurs around each field. The challenge is parsing the Otter.ai transcripts, separating the text associated with each field in the Google Form, and doing this in a way that can be automated for all BOP interviews.
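Here is a minimal Python sketch of that separation step. It assumes the transcript has already been split into a list of utterance strings, and that each form field can be recognized by a cue phrase the UXR says when asking the question. The `FIELD_CUES` phrases are hypothetical placeholders, not the project's actual interview wording.

```python
import re

# Hypothetical cue phrases; real ones would come from the UXR interview guide.
FIELD_CUES = {
    "role": "what is your role",
    "tools": "which tools do you use",
    "challenges": "what challenges",
}


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so cue matching tolerates transcription noise."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower())


def section_transcript(utterances: list[str], cues: dict[str, str]) -> dict[str, list[str]]:
    """Assign each utterance to the most recent form field whose cue phrase was heard.

    Utterances before the first recognized question (small talk, consent) are dropped.
    """
    sections: dict[str, list[str]] = {field: [] for field in cues}
    current = None
    for text in utterances:
        cleaned = normalize(text)
        for field, cue in cues.items():
            if cue in cleaned:
                current = field  # a new question starts a new section
                break
        if current is not None:
            sections[current].append(text)
    return sections
```

Take the utterances "Thanks for joining.", "So, what is your role?" and "I lead a brigade.". The first is dropped. The other two both land under `role`. Plain substring matching is crude, so a question the UXR paraphrases would be missed.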

Dependencies

Otter.ai, Google Forms, Google Sheets, Python/pandas

Action Items

Pull transcripts from Otter.ai into a Python environment
Download the most recent Google Form spreadsheet
Section the transcript by looking for the common language UXRs use to start a question
Convert the sectioned transcript to a data frame
Merge the data frame with another data frame generated by loading the Google Form
Format the merged data frame and output a CSV
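The last three action items could look roughly like this in pandas. This is a sketch under assumptions. It supposes that the sectioned transcripts and the Google Form export (loaded with `pd.read_csv`) share a hypothetical `interview_id` column. It also supposes that transcript columns are named after form columns. Neither is confirmed by this issue.

```python
import pandas as pd


def sections_to_frame(interview_id: str, sections: dict[str, list[str]]) -> pd.DataFrame:
    """Turn one interview's sectioned transcript into a single wide row.

    Each form field becomes a '<field>_transcript' column holding the joined text.
    """
    row = {"interview_id": interview_id}
    for field, utterances in sections.items():
        row[f"{field}_transcript"] = " ".join(utterances)
    return pd.DataFrame([row])


def merge_with_form(form: pd.DataFrame, transcripts: pd.DataFrame) -> pd.DataFrame:
    """Left-join transcript text onto the form responses, keeping every form row.

    validate="one_to_one" raises if an interview appears twice on either side.
    """
    merged = form.merge(transcripts, on="interview_id", how="left", validate="one_to_one")
    # Place each form column next to its transcript column for easier review.
    ordered = ["interview_id"]
    for col in form.columns:
        if col == "interview_id":
            continue
        ordered.append(col)
        if f"{col}_transcript" in merged.columns:
            ordered.append(f"{col}_transcript")
    # Any transcript columns without a matching form column go at the end.
    ordered += [col for col in merged.columns if col not in ordered]
    return merged[ordered]
```

Writing the output is then `merge_with_form(form, transcripts).to_csv("bop_interviews.csv", index=False)`, where the filename is a placeholder. A left join means form responses without a transcript yet still appear, with empty transcript cells.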

Resources/Instructions

jkropko commented 4 years ago

Not sure I totally got the format of a new issue correct. Please feel free to edit!

MasSamH commented 3 years ago

Related to #48

ExperimentsInHonesty commented 3 years ago

Update from @jkropko in Slack (Jon Kropko, 11:23 AM): "Hi Bonnie, hi Thad, what a week! I won't be able to make it to the meeting today, but I'll send a PR with code for wrangling the transcripts by tomorrow."

ExperimentsInHonesty commented 3 years ago

@jkropko can you come to the meeting on Saturday?

MasSamH commented 3 years ago

@jkropko is making cracking progress with 'fuzzy name matching' of the transcriptions. Working on:

[x] Using while-loop logic to set the divisions in the transcription
[x] Filtering text only for the interviewee
[x] Formatting (HTML tags / timestamps)
[x] Setting up a continual pipeline for when transcripts become available
[x] Pair-programming taking place next week with Thad
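To illustrate the fuzzy name matching, interviewee filtering, and formatting items in this list, here is a sketch that uses only the standard library's `difflib` and `re`. It assumes each utterance already carries the speaker label Otter.ai assigned, and that a roster of expected participant names is available. The 0.6 cutoff is the `difflib` default and the sample names are placeholders. This is not necessarily how the project's script does it.

```python
import difflib
import re
from typing import Optional

TAG_RE = re.compile(r"<[^>]+>")  # leftover HTML tags such as <b> or </span>
# Inline timestamps like 4:05 or 1:02:33; note this also strips times spoken in the text.
TIMESTAMP_RE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")


def match_speaker(label: str, roster: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the roster name closest to a speaker label, or None if nothing is close enough."""
    by_lower = {name.lower(): name for name in roster}
    hits = difflib.get_close_matches(label.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


def clean(text: str) -> str:
    """Strip HTML tags and inline timestamps, then collapse runs of whitespace."""
    text = TIMESTAMP_RE.sub("", TAG_RE.sub("", text))
    return " ".join(text.split())


def interviewee_text(
    utterances: list[tuple[str, str]], roster: list[str], interviewee: str
) -> list[str]:
    """Keep the cleaned text of utterances whose speaker label fuzzily matches the interviewee."""
    return [
        clean(text)
        for label, text in utterances
        if match_speaker(label, roster) == interviewee
    ]
```

Here a misspelled label such as "Alex Rivra" still resolves to "Alex Rivera". A generic label like "Speaker 1" matches no one and is dropped.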

Jon will share an update next week. #49 @ExperimentsInHonesty @thadk @raquelnish @MarianneAnthonette

MasSamH commented 3 years ago

@MarianneAnthonette

[x] Add another field to the spreadsheet to cover length of the interview
[x] Add to all previous templates
[x] Add length of interview

Marianne estimates that Jonathan's script will save 50% of transcription time. The intention is therefore to quantify the time savings for all conducted interviews at the end of the project.

FYI @jkropko @thadk

MarianneAnthonette commented 3 years ago

@ExperimentsInHonesty

Marianne estimates that Jonathan's script will save 50% of transcription time. The intention is therefore to quantify the time savings for all conducted interviews at the end of the project.

Wouldn't this mean that there should be a field for both the length of the interview and how long the transcription took to complete? I can't wrap my head around how the length of the interview alone will help us quantify time savings.

MasSamH commented 3 years ago

We will total up all the interview lengths and then multiply by 1/2 (i.e., divide by 2); that gives the time saved by using Jonathan and Thad's script.
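The estimate in this reply is a single multiplication. This sketch assumes interview lengths are recorded in minutes in the new length field. The `manual_factor` parameter is an added assumption, not something the thread defines. It converts interview length into manual transcription time, which is the gap Marianne points out. The example lengths are placeholders, not project data.

```python
def estimated_minutes_saved(
    interview_minutes: list[float],
    saving_rate: float = 0.5,
    manual_factor: float = 1.0,
) -> float:
    """Estimate transcription minutes saved by the script across all interviews.

    saving_rate: share of manual transcription time the script removes (0.5 = 50%).
    manual_factor: assumed minutes of manual transcription per minute of interview;
    1.0 treats interview length as a stand-in for transcription time.
    """
    if not 0.0 <= saving_rate <= 1.0:
        raise ValueError("saving_rate must be between 0 and 1")
    manual_total = sum(interview_minutes) * manual_factor
    return manual_total * saving_rate


# Placeholder lengths: 60 + 45 + 75 = 180 minutes, halved to 90.0.
print(estimated_minutes_saved([60, 45, 75]))
```

Suppose the actual manual transcription time were recorded per interview, as Marianne suggests. Those minutes could then be passed in directly with `manual_factor=1.0`, which removes the proxy.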

MasSamH commented 3 years ago

@thadk Will add a field for interest in project updates

MasSamH commented 3 years ago

Pair programming between Jonathan and Thad went well. All Otter.ai transcripts follow the naming convention of the audio files. Jonathan is continuing to build out the pipeline for all the interviews and is sweeping for bugs. Thad is adding keywords from the interviews to Otter.ai so that it recognises them (create a new issue). Target date for parsing and cleaning all interviews into the spreadsheet: Saturday the 5th.
Manual transcription continues in the meantime.

MasSamH commented 3 years ago

Marianne: Thad has given Raquel access to Otter.ai; she is transcribing 5 interviews. Raquel and Marianne will prioritize the multi-person interviews, then deep-dive into the transcription results to get a feel for the analysis.

MarianneAnthonette commented 3 years ago

Expected delivery of draft version: 12/05

MasSamH commented 3 years ago

@jkropko

Update from Jonathan:

Refocus on the individual interviews
Needs access to metadata that lists the interviewee and interviewer (speakers)
Create a JSON object for all interviews
Add two cells to the top-level tracking sheet and make a JSON object of all the transcription templates
Names need to be anonymized in Otter.ai; use the script to anonymize them with find-and-replace
Next time, add a UXR number so each interview has a unique reference instead of a name
Aiming for 06.12 to get the transcripts run
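Here is a sketch of the metadata and anonymization items. It assumes a hypothetical alias table that maps each real name to a code, such as the UXR numbers suggested above. It also assumes a per-interview record with interviewer, interviewee, and transcript text. The field names are illustrative and not the project's actual schema.

```python
import json
import re


def anonymize(text: str, aliases: dict[str, str]) -> str:
    """Find and replace each real name with its alias, matching whole words case-insensitively."""
    # Longer names first, so "Alex Rivera" is replaced before a bare "Alex" could split it.
    for name in sorted(aliases, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        # A function replacement avoids backslash escapes being interpreted in the alias.
        text = re.sub(pattern, lambda _match, alias=aliases[name]: alias, text, flags=re.IGNORECASE)
    return text


def build_interviews_json(interviews: list[dict], aliases: dict[str, str]) -> str:
    """Return one JSON document describing every interview, with names anonymized."""
    records = []
    for item in interviews:
        records.append(
            {
                "interview_id": item["interview_id"],
                "interviewer": aliases.get(item["interviewer"], item["interviewer"]),
                "interviewee": aliases.get(item["interviewee"], item["interviewee"]),
                "text": anonymize(item["text"], aliases),
            }
        )
    return json.dumps(records, indent=2)
```

Matching whole words keeps a short name like "Al" from being replaced inside unrelated words. Nicknames and misspellings still need their own alias entries.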

Requirements for Jonathan:

Provide Jonathan with a list of all interviewees and all interviewers
Updated text: Thad will copy it from Otter.ai into the document Jonathan is working on

Marianne gave an update on transcriptions that were missing or short, and on ones that should or shouldn't be included, e.g. Michelle's. Michelle's will be included in transcription but not in the final analysis. Suggestion to pick some non-brigade leads, e.g. project leads, who would give another POV (a side-car to the main research and analysis).

MasSamH commented 3 years ago

@thadk to provide an update Thursday on cleaning up the parsed transcription tab.

MasSamH commented 3 years ago

@jkropko Thad has finished cleaning the transcription data; it is now in the right destination (the Transcription sheets), awaiting review from Jon.

celeste-hub commented 3 years ago

Cleaned up some of the transcription data
Fuzzy name matching now applied to 34 interviews
Some manual interventions where the transcription data is inaccurate, e.g. 'cricket' for 'brigade'
Ready for new interviews
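Recurring mis-hearings like this one can be kept in a correction table that the script applies on every run, instead of being fixed by hand each time. This sketch assumes whole-word substitutions are safe. The only entry, 'cricket' for 'brigade', comes from this thread; others would be collected during review. A blanket rule also rewrites legitimate uses of a word, so each substitution is logged for a person to check.

```python
import re

# Known mis-transcriptions (heard -> intended). The only entry comes from the thread.
CORRECTIONS = {
    "cricket": "brigade",
}


def apply_corrections(
    text: str, corrections: dict[str, str]
) -> tuple[str, list[tuple[str, str]]]:
    """Apply whole-word corrections; return the corrected text and a log of each change."""
    log: list[tuple[str, str]] = []
    for wrong, right in corrections.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)

        def fix(match: re.Match, right: str = right) -> str:
            found = match.group(0)
            # Keep a leading capital, e.g. "Cricket" at the start of a sentence.
            new = right.capitalize() if found[:1].isupper() else right
            log.append((found, new))
            return new

        text = pattern.sub(fix, text)
    return text, log
```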

@jonathan reach out to OtterAI to see if there is a script we can install to detect common mistakes.

@Marianne to huddle with the UXRs, both to thank them and to show them how Otter.ai performs, to see if they can articulate more clearly.

MasSamH commented 3 years ago

@jkropko shared update:

Script handles 80% of the interview transcription work
20% is manual (where the code can't identify the text within the interview; these exceptions are handled with hard-coded corrections)
96-minute time saving out of 120 minutes (an 80% time saving over the fully manual process)

@jkropko @thadk

[x] Intention to co-locate the script on GitHub
[ ] Need a process that flags where manual intervention is needed to correct the transcription data (this will save UXRs from reading through the whole transcriptions)
[x] Jonathan and Thad to integrate their scripts (using Google Colab)
[x] All interviews transcribed - ready for Jonathan to run the scripts
[x] Marianne will then create 'issues' for UXRs
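For the open item about triggering manual intervention, one option is to have the script report which form fields it could not locate, or located but found suspiciously short. UXRs would then review only those spots. This sketch assumes sections arrive as a mapping from field name to extracted text. The 5-word threshold is an arbitrary placeholder.

```python
def fields_needing_review(
    sections: dict[str, str], expected_fields: list[str], min_words: int = 5
) -> list[tuple[str, str]]:
    """Return (field, reason) pairs where the parser probably failed and a person should check."""
    flagged: list[tuple[str, str]] = []
    for field in expected_fields:
        text = sections.get(field, "")
        if not text.strip():
            flagged.append((field, "missing"))  # nothing was extracted for this field
        elif len(text.split()) < min_words:
            flagged.append((field, "too short"))  # answer may have been cut off or mis-split
    return flagged
```

The flagged list could be written next to each transcript so reviewers go straight to those fields.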

@jkropko reach out to Otter.ai to see if there is a script we can install to detect common mistakes. Serena has created a new issue to capture the funny quotes for transcription training, to help UXRs improve their enunciation.

MasSamH commented 3 years ago

@thadk @jkropko Could we get an update on the items below? Thanks.

[x] Intention to co-locate the script on GitHub
[x] Need a process that flags where manual intervention is needed to correct the transcription data (this will save UXRs from reading through the whole transcriptions)
[x] Jonathan and Thad to integrate their scripts (using Google Colab)
[x] All interviews transcribed - ready for Jonathan to run the scripts

Raquel will move them over to the Prioritized blog post meeting.

Baleja commented 3 years ago

All the interviews are currently up to date. The scripts are almost integrated. The transcription verification process is defined. The pipeline for moving text from the Otter.ai export to the Google transcript sheet is complete.
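Here is a sketch of the first hop of that pipeline: parsing an Otter.ai plain-text export into rows that can later be written to the transcript sheet. It assumes the export puts each speaker label and a timestamp on one line, followed by that speaker's text, with blank lines between blocks. That layout is an assumption and should be checked against a real export before relying on this.

```python
import re
from typing import NamedTuple, Optional

# Assumed header layout: a speaker label, whitespace, then a timestamp such as 0:03 or 1:02:33.
# A text line that happens to end in a time ("we meet at 9:30") would be misread as a header,
# so this pattern should be tightened against real exports.
HEADER_RE = re.compile(r"^(?P<speaker>.+?)\s+(?P<time>\d{1,2}:\d{2}(?::\d{2})?)\s*$")


class Utterance(NamedTuple):
    speaker: str
    timestamp: str
    text: str


def parse_otter_export(raw: str) -> list[Utterance]:
    """Parse an Otter.ai-style plain-text export into (speaker, timestamp, text) rows."""
    rows: list[Utterance] = []
    speaker: Optional[str] = None
    timestamp = ""
    body: list[str] = []
    for line in raw.splitlines():
        header = HEADER_RE.match(line)
        if header:
            if speaker is not None:  # close the previous block
                rows.append(Utterance(speaker, timestamp, " ".join(body)))
            speaker, timestamp, body = header["speaker"], header["time"], []
        elif line.strip() and speaker is not None:
            body.append(line.strip())
    if speaker is not None:  # close the final block
        rows.append(Utterance(speaker, timestamp, " ".join(body)))
    return rows
```

Each `Utterance` maps onto one row of a sheet. `pd.DataFrame(rows)` also picks up the field names as column headers if pandas is used downstream.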

Remaining action items:

[x] Finish integrating the scripts
[x] Instructions added to the end of the template OR beginning a new template for the end of the previous
[x] Troubleshoot P&P via #122

thadk commented 3 years ago

Moved to #122 for any meta information about these scripts

civictechindex / BOP

Build pipeline to get otter.ai transcripts and google form spreadsheet into common spreadsheet #49
