bwagner / AMT-Transcripts

The Transcription project for the Art + Music + Technology podcast

deal with wrong number of speakers #12

Closed bwagner closed 4 years ago

bwagner commented 4 years ago

Discuss with Darwin: how to deal with wrong number of speakers.

  1. Terminate the program, which is what happens now, e.g.

    ./tr-parse.js audio transcript-0305.json -s 'Darwin' 'Ed Guild' -r 'December 10, 2013' -o 25.8

    /Volumes/Sharky/projects/darwin_grosse/AMT-Transcripts/App/transcriptionJsonToHtml.js:94
    throw this.inFileName + " contains more speakers (>= " + (SPEAKER_IDX + 1) + ") than were provided via -s (" + this.speakers.length + ")";
    ^
    ../JSON/transcript-0305.json contains more speakers (>= 3) than were provided via -s (2)

  2. Generate a generic name "Speaker 3", etc., but write it out on stdout.

darwingrosse commented 4 years ago

Generally, if there are more than one speaker, it's actually a problem with the rev.ai transcription, and not an actual 3rd speaker (which has only happened a few times in the history of the podcast). What would be best is to have unknown speakers listed as "Unknown" in the text (to be dealt with by the editor - 'cuz you never know for sure what happened), and have a stdout message written (to inform the user that this situation has occurred).

bwagner commented 4 years ago

Hey Darwin

Here's a new PR, https://github.com/darwingrosse/AMT-Transcripts/pull/15, that fixes the issue "deal with wrong number of speakers" (https://github.com/bwagner/AMT-Transcripts/issues/12). It implements the case where more speakers are found in the rev.ai JSON than are specified on the command line. Let's assume you specified the speakers Darwin and Barry Moon, but the rev.ai JSON contains two more speakers. The result will be:

    Speaker0 = Darwin
    Speaker1 = Barry Moon
    Speaker2 = UNKNOWN_SPEAKER_01
    Speaker3 = UNKNOWN_SPEAKER_02

This will be reported on stdout:

json contains more speakers (4) than were provided via -s (2)

This way, if there actually were more speakers in the rev.ai JSON than specified on the command line, a simple search/replace will fix the situation: once the script has assigned a recognized voice a particular name, e.g. "UNKNOWN_SPEAKER_01", it will always use the same identifier for that speaker.
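To illustrate the mapping described above, here is a minimal sketch. It is not the code from the PR: `makeSpeakerResolver`, `nameFor` and `warning` are hypothetical names, the list of speaker indices is made-up sample data, and the assumption is that rev.ai speaker indices start at 0 and line up with the order of the names given via -s.

```javascript
// Hypothetical helper (not taken from the PR): resolves a speaker index from
// the transcript JSON to a display name. Indices beyond the names supplied
// via -s get a placeholder derived from the index, so the same voice always
// receives the same identifier.
function makeSpeakerResolver(speakers) {
  let highestIdx = -1; // highest speaker index seen so far

  function nameFor(speakerIdx) {
    highestIdx = Math.max(highestIdx, speakerIdx);
    if (speakerIdx < speakers.length) {
      return speakers[speakerIdx];
    }
    // First unlisted speaker becomes 01, the next one 02, and so on.
    const n = String(speakerIdx - speakers.length + 1).padStart(2, "0");
    return `UNKNOWN_SPEAKER_${n}`;
  }

  // Returns the stdout notice, or null if no extra speakers were seen.
  function warning() {
    const found = highestIdx + 1;
    if (found <= speakers.length) return null;
    return `json contains more speakers (${found}) than were provided via -s (${speakers.length})`;
  }

  return { nameFor, warning };
}

const resolver = makeSpeakerResolver(["Darwin", "Barry Moon"]);
// Made-up sequence of speaker indices, in the order the segments occur.
const names = [0, 1, 2, 0, 3, 2].map((idx) => resolver.nameFor(idx));
console.log(names.join(", "));

const message = resolver.warning();
if (message !== null) console.log(message);
```

Because the placeholder depends only on the speaker index, replacing every "UNKNOWN_SPEAKER_01" in the generated text with the real name fixes all of that speaker's segments at once.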

Bernhard

On Mon, Dec 23, 2019 at 3:05 PM Darwin Grosse notifications@github.com wrote:

Generally, if there are more than one speaker, it's actually a problem with the rev.ai transcription, and not an actual 3rd speaker (which has only happened a few times in the history of the podcast). What would be best is to have unknown speakers listed as "Unknown" in the text (to be dealt with by the editor - 'cuz you never know for sure what happened), and have a stdout message written (to inform the user that this situation has occurred).


bwagner commented 4 years ago

https://github.com/darwingrosse/AMT-Transcripts/pull/15