Closed: bwagner closed this issue 4 years ago
Generally, if there are more speakers than expected, it's actually a problem with the rev.ai transcription, and not an actual 3rd speaker (which has only happened a few times in the history of the podcast). What would be best is to have unknown speakers listed as "Unknown" in the text (to be dealt with by the editor, 'cuz you never know for sure what happened), and to write a message to stdout (to inform the user that this situation has occurred).
Hey Darwin
Here's a new PR, https://github.com/darwingrosse/AMT-Transcripts/pull/15, that fixes the issue "deal with wrong number of speakers" (https://github.com/bwagner/AMT-Transcripts/issues/12). It handles the case where the rev.ai JSON contains more speakers than were specified on the command line. Say you specified the speakers Darwin and Barry Moon, but the rev.ai JSON contains two more speakers. The result will be:

Speaker0 = Darwin
Speaker1 = Barry Moon
Speaker2 = UNKNOWN_SPEAKER_01
Speaker3 = UNKNOWN_SPEAKER_02
This will be reported on stdout:
json contains more speakers (4) than were provided via -s (2)
This way, if there really were more speakers in the rev.ai JSON than specified on the command line, a simple search/replace will fix the transcript: once the script assigns an identifier such as "UNKNOWN_SPEAKER_01" to a recognized voice, it always uses that same identifier for that speaker.
Bernhard
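For illustration, the mapping described in the comment above could be sketched roughly as follows. This is not the code from the PR: `createSpeakerResolver`, `resolve`, and `report` are hypothetical names, and the sketch assumes that rev.ai speaker indices are zero-based and that the names passed via -s fill the lowest indices.

```javascript
// Hypothetical sketch, not the actual implementation in transcriptionJsonToHtml.js.
// Maps rev.ai speaker indices to names; indices beyond the names given via -s
// get a stable placeholder instead of causing an error.
function createSpeakerResolver(speakers) {
  const unknownSeen = new Set(); // indices that had no name on the command line

  function resolve(speakerIdx) {
    if (speakerIdx < speakers.length) {
      return speakers[speakerIdx];
    }
    unknownSeen.add(speakerIdx);
    // Derive the placeholder from the index so the same voice always
    // gets the same identifier, e.g. index 2 of 2 known -> "01".
    const n = String(speakerIdx - speakers.length + 1).padStart(2, "0");
    return "UNKNOWN_SPEAKER_" + n;
  }

  // Writes the warning to stdout if any unknown speaker was encountered.
  function report(inFileName) {
    if (unknownSeen.size === 0) {
      return null;
    }
    const total = speakers.length + unknownSeen.size;
    const msg = inFileName + " contains more speakers (" + total +
      ") than were provided via -s (" + speakers.length + ")";
    console.log(msg);
    return msg;
  }

  return { resolve, report };
}

// Usage: two names given, but the JSON refers to speaker indices 0 through 3.
const resolver = createSpeakerResolver(["Darwin", "Barry Moon"]);
const names = [0, 1, 2, 3, 2].map((idx) => resolver.resolve(idx));
console.log(names.join(", "));
resolver.report("transcript.json");
```

Deriving the placeholder from the speaker index, rather than from the order in which unknown voices first appear, keeps the identifiers stable across runs, which is what makes the later search/replace safe.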
Discuss with Darwin: how to deal with wrong number of speakers.
/Volumes/Sharky/projects/darwin_grosse/AMT-Transcripts/App/transcriptionJsonToHtml.js:94
throw this.inFileName + " contains more speakers (>= " + (SPEAKER_IDX + 1) + ") than were provided via -s (" + this.speakers.length + ")";
^

../JSON/transcript-0305.json contains more speakers (>= 3) than were provided via -s (2)