Tfkalk / MultispeakerTranscription

Script to do multispeaker transcription with immediate transcript deletion

Batch Processing is not Robust #8

Open Tfkalk opened 2 months ago

Tfkalk commented 2 months ago

With batch processing, each file is atomic, and the code is currently architected to ensure files do not become dependent on each other. However, if one file fails to transcribe, the entire job fails, even if subsequent files would have worked. This causes a problem for customers, who then have to remove the already processed audio files from the source directory.

Instead, a failure on any specific file should not abort the run, and subsequent files should still be processed (if there's a bug inherent to the overall script, we should continue to error out). However, we do not want failures to get lost in a wall of output, so the program should return a list of failures and their causes. Some causes, like a transcription failure, will need to be investigated; others, like an already existing file, may be intentional to avoid moving files around.
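A rough sketch of what this could look like, assuming the script is in Python. The names `TranscriptionError`, `Failure`, `process_batch`, `format_failures`, and the `transcribe_file` callable are all hypothetical and not taken from the current code. Only the two expected per-file failure types are caught, so any other exception (for example, a bug in the script itself) still aborts the run.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, List


class TranscriptionError(Exception):
    """Hypothetical exception for a per-file transcription failure."""


@dataclass
class Failure:
    path: Path
    cause: str
    needs_investigation: bool  # False for causes that may be intentional


def process_batch(
    paths: Iterable[Path],
    transcribe_file: Callable[[Path], None],  # hypothetical per-file entry point
) -> List[Failure]:
    """Process every file, collecting per-file failures instead of stopping."""
    failures: List[Failure] = []
    for path in paths:
        try:
            transcribe_file(path)
        except TranscriptionError as exc:
            failures.append(Failure(path, f"transcription failed: {exc}", True))
        except FileExistsError as exc:
            failures.append(Failure(path, f"output already exists: {exc}", False))
        # Any other exception is not caught here, so it propagates
        # and the run errors out as it does today.
    return failures


def format_failures(failures: List[Failure]) -> str:
    """Build one summary block so failures are not lost in earlier output."""
    if not failures:
        return "All files processed successfully."
    lines = [f"{len(failures)} file(s) failed:"]
    for failure in failures:
        tag = "investigate" if failure.needs_investigation else "possibly intentional"
        lines.append(f"  {failure.path} [{tag}]: {failure.cause}")
    return "\n".join(lines)
```

Printing `format_failures(...)` once, after the loop finishes, keeps the summary at the bottom of the output regardless of how much each file logs.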

Currently unsure whether such a failure should turn our exit code of 0 into a non-zero one. Technically there were failures, so the run was not a full success, but the program was able to run to completion.
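One way to defer this decision would be to make the behavior a switch rather than hard-coding it. The sketch below assumes Python; `choose_exit_code` and its `fail_on_errors` parameter are hypothetical names, and the value 1 is just a placeholder for "non-zero". It does not settle which default is right.

```python
def choose_exit_code(num_failures: int, fail_on_errors: bool) -> int:
    """Return the process exit code for a batch run that ran to completion.

    fail_on_errors is a hypothetical switch: when True, any per-file
    failure yields a non-zero code; when False, the run always exits 0.
    """
    if fail_on_errors and num_failures > 0:
        return 1
    return 0
```

A caller could then end the script with `sys.exit(choose_exit_code(len(failures), fail_on_errors))`, which lets automated callers opt into strict behavior while interactive use keeps the current exit code of 0.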