joultram commented 3 years ago

I'm trying to make a Mycroft/Picroft respond in a voice like the classic BBC Dr Who baddie, a Dalek.

I started with the standard British male Mimic diphone voice; it's already pretty robotic, so it's well suited. For those who may be interested, I've altered it so that it does a passable Dalek impression. This involved two main steps:

The first is to break up the response into individually delivered words (as in 'you ... will ... be ... exterminated') rather than running the words together as in human speech. To do this on Mycroft I've hooked in at the point where the response has been turned into text (/mycroft-core/mycroft/audio/speech.py, in 'def handle_speak(event):') and changed the code in the 'else' branch that calls mute_and_speak(utterance, ident, listen).

Before I show any code, I should say that, while I've been coding for many years, I'm a complete newbie to Python (and Mycroft/Picroft), so if I'm treading on toes or infringing anything please let me know or delete this. If you copy any of this you do so at your own risk (always keep copies of the original files so that you can get back to the original code). This is what I changed the 'else' branch to:

    else:
        # insert pauses ('. ') between words for that dalek sound
        utterance = utterance.replace(" ", ". ")
        utterance = utterance.replace(",", ". . ")
        utterance = utterance + ". "
        mute_and_speak(utterance, ident, listen)
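Pulling the text change into a small function makes it easy to try out in a plain Python shell before touching Mycroft. This is a hypothetical dalek_pauses helper, not part of Mycroft; for normal single-spaced text it gives the same result as the three replace lines, and using split() means runs of spaces (or leading/trailing spaces) don't produce stray pauses.

```python
def dalek_pauses(utterance: str) -> str:
    """Hypothetical helper: pause after every word, longer pause at commas.

    Each gap between words becomes '. ' so the TTS engine stops after
    every word, and each comma becomes an extra '. . ' pause.
    """
    # split() collapses any run of whitespace, so "a  b" does not gain a
    # stray '. ' the way str.replace(" ", ". ") would
    spaced = ". ".join(utterance.split())
    # same order as the original edit: spaces first, then commas
    spaced = spaced.replace(",", ". . ")
    # trailing pause after the last word
    return spaced + ". "


print(repr(dalek_pauses("you will be exterminated")))  # -> 'you. will. be. exterminated. '
print(repr(dalek_pauses("yes, master")))               # -> 'yes. . . master. '
```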

The second step was to add the Dalek electronic twang to the voice. After extensive Googling I found that this was originally created by passing the actor's voice through a 'ring modulator'. On another site (which I can't find at the moment, but whose author deserves much of the credit for this bit) I found that a software-only approximation of ring modulation is to multiply the original voice by a sine wave. A 'sawtooth' that ramps down and back up again (strictly speaking a triangle wave) is a decent approximation of a sine wave and, I thought, might be faster, so I chose that instead. Mycroft didn't take kindly to my adding the code as a separate module so, again, I've had to butcher the original code, in this case '/mycroft-core/mycroft/tts/tts.py' at 'def _execute(self, sentence, ident, listen):'. The code was changed (at the point shown) to:

    if os.path.exists(wav_file):
        LOG.debug("TTS cache hit")
        phonemes = self.load_phonemes(key)
    else:
        wav_file, phonemes = self.get_tts(sentence, wav_file)
        if phonemes:
            self.save_phonemes(key, phonemes)
        # Dalekify only freshly synthesised audio. A cache hit is already
        # dalekified, so processing it again would stack the effect.
        try:
            tooth_w = 0.01  # amount added/subtracted from the gain per sample
            tooth_h = 0.0   # lowest gain the sawtooth swings down to
            ifile = wave.open(wav_file, 'rb')
            channels = ifile.getnchannels()
            frames = ifile.getnframes()
            width = ifile.getsampwidth()
            rate = ifile.getframerate()
            audio = ifile.readframes(frames)
            ifile.close()
            if width != 2:
                raise ValueError("expected 16-bit samples")
            # Convert the raw bytes to int16 samples using NumPy
            audio16 = numpy.frombuffer(audio, dtype=numpy.int16)
            empty16 = []
            h = 1.0
            d = tooth_w
            for x in audio16:
                # scale each sample by the current gain, then step the gain
                empty16.append(x * h)
                h = h - d
                # reverse direction at the top (1) and bottom (tooth_h)
                if h > 1 or h < tooth_h:
                    d = d * -1
            # clip: the gain briefly overshoots 1 at each turning point
            outarray = numpy.clip(empty16, -32768, 32767).astype(numpy.int16)
            # 'wb' overwrites the original, so it no longer needs os.remove()
            # first; an error before this point leaves the original playable
            dalek_file = wave.open(wav_file, 'wb')
            dalek_file.setnchannels(channels)
            dalek_file.setsampwidth(width)
            dalek_file.setframerate(rate)
            dalek_file.writeframes(outarray.tobytes())
            dalek_file.close()
        except Exception as e:
            LOG.error("NOT dalekified: {}".format(e))

    vis = self.viseme(phonemes) if phonemes else None
    self.queue.put((self.audio_ext, wav_file, vis, ident, l))
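Much of the delay is likely to come from the Python for loop, which touches every sample one at a time; NumPy can apply the same kind of gain curve to the whole buffer in a single multiply. Below is a hypothetical sketch (triangle_gain and dalekify_wav are made-up names), assuming 16-bit mono audio as the code above does. triangle_gain builds the 1 -> tooth_h -> 1 ramp directly from the sample index, and dalekify_wav rewrites the file in place. It is approximately, not bit-for-bit, equivalent: the loop overshoots each turning point by one step, so its cycle is a couple of samples longer.

```python
import wave

import numpy as np


def triangle_gain(n: int, tooth_w: float = 0.01, tooth_h: float = 0.0) -> np.ndarray:
    """Gain that ramps 1 -> tooth_h -> 1, changing by tooth_w per sample."""
    period = 2.0 * (1.0 - tooth_h) / tooth_w      # samples per down-and-up cycle
    phase = np.mod(np.arange(n) / period, 1.0)     # position within the cycle, 0..1
    return tooth_h + (1.0 - tooth_h) * np.abs(1.0 - 2.0 * phase)


def dalekify_wav(path: str, tooth_w: float = 0.01, tooth_h: float = 0.0) -> None:
    """Hypothetical vectorised replacement for the per-sample loop.

    Assumes 16-bit mono audio; rewrites the WAV file in place.
    """
    with wave.open(path, "rb") as src:
        params = src.getparams()
        audio = src.readframes(src.getnframes())
    if params.sampwidth != 2:
        raise ValueError("expected 16-bit samples")
    samples = np.frombuffer(audio, dtype=np.int16)
    # one multiply over the whole buffer; gain <= 1 so int16 cannot overflow
    gain = triangle_gain(len(samples), tooth_w, tooth_h)
    out = (samples * gain).astype(np.int16)
    with wave.open(path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(out.tobytes())
```

If this works for you, a call like dalekify_wav(wav_file, tooth_w, tooth_h) could stand in for everything between reading and writing the file in the try block.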

I also had to import the needed modules (import wave and import numpy at the top of tts.py).

The tooth_h and tooth_w variables are the 'height' and 'width' of the sawtooth: tooth_h is the lowest gain it swings down to, and tooth_w is the amount added or subtracted at each sample (so the smaller it is, the longer each tooth). I normally set tooth_h to 0, which means the sawtooth goes back and forth between 1 and 0 in steps of tooth_w (this should be between 0 and 1, preferably low), and changing it can make a dramatic difference. There are hours of fun to be had messing about with tooth_w; there is a balance to be found between making the voice more 'Dalek' and keeping it intelligible.
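tooth_w may be easier to tune in hertz. The gain takes 2(1 - tooth_h)/tooth_w samples to go down and back up, so at sample rate r the wobble runs at about r * tooth_w / (2(1 - tooth_h)) Hz (slightly lower in practice, because the loop overshoots each turning point by one step). The sketch converts both ways; 16000 Hz is only an example rate, so read the real one with getframerate(). It also includes a hypothetical ring_modulate helper that multiplies by a true sine carrier (the recipe from the ring-modulation paragraph), in case you want to compare the two at the same frequency. All three function names are made up for illustration.

```python
import numpy as np


def wobble_hz(rate: int, tooth_w: float, tooth_h: float = 0.0) -> float:
    """Approximate frequency of one full down-and-up cycle of the gain."""
    samples_per_cycle = 2.0 * (1.0 - tooth_h) / tooth_w
    return rate / samples_per_cycle


def tooth_w_for(target_hz: float, rate: int, tooth_h: float = 0.0) -> float:
    """Inverse of wobble_hz: the tooth_w giving roughly target_hz."""
    return 2.0 * (1.0 - tooth_h) * target_hz / rate


def ring_modulate(samples: np.ndarray, rate: int, carrier_hz: float) -> np.ndarray:
    """Multiply int16 mono samples by a sine carrier swinging from -1 to +1."""
    t = np.arange(len(samples)) / rate          # time of each sample in seconds
    carrier = np.sin(2.0 * np.pi * carrier_hz * t)
    out = samples.astype(np.float64) * carrier
    # -32768 * -1.0 would overflow int16, so clip before converting back
    return np.clip(out, -32768, 32767).astype(np.int16)


rate = 16000                 # example only: use the file's real frame rate
print(wobble_hz(rate, 0.01))  # -> 80.0
```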

My problem is that adding the code at this point involves reopening the .wav file, reading all the frames and processing each one, then rebuilding the file. This adds a noticeable (read: irritating) delay to the response, probably at least doubling the original delay. My understanding of diphone voices is that they are created by concatenating tiny speech sounds held in some sort of database in the original flitevox voice file. What would make it much faster would be to sawtooth each of these tiny fragments and put them back in the file so that the Dalek voice was built in. Since each sawtoothed fragment would be the same size as the original this shouldn't be a problem, if I could get at them. So my question is: is there an easy way to do this, or a complete description of the structure of a diphone file somewhere, or some kindly genius out there who could help? Cheers
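One thing worth checking before sawtoothing the fragments themselves: the current effect is one continuous wave running across the whole utterance. A fragment processed ahead of time can't know where it will land in a future sentence, so each one would start at the top of the wave, and the wobble would follow the fragment boundaries rather than run at a steady rate. The sketch illustrates this with made-up fragments (modulate_fragments and triangle_gain_from are hypothetical names); it says nothing about how the flitevox file actually stores its units. Pre-baked fragments would behave like carry_phase=False.

```python
import numpy as np


def triangle_gain_from(start: int, n: int, tooth_w: float, tooth_h: float) -> np.ndarray:
    """Gain for samples start..start+n-1 of one continuous 1 -> tooth_h -> 1 wave."""
    period = 2.0 * (1.0 - tooth_h) / tooth_w
    phase = np.mod((start + np.arange(n)) / period, 1.0)
    return tooth_h + (1.0 - tooth_h) * np.abs(1.0 - 2.0 * phase)


def modulate_fragments(fragments, tooth_w=0.01, tooth_h=0.0, carry_phase=True):
    """Modulate a list of int16 fragments and join them.

    carry_phase=True continues the wave across joins (same as processing the
    joined audio); carry_phase=False restarts it at every fragment.
    """
    out, position = [], 0
    for frag in fragments:
        start = position if carry_phase else 0
        gain = triangle_gain_from(start, len(frag), tooth_w, tooth_h)
        out.append((frag * gain).astype(np.int16))
        position += len(frag)
    return np.concatenate(out)


# made-up "fragments" of uneven length cut from one constant signal
whole = np.full(1000, 1000, dtype=np.int16)
fragments = np.split(whole, [137, 420, 800])
continuous = modulate_fragments([whole])
print(np.array_equal(modulate_fragments(fragments), continuous))                     # -> True
print(np.array_equal(modulate_fragments(fragments, carry_phase=False), continuous))  # -> False
```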

iUltimateLP commented 3 years ago

While I can't help you, this looks pretty cool. Any example on how this sounds? Thanks!

joultram commented 3 years ago

Hi Johnny

No problem. I was intending to put a couple of samples with the post but couldn't figure out how to do it. The dave_human file is the original and the dave_dalek files are the dalekified versions, with the number at the end being the tooth_w value used (you'll see that makes quite a difference!).

Enjoy

Cheers

John

On Tue, 20 Apr 2021, 08:43 Johnny, @.***> wrote:

While I can't help you, this looks pretty cool. Any example on how this sounds? Thanks!


iUltimateLP commented 3 years ago

Seems like the files didn't make it in the mail, can you upload them to drive or similar? Thanks!

joultram commented 3 years ago

Sorry, just sent the mail to someone else and they got it fine, must be at your end.

Cheers

John

On Tue, 20 Apr 2021 at 12:48, Johnny @.***> wrote:

Seems like the files didn't make it in the mail, can you upload them to drive or similar? Thanks!


iUltimateLP commented 3 years ago

You're replying to the GitHub notification email, which only posts the text here on the issue; attachments get lost, unfortunately.

joultram commented 3 years ago

D'Oh!

Sorry, didn't realise. I've set up a Drive folder at the link below. Cheers

https://drive.google.com/folderview?id=1us19J80QNmXoHlh7meB_R4wXzguuxBFc

festvox / flite

Dalek TTS voice on Picroft - diphone file structure #62
