Osmodium / PathfinderTextToSpeechMod

A mod that introduces text to speech in various parts of the game.
MIT License

Baked AI voiceover #26

Open · lofcz opened this issue 11 months ago

lofcz commented 11 months ago

Hey, I'm not yet familiar with the codebase, but an idea struck me: we could extract all dialogue in plain text from the localization files, build a dictionary <Actor, VoiceLine>, and generate the VO via ElevenLabs. The current patches would need to be modified to look up the correct VO; if we assign a GUID to every line of every dialogue, the lookup becomes a simple dictionary lookup.

Since you're knowledgeable in this area, does this sound feasible? Are there any major issues with this high-level plan? At the very least I could fund the ElevenLabs API usage, and if I have more time I'd be interested in implementing the whole thing myself.
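A minimal sketch of the GUID-keyed lookup table proposed above, written in Python purely for illustration. It assumes the localization data has already been exported to a flat mapping of string key to text, and that a separate step has attributed each key to a speaker; `DialogueLine`, `line_guid`, `build_voice_table`, `group_by_actor`, and the namespace UUID are all hypothetical names, not part of the game or the mod. Using `uuid.uuid5` rather than random GUIDs means the same key and text always produce the same ID, so the table can be regenerated after a game patch and only lines whose text changed get new IDs (and need new audio).

```python
import uuid
from dataclasses import dataclass

# Hypothetical fixed namespace for this mod's voice-line IDs; any constant UUID works.
VO_NAMESPACE = uuid.UUID("6f1c2a9e-3b4d-4e5f-8a7b-1c2d3e4f5a6b")


@dataclass(frozen=True)
class DialogueLine:
    guid: uuid.UUID
    actor: str
    text: str


def line_guid(loc_key: str, text: str) -> uuid.UUID:
    """Deterministic GUID: the same key and text always yield the same ID."""
    # \x1f (unit separator) keeps "ab"+"c" distinct from "a"+"bc".
    return uuid.uuid5(VO_NAMESPACE, f"{loc_key}\x1f{text}")


def build_voice_table(
    localization: dict[str, str], speaker_of: dict[str, str]
) -> dict[uuid.UUID, DialogueLine]:
    """Build a GUID -> line table; keys with no known speaker are skipped."""
    table: dict[uuid.UUID, DialogueLine] = {}
    for loc_key, text in localization.items():
        actor = speaker_of.get(loc_key)
        if actor is None:
            continue  # UI text, or a line whose speaker we could not determine
        guid = line_guid(loc_key, text)
        table[guid] = DialogueLine(guid, actor, text)
    return table


def group_by_actor(
    table: dict[uuid.UUID, DialogueLine]
) -> dict[str, list[DialogueLine]]:
    """The <Actor, VoiceLine> view: each actor mapped to all of their lines."""
    by_actor: dict[str, list[DialogueLine]] = {}
    for line in table.values():
        by_actor.setdefault(line.actor, []).append(line)
    return by_actor
```

At runtime, a patched dialogue handler could recompute `line_guid` for the line being shown and look up a pre-baked audio file named after that GUID; how the patch obtains the localization key is left open here.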

Osmodium commented 11 months ago

Hi! Yes, this would be possible for everything that is in the translation files.

The drawbacks that I see with this are:

The positives:

Personally, I'm not a fan of introducing any of these things into the mod just for the benefit of more natural-sounding voices.

lofcz commented 11 months ago

Thanks for the reply. I asked here because you have the required know-how, not with the intention of pushing this feature into the mod; I agree it is out of scope. I gave it some more thought and toyed with feeding a few sampled lines from each actor to gpt3.5-instruct to get the actor's characteristics back in a structured format (approximate age, moral alignment, male/female...).
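The actor-profiling idea above can be sketched as two pure functions: one that builds a prompt from a random sample of an actor's lines, and one that validates the model's structured reply. This is a hypothetical sketch in Python; the call to the language model itself is deliberately left out, and the prompt wording, JSON field names, and `ActorProfile` type are assumptions. Validating the reply matters because a model is not guaranteed to follow the requested format, and a fixed `seed` keeps the sampled lines reproducible between runs.

```python
import json
import random
from dataclasses import dataclass

# The nine Pathfinder alignments; "Neutral" is the true-neutral case.
ALIGNMENTS = {
    "Lawful Good", "Neutral Good", "Chaotic Good",
    "Lawful Neutral", "Neutral", "Chaotic Neutral",
    "Lawful Evil", "Neutral Evil", "Chaotic Evil",
}
GENDERS = {"male", "female", "unknown"}


@dataclass(frozen=True)
class ActorProfile:
    actor: str
    approximate_age: int
    alignment: str
    gender: str


def build_profile_prompt(actor: str, lines: list[str], sample_size: int = 5, seed: int = 0) -> str:
    """Sample a few of the actor's lines and ask for a JSON-only answer."""
    rng = random.Random(seed)
    sample = rng.sample(lines, min(sample_size, len(lines)))
    quoted = "\n".join(f"- {line}" for line in sample)
    return (
        f"Here are some lines spoken by the character {actor}:\n{quoted}\n\n"
        'Reply with JSON only: {"approximate_age": <int>, '
        '"alignment": <one of the nine Pathfinder alignments>, '
        '"gender": "male" | "female" | "unknown"}'
    )


def parse_profile(actor: str, reply: str) -> ActorProfile:
    """Parse and validate the model's reply; raises ValueError on bad output."""
    data = json.loads(reply)  # json.JSONDecodeError is a subclass of ValueError
    alignment = data["alignment"]
    if alignment not in ALIGNMENTS:
        raise ValueError(f"unexpected alignment: {alignment!r}")
    gender = data["gender"]
    if gender not in GENDERS:
        raise ValueError(f"unexpected gender: {gender!r}")
    return ActorProfile(actor, int(data["approximate_age"]), alignment, gender)
```

The resulting profile could then be used to pick or design a voice per actor, so that all of one character's lines share a consistent voice.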

Would you be interested in making a small tech demo that plays one pre-baked VO line somewhere at the start of the game? I'm still unfamiliar with the patches used in the codebase, so this would be a great head start for me.

Of course, only if this is not too much trouble for you!

Iheuzio commented 9 months ago

ElevenLabs charges by the number of characters. Given the amount of text in the game, voicing every line would be too expensive unless you happen to have 5k to spare. Even then, you would have to manually assign an ID to every speech line, which is not very feasible.
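The cost argument above can be turned into a back-of-the-envelope estimate once the dialogue has been exported. This is a hypothetical sketch: it assumes a flat price per 1,000 characters that you supply yourself (no real ElevenLabs prices are assumed, and actual plans may be structured differently), and it counts each distinct (actor, text) pair only once, since a line repeated verbatim by the same actor needs to be generated only once.

```python
def total_characters(lines: list[tuple[str, str]]) -> int:
    """Count billable characters, generating each distinct (actor, text) pair once."""
    unique = set(lines)
    return sum(len(text) for _, text in unique)


def estimate_cost(char_count: int, price_per_1k_chars: float) -> float:
    """Linear estimate under an assumed flat per-character price."""
    return char_count / 1000 * price_per_1k_chars
```

The character count is the part worth computing exactly from the localization files; the price can then be varied to compare plans.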

curtwagner1984 commented 5 months ago

@Iheuzio What if we crowd source it?

Suppose that on the software side we have a working module that does what @lofcz suggested, released as a mod where users plug in their own (paid or free) ElevenLabs API key to generate new lines of speech.

The mod would keep a database of lines and audio files somewhere online. When a line needs to be spoken in the game, the mod first looks in that online database. If the audio file exists there, that file is played. If it does not, the mod asks ElevenLabs to generate the audio, plays it, and then uploads it to the database. This lets everyone playing with the mod benefit from the spoken lines generated by others and contributed to the shared audio database.
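The crowd-sourced flow described above is essentially a shared cache-aside pattern. The sketch below shows it in Python under stated assumptions: `AudioStore` and `Synthesizer` are hypothetical interfaces standing in for the online database and for a text-to-speech client using the player's own API key (no real ElevenLabs API is shown), and lines are keyed by a SHA-256 hash of actor and text so that every client derives the same key without a central ID registry. `MemoryStore` and `CountingTTS` are in-memory stand-ins for demonstration only.

```python
import hashlib
from typing import Optional, Protocol


class AudioStore(Protocol):
    """Hypothetical shared database of generated lines (e.g. behind an HTTP API)."""
    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, audio: bytes) -> None: ...


class Synthesizer(Protocol):
    """Hypothetical TTS client configured with the local player's API key."""
    def synthesize(self, actor: str, text: str) -> bytes: ...


def line_key(actor: str, text: str) -> str:
    """Content-addressed key: every client computes the same key for the same line."""
    return hashlib.sha256(f"{actor}\x1f{text}".encode("utf-8")).hexdigest()


def get_or_generate(store: AudioStore, tts: Synthesizer, actor: str, text: str) -> bytes:
    """Return shared audio if present; otherwise generate it and contribute it."""
    key = line_key(actor, text)
    audio = store.get(key)
    if audio is None:
        audio = tts.synthesize(actor, text)  # spends the local player's characters
        store.put(key, audio)  # share it so no other player pays for this line
    return audio


class MemoryStore:
    """In-memory stand-in for the online database."""
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, audio: bytes) -> None:
        self._data[key] = audio


class CountingTTS:
    """Stand-in synthesizer that records how often it was called."""
    def __init__(self) -> None:
        self.calls = 0

    def synthesize(self, actor: str, text: str) -> bytes:
        self.calls += 1
        return f"<audio {actor}: {text}>".encode("utf-8")
```

For example, if two players share one `MemoryStore` and both request the same line, only the first player's synthesizer is called; the second gets the uploaded audio. In a real deployment the upload step would also need some form of validation, since the database would otherwise accept arbitrary audio from any client.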