NibbleRealm / twang

Pure Rust library for advanced audio synthesis.
https://nibblerealm.com/twang/
Apache License 2.0

Implement Bandpass Filter #15

Open AldaronLau opened 2 years ago

AldaronLau commented 2 years ago

Including high-pass and low-pass filters
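For reference, band-pass, high-pass, and low-pass filters are often built from the same second-order IIR ("biquad") section. The sketch below assumes the coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The `Biquad` type and its constructors are hypothetical illustrations and are not part of twang's API. The `q` parameter controls resonance; for the band-pass, the bandwidth is roughly the center frequency divided by Q.

```rust
use std::f32::consts::PI;

/// A second-order IIR ("biquad") filter in Direct Form I.
/// Hypothetical type for illustration; not part of twang's API.
#[derive(Clone, Debug)]
pub struct Biquad {
    // Feed-forward (b) and feedback (a) coefficients, already divided by a0.
    b0: f32,
    b1: f32,
    b2: f32,
    a1: f32,
    a2: f32,
    // The previous two inputs (x) and outputs (y).
    x1: f32,
    x2: f32,
    y1: f32,
    y2: f32,
}

impl Biquad {
    /// Normalizes raw cookbook coefficients by a0 and clears the state.
    fn from_coeffs(b: [f32; 3], a: [f32; 3]) -> Self {
        Biquad {
            b0: b[0] / a[0],
            b1: b[1] / a[0],
            b2: b[2] / a[0],
            a1: a[1] / a[0],
            a2: a[2] / a[0],
            x1: 0.0,
            x2: 0.0,
            y1: 0.0,
            y2: 0.0,
        }
    }

    /// Returns cos(w0), alpha, and the denominator [a0, a1, a2]
    /// that the low-pass, high-pass, and band-pass forms share.
    fn shared(sample_rate: f32, freq: f32, q: f32) -> (f32, f32, [f32; 3]) {
        let w0 = 2.0 * PI * freq / sample_rate;
        let (cos, alpha) = (w0.cos(), w0.sin() / (2.0 * q));
        (cos, alpha, [1.0 + alpha, -2.0 * cos, 1.0 - alpha])
    }

    /// Low-pass: passes frequencies below `freq` (unity gain at DC).
    pub fn low_pass(sample_rate: f32, freq: f32, q: f32) -> Self {
        let (cos, _, a) = Self::shared(sample_rate, freq, q);
        Self::from_coeffs([(1.0 - cos) / 2.0, 1.0 - cos, (1.0 - cos) / 2.0], a)
    }

    /// High-pass: passes frequencies above `freq` (zero gain at DC).
    pub fn high_pass(sample_rate: f32, freq: f32, q: f32) -> Self {
        let (cos, _, a) = Self::shared(sample_rate, freq, q);
        Self::from_coeffs([(1.0 + cos) / 2.0, -(1.0 + cos), (1.0 + cos) / 2.0], a)
    }

    /// Band-pass with a constant 0 dB peak gain at `freq`.
    pub fn band_pass(sample_rate: f32, freq: f32, q: f32) -> Self {
        let (_, alpha, a) = Self::shared(sample_rate, freq, q);
        Self::from_coeffs([alpha, 0.0, -alpha], a)
    }

    /// Filters one sample and updates the delay line.
    pub fn process(&mut self, x: f32) -> f32 {
        let y = self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
            - self.a1 * self.y1
            - self.a2 * self.y2;
        self.x2 = self.x1;
        self.x1 = x;
        self.y2 = self.y1;
        self.y1 = y;
        y
    }
}
```

Only the numerator changes between the three constructors; the denominator and the `process` recurrence are shared. That is one reason the biquad form is a convenient building block to add alongside existing oscillators.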

m4b commented 1 year ago

Is there any chance of this happening? I was hoping to mess around with formant synthesis, and since I've used twang before, I wanted to experiment with it, but my understanding is that formants require a band-pass filter. Alternatively, if you have an idea of how to make it happen (assuming it isn't a large API change or something), maybe you could offer some guidance :)?

AldaronLau commented 1 year ago

One resource I found was https://www.soundonsound.com/techniques/formant-synthesis, though I'm not sure whether it's enough to go on for an implementation in Twang. Recently my work on this project has focused on API improvements, so I haven't had as much time to look into this as I would like. Sadly, my understanding of the theory is currently very limited, which is why I marked this issue as "help wanted". It shouldn't require a large API change, just an API addition.

I need this feature for personal projects, so it will become the highest priority after the API improvements; I will put in the time to figure it out eventually if I don't get help on it first. So, I'm sorry, but I don't know if I can provide much guidance at the moment.

m4b commented 1 year ago

Cool, thanks for such a quick response! Yes, that link was what I was looking at, and the Wikipedia article is also quite good (though I also don't know whether it is sufficient to build an implementation). Sadly, it looks like this area (formant synthesis and general text to speech) has a surprising lack of documentation on internals, or has been overrun by web APIs where you send a request and get a WAV back (Amazon, Google, etc.), but alas, that does not help me learn how any of it works at the implementation level :) Mostly I was hoping for a simple pure Rust version of eSpeak (actually, I was hoping for a pure Rust pipeline to go from IPA -> synthesized speech), but as far as I can see there's nothing at all in this space on crates.io.

In any event, that led me to want to develop a very basic implementation of formant synthesis, and since both articles mentioned using a BPF and I saw this issue in twang, I thought I'd ask here about its status. And thanks again for building all these tools, they're quite good and fun :)
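The source-filter idea behind basic formant synthesis can be sketched as an impulse train at the pitch frequency (a crude stand-in for the glottal source) fed through several band-pass filters in parallel, one per formant, with their outputs summed. Everything here is an assumption for illustration: `BandPass` and `render_vowel` are hypothetical names, not twang's API; the filter is the cookbook constant-peak-gain band-pass; and the formant values in the usage note are rough textbook figures for an /a/-like vowel, not tuned data.

```rust
use std::f32::consts::PI;

/// Minimal constant-peak-gain band-pass biquad (cookbook form, b1 = 0).
/// Hypothetical helper for illustration; not twang's API.
struct BandPass {
    b0: f32,
    b2: f32,
    a1: f32,
    a2: f32,
    x1: f32,
    x2: f32,
    y1: f32,
    y2: f32,
}

impl BandPass {
    fn new(sample_rate: f32, freq: f32, q: f32) -> Self {
        let w0 = 2.0 * PI * freq / sample_rate;
        let alpha = w0.sin() / (2.0 * q);
        let a0 = 1.0 + alpha;
        BandPass {
            b0: alpha / a0,
            b2: -alpha / a0,
            a1: -2.0 * w0.cos() / a0,
            a2: (1.0 - alpha) / a0,
            x1: 0.0,
            x2: 0.0,
            y1: 0.0,
            y2: 0.0,
        }
    }

    fn process(&mut self, x: f32) -> f32 {
        let y = self.b0 * x + self.b2 * self.x2 - self.a1 * self.y1 - self.a2 * self.y2;
        self.x2 = self.x1;
        self.x1 = x;
        self.y2 = self.y1;
        self.y1 = y;
        y
    }
}

/// Renders `len` samples of a crude vowel: an impulse train at pitch `f0`
/// summed through parallel formant filters.
/// `formants` holds (center frequency in Hz, Q, gain) triples.
fn render_vowel(sample_rate: f32, f0: f32, formants: &[(f32, f32, f32)], len: usize) -> Vec<f32> {
    let mut filters: Vec<(BandPass, f32)> = formants
        .iter()
        .map(|&(freq, q, gain)| (BandPass::new(sample_rate, freq, q), gain))
        .collect();
    let period = sample_rate / f0; // samples per pitch period (may be fractional)
    let mut phase = 0.0f32;
    let mut out = Vec::with_capacity(len);
    for _ in 0..len {
        // Exactly one unit impulse per pitch period: the sample where phase wraps below 1.
        let source = if phase < 1.0 { 1.0 } else { 0.0 };
        phase += 1.0;
        if phase >= period {
            phase -= period;
        }
        // Parallel formant bank: every filter sees the same source sample.
        let sample: f32 = filters.iter_mut().map(|(f, g)| *g * f.process(source)).sum();
        out.push(sample);
    }
    out
}
```

For example, `render_vowel(48_000.0, 120.0, &[(730.0, 8.0, 1.0), (1090.0, 10.0, 0.5), (2440.0, 14.0, 0.25)], 48_000)` renders one second of a buzzy /a/-like tone. A more realistic synthesizer would use a smoother glottal pulse and per-formant bandwidths rather than fixed Q values; cascade (series) formant filters are a common alternative to the parallel arrangement shown here.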

AldaronLau commented 1 year ago

I was hoping for a pure Rust pipeline to go from IPA -> synthesized speech

This is similar to a project I want to make for an open-source vocaloid clone, which is one reason I will be prioritizing this in the future.

Sadly, it looks like this area (formant synthesis and general text to speech) has a surprising lack of documentation on internals

I think it will likely require reverse-engineering a random synthesizer circuit diagram into Rust code, which is often how I have been approaching this project. As I figure things out, I'm also trying to document how all of this works in the new Twang book, so hopefully that won't be true in the future.

And thanks again for building all these tools, they're quite good and fun :)

I'm glad you're enjoying them, they're fun to make!

m4b commented 1 year ago

This is similar to a project I want to make for an open-source vocaloid clone, which is one reason I will be prioritizing this in the future.

Ah, I see. I thought you were just saying that you needed a BPF for a project you were working on; I didn't realize you meant you were interested in TTS or formant synthesis. This is great! Please keep me apprised, or feel free to ping or email me with any updates or anything you end up pushing; I'd definitely be interested in collaborating there :)

Btw, just googling vocaloid, it looks like they use the alternative approach to voice synthesis, which is: https://en.wikipedia.org/wiki/Concatenative_synthesis

This article gives a good overview (but is sadly lacking in most implementation specifics): https://en.wikipedia.org/wiki/Speech_synthesis

And just to be clear, I'm not interested in the (English) text-parsing component of TTS (e.g., parsing and selecting the pronunciation for "i live at a live park", then mapping to phonemes), which is a whole project in and of itself! That's why I was more interested in an IPA -> waveform pipeline.

AldaronLau commented 1 year ago

And just to be clear, I'm not interested in the (English) text-parsing component of TTS (e.g., parsing and selecting the pronunciation for "i live at a live park", then mapping to phonemes), which is a whole project in and of itself! That's why I was more interested in an IPA -> waveform pipeline.

This is not something I'm interested in either, and it won't be included as part of my "vocaloid" clone, mostly because pronunciation varies by region or even by individual, so for the most "accurate" voice synthesis, the pronunciation would need to be selected manually per voice/character anyway.

Btw, just googling vocaloid, it looks like they use the alternative approach to voice synthesis, which is: https://en.wikipedia.org/wiki/Concatenative_synthesis

While I am planning on implementing concatenative synthesis in the engine, I am also planning on including formant synthesis. If I remember right, there is at least one vocaloid/UTAUloid that uses concatenative synthesis on samples that were created with formant synthesis, but I can't remember which one at the moment.
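Concatenative synthesis joins prerecorded units (for example, diphones) end to end. Below is a minimal sketch of just the joining step, assuming mono `f32` buffers at a common sample rate and a simple linear crossfade at each seam. `concatenate` is a hypothetical helper, not twang's API, and real engines typically do much more (pitch and duration modification, choosing good join points).

```rust
/// Joins audio units end to end, overlapping each seam by up to `overlap`
/// samples with a linear crossfade to avoid clicks.
/// Hypothetical helper for illustration; not twang's API.
fn concatenate(units: &[Vec<f32>], overlap: usize) -> Vec<f32> {
    let mut out: Vec<f32> = Vec::new();
    for unit in units {
        // The crossfade can't be longer than the audio already written or the new unit.
        let n = overlap.min(out.len()).min(unit.len());
        let start = out.len() - n;
        for i in 0..n {
            // Fade-in weight for the new unit, rising from just above 0 to just below 1.
            let t = (i + 1) as f32 / (n + 1) as f32;
            out[start + i] = out[start + i] * (1.0 - t) + unit[i] * t;
        }
        // Append the rest of the unit after the overlapped region.
        out.extend_from_slice(&unit[n..]);
    }
    out
}
```

With `overlap = 0` the units are simply appended. A few milliseconds of overlap (e.g. 48 to 240 samples at 48 kHz) is often enough to hide discontinuities at the seams, and an equal-power curve is a common alternative to the linear fade used here.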

Please keep me apprised, or feel free to ping or email me with any updates or anything you end up pushing; I'd definitely be interested in collaborating there :)

I'll try to keep you updated!

AldaronLau commented 1 year ago

I can't remember which one at the moment

If you're interested, the voice of Uta Utane / Defoko almost certainly uses formant synthesis.

m4b commented 1 year ago

Interesting. The voice synthesis samples are quite impressive! I feel like if I descend into Japanese internet subcultures I may never come back :D

More seriously, it's a little sad there isn't more open-source/permissively licensed software in this space with even basic implementations; it feels like a nice area for improvement :)