bshall / knn-vc

Voice Conversion With Just Nearest Neighbors
https://bshall.github.io/knn-vc/

Considering context around source features #19

Closed wilson97 closed 11 months ago

wilson97 commented 1 year ago

Hi,

I had an idea and wanted to run it by you. Right now, for each source feature, you are doing kNN against the reference features. I'm thinking that the surrounding source features might also carry useful information that could help you better nail down the correct reference feature.

So for example, say my source features are [s1 s2 ... s100] and the reference features (let's just assume k = 1) are [r1 r2 ... r100]. If you consider the source features by themselves, maybe s1 maps to r22 and s2 maps to r77. But if you were to consider s1 and s2 together, the pair would map to [r23, r24], which is more correct.
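A minimal sketch of this idea, for illustration only: each frame is stacked with its neighbours, and 1-NN is run on the stacked vectors. The function names, the `width` and `weights` parameters, the edge padding, and the squared Euclidean distance are all assumptions made for the example, not something taken from the knn-vc code.

```python
import numpy as np

def stack_context(feats, width=1, weights=None):
    """Concatenate every frame with its `width` neighbours on each side.

    feats: array of shape (n_frames, dim). Edge frames are repeated so that
    every frame has a full window (an arbitrary choice for this sketch).
    """
    n = feats.shape[0]
    if weights is None:
        weights = np.ones(2 * width + 1)
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    # Scaling by sqrt(w) makes the squared Euclidean distance between stacked
    # vectors equal to a weighted sum of the per-offset squared distances.
    parts = [
        np.sqrt(w) * padded[width + o : width + o + n]
        for w, o in zip(weights, range(-width, width + 1))
    ]
    return np.concatenate(parts, axis=1)

def context_nn(source, reference, width=1, weights=None):
    """Return, for each source frame, the index of the nearest reference frame
    when both are compared together with their neighbouring frames."""
    s = stack_context(source, width, weights)
    r = stack_context(reference, width, weights)
    # Pairwise squared Euclidean distances, shape (n_source, n_reference).
    d2 = (s ** 2).sum(axis=1)[:, None] - 2.0 * s @ r.T + (r ** 2).sum(axis=1)[None, :]
    return d2.argmin(axis=1)
```

With `width=0` this reduces to plain frame-wise 1-NN, so the two can be compared directly on the same features.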

Let me know what you think about this. Does this make sense/is my scenario plausible?

Thank you.

bshall commented 1 year ago

Hi @wilson97, we also thought that additional context would be useful and tried two different methods to incorporate information from neighboring features. The first method was similar to your proposal: we used a weighted sum of the distances between triplets of frames. For the second method, we adapted the dynamic programming algorithm for unit selection (from concatenative synthesis), where you encourage smoothness between the selected frames. Unfortunately, neither of them was significantly different from the naive kNN approach, so we went with that for simplicity.
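For readers unfamiliar with the unit-selection idea, here is a rough sketch of a Viterbi-style search that trades off a target cost (distance from source frame to reference frame) against a join cost (distance between consecutively selected reference frames). This is not the implementation that was tried in the experiments above; the cost definitions, the `smooth_weight` parameter, and the function names are assumptions chosen to keep the example small.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a and rows of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)

def smooth_unit_selection(source, reference, smooth_weight=1.0):
    """Select one reference index per source frame by dynamic programming.

    Target cost: squared distance between source frame t and reference frame j.
    Join cost (assumed for this sketch): smooth_weight times the squared
    distance between the reference frames chosen at t - 1 and t.
    """
    target = pairwise_sq_dists(source, reference)                    # (T, N)
    join = smooth_weight * pairwise_sq_dists(reference, reference)   # (N, N)
    n_frames, n_ref = target.shape

    cost = target[0].copy()                      # best cost of paths ending at each j
    back = np.zeros((n_frames, n_ref), dtype=int)
    for t in range(1, n_frames):
        total = cost[:, None] + join             # total[i, j]: end at i, then move to j
        back[t] = total.argmin(axis=0)
        cost = total.min(axis=0) + target[t]

    # Trace the cheapest path backwards.
    path = np.empty(n_frames, dtype=int)
    path[-1] = cost.argmin()
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

Setting `smooth_weight=0` recovers frame-wise 1-NN. The search is quadratic in the number of reference frames per step, so a practical version would probably restrict each frame to its top-k candidates first.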

However, there is definitely room for experimentation! I think incorporating smoothness constraints or neighboring context could help get rid of some of the boundary artefacts we sometimes get. Let me know if you find something that works well.

Hope that helps!

lenzo-ka commented 1 year ago

The classic way of incorporating trajectory in HMM/GMM models was to use deltas and delta-deltas, typically only from the previous frame. That might be another approach worth tinkering with.
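As a small illustration of that suggestion, the sketch below appends first and second differences (computed from the previous frame only) to each feature vector before any nearest-neighbour matching. The function name `add_deltas` and the choice to give the first frame a zero delta are assumptions for the example; classic speech front ends often use a regression over a wider window instead.

```python
import numpy as np

def add_deltas(feats):
    """Append deltas and delta-deltas to features of shape (n_frames, dim).

    delta[t] = feats[t] - feats[t - 1]; the first frame has no predecessor,
    so its delta is set to zero by prepending a copy of the first row.
    """
    delta = np.diff(feats, axis=0, prepend=feats[:1])
    delta2 = np.diff(delta, axis=0, prepend=delta[:1])
    return np.concatenate([feats, delta, delta2], axis=1)
```

The augmented source and reference features could then be passed to the usual kNN step unchanged, possibly with the delta dimensions rescaled so they do not dominate the distance.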