ArdenButterfield / stammer

Recreate any audio track by rearranging the frames of another video
MIT License
434 stars, 29 forks

Take inspiration from YTPs; add a mode that creates sped up and slowed down copies of the carrier before finding matches #38

Open optionboom opened 1 year ago

optionboom commented 1 year ago

In YTPs, it's common to use sped-up and slowed-down audio/video clips of the source. We could do the same thing here, which would add variety to the output, especially for short carriers, and make the output match the modulator more closely. The benefit would be more noticeable at larger frame lengths.

The specific process I'm envisioning: copies of the carrier are made at something like 70%, 200%, and 300% speed, then the original and the copies are stitched together end-to-end into a new file. I assume all of that can be done with ffmpeg. After that, the new file is analyzed as normal.
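A rough sketch of how the stitched carrier could be produced with a single ffmpeg call. This is not taken from the stammer codebase: the function names, the file names, and the 44100 Hz default are all assumptions for illustration, and `sample_rate` has to match the carrier's real audio sample rate for the speed factors to come out right. It uses the ffmpeg filters `split`/`asplit`, `setpts`, `asetrate`, `aresample`, and `concat`; the speed change deliberately shifts pitch, since `asetrate` reinterprets the sample rate rather than time-stretching.

```python
def build_filtergraph(speeds, sample_rate=44100):
    """Build an ffmpeg -filter_complex string that appends one sped-up or
    slowed-down (pitch-shifting) copy of the input per entry in `speeds`.

    Hypothetical helper; `sample_rate` is assumed to equal the input's
    actual audio sample rate.
    """
    n = len(speeds) + 1  # the original plus one copy per speed
    v_labels = "".join(f"[v{i}]" for i in range(n))
    a_labels = "".join(f"[a{i}]" for i in range(n))
    parts = [
        f"[0:v]split={n}{v_labels}",    # duplicate the video stream
        f"[0:a]asplit={n}{a_labels}",   # duplicate the audio stream
        # Resample the original too, so every concat input shares one rate.
        f"[a0]aresample={sample_rate}[as0]",
    ]
    concat_inputs = "[v0][as0]"
    for i, speed in enumerate(speeds, start=1):
        # Video: divide timestamps by the speed factor.
        parts.append(f"[v{i}]setpts=PTS/{speed}[vs{i}]")
        # Audio: reinterpret the sample rate (changes speed and pitch),
        # then resample back to the common rate.
        new_rate = round(sample_rate * speed)
        parts.append(
            f"[a{i}]asetrate={new_rate},aresample={sample_rate}[as{i}]"
        )
        concat_inputs += f"[vs{i}][as{i}]"
    parts.append(f"{concat_inputs}concat=n={n}:v=1:a=1[outv][outa]")
    return ";".join(parts)


def build_command(src, dst, speeds, sample_rate=44100):
    """Return the ffmpeg argument list (suitable for subprocess.run)."""
    graph = build_filtergraph(speeds, sample_rate)
    return ["ffmpeg", "-i", src, "-filter_complex", graph,
            "-map", "[outv]", "-map", "[outa]", dst]


if __name__ == "__main__":
    # Hypothetical file names; prints the command instead of running it.
    print(" ".join(build_command("carrier.mp4", "carrier_extended.mp4",
                                 [0.7, 2.0, 3.0])))
```

The resulting file would then be passed to the tool as the carrier, unchanged. One caveat: the concatenated video has a different effective frame rate in each segment, so a fixed-rate conversion step may be needed before analysis.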

I feel like this is within the scope of the project in a way that just pitching frames up/down or vocoding the output wouldn't be, but I don't think it should be the default behavior.

ArdenButterfield commented 1 year ago

This is a cool idea! The process you've described sounds very doable within how the code is currently structured. This makes me think about pitch vs. speed vs. tempo shifting: do we want to speed up the carrier in a way that changes the pitch or not? I agree that changing the speed without the pitch would make more of a difference on large frame lengths. I'm not opposed to changing the pitch of frames either; that's certainly within the realm of YTPs as well.

optionboom commented 1 year ago

I was imagining changing the speed in a way that also changes the pitch; I think that would be more interesting.
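To make the distinction concrete, here is a minimal sketch of a speed change that also changes pitch: plain resampling of the audio samples, played back at the original sample rate. The function name and the linear interpolation are assumptions chosen for brevity, not anything from the project; a real implementation would likely use a proper resampler.

```python
def change_speed(samples, speed):
    """Resample `samples` by linear interpolation so that playback at the
    original sample rate is `speed` times faster.

    Because the waveform itself is compressed or stretched, every
    frequency in it is multiplied by `speed` as well (pitch shifts).
    Hypothetical helper for illustration only.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    out_len = int(len(samples) / speed)
    out = []
    for i in range(out_len):
        pos = i * speed                      # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)   # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    ramp = [float(x) for x in range(10)]
    print(change_speed(ramp, 2.0))       # half as many samples
    print(len(change_speed(ramp, 0.5)))  # twice as many samples
```

A tempo change that preserves pitch would need a time-stretching algorithm instead, which is a different and more involved operation.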

ArdenButterfield commented 1 year ago

That would be very cool. I wonder also if it would be possible to change the matching algorithm to allow for changes in pitch. So instead of calculating the distance between modulator frame i and carrier frame j, it would calculate the optimal pitch shift of the carrier frame and the distance between the two frames at that pitch shift.
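One possible shape for such a shift-aware distance, as a sketch only. It assumes each frame is a spectrum with logarithmically spaced frequency bins, so that a pitch shift is a simple translation by a whole number of bins; that representation, the function name, the mean-squared-error metric, and the minimum-overlap rule are all assumptions here, not how stammer currently matches frames.

```python
def best_shift_distance(mod_frame, car_frame, max_shift=12):
    """Return (shift, distance) for the bin shift of `car_frame` that best
    matches `mod_frame`.

    Both frames are assumed to be magnitude spectra on a log-frequency
    axis. A positive shift moves the carrier up in pitch. Distance is the
    mean squared difference over the overlapping bins.
    """
    n = len(mod_frame)
    min_overlap = max(1, n // 2)  # ignore shifts that leave too few bins
    best = (0, float("inf"))
    for shift in range(-max_shift, max_shift + 1):
        total, count = 0.0, 0
        for k in range(n):
            src = k - shift  # carrier bin that lands on bin k after shifting
            if 0 <= src < len(car_frame):
                total += (mod_frame[k] - car_frame[src]) ** 2
                count += 1
        if count < min_overlap:
            continue
        dist = total / count
        if dist < best[1]:
            best = (shift, dist)
    return best


if __name__ == "__main__":
    modulator = [0, 0, 0, 1, 5, 1, 0, 0]
    carrier = [0, 1, 5, 1, 0, 0, 0, 0]  # same peak, two bins lower
    print(best_shift_distance(modulator, carrier, max_shift=3))  # (2, 0.0)
```

This brute-force version costs one distance computation per candidate shift for every frame pair, so it is only meant to show the idea, not to be efficient.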

I have a hunch that there would be more efficient ways to do this than simply making lots of copies of the carrier at different speeds, and that we could get more control over the pitch-shifting options. These distance algorithms between spectrograms are still something I want to research more for this project.

optionboom commented 1 year ago

I tried the technique I described manually. The results were (IMO) pretty mediocre, and of course the process was inefficient.