AviSynth / AviSynthPlus

AviSynth with improvements
http://avs-plus.net

TimeStretch() - Add support for more than 16 channels #395

Open FranceBB opened 3 weeks ago

FranceBB commented 3 weeks ago

TimeStretch() can work in two modes: mono mode and stereo mode. By default, if it's fed with 1 single channel, then it works in mono mode, while if it's fed with 2 or more channels it works in stereo mode.

For instance, here it's working in mono mode:

ColorBars(848, 480, pixel_type="YV12")

ResampleAudio(48000)
ConvertAudioToFloat()

ch1=GetChannel(1).TimeStretch(96).ResampleAudio(48000)
ch2=GetChannel(2).TimeStretch(96).ResampleAudio(48000)

MergeChannels(ch1, ch2)

while here it's working in stereo mode:

ColorBars(848, 480, pixel_type="YV12")

ResampleAudio(48000)
ConvertAudioToFloat()
TimeStretch(96)
ResampleAudio(48000)

Of course, working in mono mode should be avoided for multi-channel tracks 'cause in that case TimeStretch() won't know that they're actually stereo and therefore won't keep them in phase, thus leading to some potentially bad results. This means that stereo mode is always desired unless you have unrelated tracks.

Now, suppose you have a 5.1 track, then this is gonna work:

ColorBars(848, 480, pixel_type="YV12")

# Creating 6 fake channels
ch12=GetChannels(1,2)
MergeChannels(ch12, ch12, ch12)

ResampleAudio(48000)
ConvertAudioToFloat()
TimeStretch(96)
ResampleAudio(48000)

however under the hood it's actually doing this:

ColorBars(848, 480, pixel_type="YV12")

# Creating 6 fake channels
ch12=GetChannels(1,2)
MergeChannels(ch12, ch12, ch12)

filtered_ch12=GetChannels(1,2).ResampleAudio(48000).ConvertAudioToFloat().TimeStretch(96).ResampleAudio(48000)
filtered_ch34=GetChannels(3,4).ResampleAudio(48000).ConvertAudioToFloat().TimeStretch(96).ResampleAudio(48000)
filtered_ch56=GetChannels(5,6).ResampleAudio(48000).ConvertAudioToFloat().TimeStretch(96).ResampleAudio(48000)

MergeChannels(filtered_ch12, filtered_ch34, filtered_ch56)

This works 'cause it's actually keeping FL and FR in phase, CC and LFE in phase, and LS and RS in phase.

This scales all the way up to 16 channels; in fact, this works:

ColorBars(848, 480, pixel_type="YV12")

# Creating 16 fake channels
ch12=GetChannels(1,2)
MergeChannels(ch12, ch12, ch12, ch12, ch12, ch12, ch12, ch12)

ResampleAudio(48000)
ConvertAudioToFloat()
TimeStretch(96)
ResampleAudio(48000)

however as soon as I add one more pair, like 18 channels, I get an error:

ColorBars(848, 480, pixel_type="YV12")

# Creating 18 fake channels
ch12=GetChannels(1,2)
MergeChannels(ch12, ch12, ch12, ch12, ch12, ch12, ch12, ch12, ch12)

ResampleAudio(48000)
ConvertAudioToFloat()
TimeStretch(96)
ResampleAudio(48000)

[image: screenshot of the resulting error message]

So the feature request is: is it possible to support more than 16 channels in TimeStretch()?
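
Since the 16-channel case above goes through, one way to get the 18-channel test clip processed in the meantime is to split it into a 16-channel group plus the remaining pair, so that no single TimeStretch() call sees more than 16 channels. This is only a sketch: it assumes 16 really is the largest count accepted (as observed in this report), and the variable names `src`, `first16` and `tail` are made up for illustration.

```avisynth
ColorBars(848, 480, pixel_type="YV12")

# Creating 18 fake channels
ch12 = GetChannels(1, 2)
MergeChannels(ch12, ch12, ch12, ch12, ch12, ch12, ch12, ch12, ch12)

ResampleAudio(48000)
ConvertAudioToFloat()
src = last

# Assumption: 16 is the largest channel count TimeStretch() accepts (see the error above)
first16 = src.GetChannels(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16).TimeStretch(96)
tail    = src.GetChannels(17, 18).TimeStretch(96)

# Put the stretched audio back on the original video, channels in their original order
AudioDub(src, MergeChannels(first16, tail))
```

Both groups are stretched with the same tempo, so they end up with the same duration and can be merged back together.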

This is also being tracked on Doom9 here: https://forum.doom9.org/showthread.php?p=2002862

For those reading the topic, the current workaround detailed by Gavino is basically to divide the audio channels into stereo pairs, filter them individually and then recombine them. The following script can be used as an example / guideline:

ColorBars(848, 480, pixel_type="YV12")

ch12=GetChannels(1,2)
#create 18ch
MergeChannels(ch12, ch12, ch12, ch12, ch12, ch12, ch12, ch12, ch12)

m_clip=trim(0, 50)

#Convert audio to 32bit float
ConvertAudioToFloat(m_clip)

#Check the number of audio channels, framerate and frame length of the clip
my_channel_number=(HasAudio)?AudioChannels:0
my_fps=FrameRate()
my_length=FrameCount()

video=last

#Create a blank audio to save all the filtered audios
my_mute_audio = BlankClip(length=my_length, fps=my_fps, audio_rate=48000, channels=0, sample_type="float")
my_mute_video=AudioDub(video, my_mute_audio)

#Divide the audio into stereo pairs and time-stretch each pair (the first TimeStretch argument is the tempo, in percent)
m_clip = my_mute_video
for (i = 1, my_channel_number-1, 2) {
  GetChannels(video, i, i+1)
  ResampleAudio(48000)
  ConvertAudioToFloat()
  TimeStretch(96)
  ResampleAudio(48000)
  m_clip = MergeChannels(m_clip, last)
} 

#save the results and return the final clip
return m_clip
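
The same pair-by-pair idea can also be wrapped in a reusable function. The following is a minimal sketch, not part of AviSynth+: the name `TimeStretchPairs` and its `tempo` parameter are hypothetical, and it assumes an even channel count and an AviSynth+ build with `for` loop support (as the script above already requires).

```avisynth
# Hypothetical helper: runs TimeStretch() on one stereo pair at a time,
# so each call only ever sees 2 channels.
function TimeStretchPairs(clip c, float "tempo")
{
    tempo = Default(tempo, 100.0)
    n = c.AudioChannels()
    Assert(n >= 2 && n % 2 == 0, "TimeStretchPairs: expected an even, non-zero channel count")

    # Same conversion the scripts in this thread apply before TimeStretch()
    c = c.ConvertAudioToFloat()

    # The first pair seeds the output; the remaining pairs are appended in order
    out = c.GetChannels(1, 2).TimeStretch(tempo=tempo)
    for (i = 3, n - 1, 2) {
        out = MergeChannels(out, c.GetChannels(i, i + 1).TimeStretch(tempo=tempo))
    }

    # Video from the input clip, audio from the stretched pairs
    return AudioDub(c, out)
}

# Usage with the 18-channel test clip from this report
ColorBars(848, 480, pixel_type="YV12")
ch12 = GetChannels(1, 2)
MergeChannels(ch12, ch12, ch12, ch12, ch12, ch12, ch12, ch12, ch12)
TimeStretchPairs(tempo=96)
```

Seeding the output with the first pair avoids needing an empty 0-channel clip to merge into.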

pinterf commented 3 weeks ago

Actually, the maximum number is defined here: https://github.com/AviSynth/AviSynthPlus/blob/master/plugins/TimeStretch/SoundTouch/STTypes.h#L62

I think this constant can be set to a higher number as long as the source lives within the Avisynth project. It conflicts however with PR #378, which assumes that the original soundtouch is used as-is.

qyot27 commented 3 weeks ago

> It conflicts however with PR https://github.com/AviSynth/AviSynthPlus/pull/378, which assumes that the original soundtouch is used as-is.

That wouldn't be a big deal really, the issue would simply need to be reported to SoundTouch directly (arguably, if this is a harmless change, IMO it should be reported and fixed there anyway).

My gut feeling is that the limit there is probably just for historical reasons, being that >16 channels is a very recent kind of channel configuration and many legacy programs and audio interfaces (like the dwChannelMask value) just never anticipated something like Dolby Atmos 25 years ago and therefore just have a somewhat arbitrary limit of 16 or 18 or so on.

FranceBB commented 3 weeks ago

Yeah I also think it's probably just an artificial limit. 16 channels was kinda popular as an artificial limit back then 'cause it was the maximum number of individual audio channels you could carry as PCM uncompressed via an SDI cable. I'd vote to just lift the limit and bump it to like 32 or 64, but I can also report it to the SoundTouch devs in the meantime. An alternative would be to only lift the limit for Windows given that it seems to be the only platform that is using the version shipped with Avisynth.

qyot27 commented 3 weeks ago

That's DevIL that currently has the split between Windows and *nix, not SoundTouch. The embedded source of SoundTouch is the only one used at the moment. #378 moves Windows to using the system copy of DevIL, and it removes the embedded SoundTouch sources to be just as consistent...because at present there is no effective difference from upstream and it's just a local copy that keeps hanging around in the source tree.

The problem is that having those in-tree copies puts us at risk of simply papering over things that need to be fixed instead of actually fixing them (or reporting them to the upstream, in the cases where the problem exists in the external library itself). We don't have that long of a list of external dependencies, which only makes the in-tree binary and vendored source that much more glaring.

And there's documentation in the PR covering using external system copies of SoundTouch and DevIL on MSVC. It can be simple (if you just want to use the DevIL SDK package) or it can be significantly more complex (for a fully-static build of DevIL). The doc covers both. SoundTouch is little more than a footnote compared to the section for DevIL and its dependencies.

FranceBB commented 3 weeks ago

Oh, I see! In the meantime, I checked and the limit is indeed also in the official SoundTouch repository with the latest version of SoundTouch: https://codeberg.org/soundtouch/soundtouch/src/branch/master/include/STTypes.h

So I opened the feature request over there as well: https://codeberg.org/soundtouch/soundtouch/issues/38 Let's see. :)