FYJNEVERFOLLOWS / ResNet-STFT-SSL

ResNet-STFT Model for Sound Source Localization
BSD 3-Clause "New" or "Revised" License

What is the apkit used in gen_multi_sources_frame_level_data.py? #1

Closed. hhhuxy closed this issue 1 year ago

hhhuxy commented 1 year ago

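Hello, I am also trying to reproduce the related paper. I have only just started learning audio signal processing, so I am quite new to this. Is the apkit.stft used in the speech preprocessing step an STFT function you wrote yourself, and what does the parameter last_sample=True mean? Could you also advise how I should write it if I want to implement the same functionality with PyTorch's torch.stft?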

hhhuxy commented 1 year ago

Sorry to bother you. I understand it now. Thank you for sharing your reproduction.

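FYJNEVERFOLLOWS commented 1 year ago

import torch


def mulch_stft(waveform, n_fft=2048, hop_length=1024, win_length=2048):
    """
    Multi-channel STFT computed channel by channel with torch.stft.

    waveform: [ch, B, t] or [ch, t]
    tf: complex tensor, [ch, B, F, T] or [ch, F, T]
    """
    # Build the Hann window once, on the same device as the input.
    window = torch.hann_window(win_length, device=waveform.device)

    tf_list = []
    for mono_wav in waveform:  # iterate over the channel dimension
        mono_stft = torch.stft(mono_wav, n_fft=n_fft, hop_length=hop_length, win_length=win_length, window=window, return_complex=True)
        tf_list.append(mono_stft)

    # Stack the per-channel spectrograms back along the channel dimension.
    tf = torch.stack(tf_list)

    return tf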
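
For readers checking the output shapes, here is a minimal usage sketch of the function above. The 4-channel layout, the 16 kHz sample rate, and the random-noise input are assumptions made only for illustration, and the function is restated in compact form so the snippet runs on its own. The frame count follows torch.stft's default center=True padding; this sketch does not try to reproduce the framing of apkit.stft with last_sample=True.

```python
import torch


def mulch_stft(waveform, n_fft=2048, hop_length=1024, win_length=2048):
    """Per-channel STFT, same logic as the function in the comment above."""
    window = torch.hann_window(win_length, device=waveform.device)
    return torch.stack([
        torch.stft(ch_wav, n_fft=n_fft, hop_length=hop_length,
                   win_length=win_length, window=window, return_complex=True)
        for ch_wav in waveform
    ])


# Hypothetical input: 4 microphones, 1 second at 16 kHz (random noise as a placeholder).
waveform = torch.randn(4, 16000)
tf = mulch_stft(waveform)
# F = n_fft // 2 + 1 = 1025 frequency bins; T = 1 + 16000 // 1024 = 16 frames.
print(tf.shape)  # torch.Size([4, 1025, 16])

# Hypothetical batched input laid out as [ch, B, t], here with B = 2.
batched = torch.randn(4, 2, 16000)
tf_batched = mulch_stft(batched)
print(tf_batched.shape)  # torch.Size([4, 2, 1025, 16])
```

The output is a complex tensor. If a network expects real-valued input, the real and imaginary parts can be read from tf.real and tf.imag.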