What is the apkit used in gen_multi_sources_frame_level_data.py? #1

FYJNEVERFOLLOWS / ResNet-STFT-SSL

ResNet-STFT Model for Sound Source Localization

BSD 3-Clause "New" or "Revised" License

Closed hhhuxy closed 1 year ago

hhhuxy commented 1 year ago

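Hello, I am also trying to reproduce the related paper. I have only just started learning about audio signal processing and am quite new to it. Is the apkit.stft used in the speech preprocessing step an STFT function you wrote yourself, and what does the parameter last_sample=True mean? Also, could you advise how I should write it if I want to achieve the same functionality with PyTorch's torch.stft?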

hhhuxy commented 1 year ago

Sorry to have bothered you. I understand it now. Thank you for sharing your reproduction.

FYJNEVERFOLLOWS commented 1 year ago

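import torch


def mulch_stft(waveform, n_fft=2048, hop_length=1024, win_length=2048):
    """
    Multi-channel STFT.
    waveform: [ch, B, t] or [ch, t]
    tf: [ch, B, F, T] or [ch, F, T] (complex)
    """
    # Build the Hann window once, on the same device as the input.
    window = torch.hann_window(win_length, device=waveform.device)
    tf_list = []
    # Iterating over the first dimension yields one channel at a time.
    for mono_wav in waveform:
        mono_stft = torch.stft(mono_wav, n_fft=n_fft, hop_length=hop_length, win_length=win_length, window=window, return_complex=True)
        tf_list.append(mono_stft)

    # Stack the per-channel spectrograms back along a leading channel axis.
    tf = torch.stack(tf_list)

    return tf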
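
For anyone checking the shapes, here is a small usage sketch of the function above. It is only an illustration: the 4-channel, 16000-sample random input is a made-up example, not data from the repository, and the frame count assumes torch.stft's default center=True padding. It does not claim to reproduce apkit.stft or its last_sample option.

```python
import torch


def mulch_stft(waveform, n_fft=2048, hop_length=1024, win_length=2048):
    # Same logic as the snippet above, repeated so this example runs on its own.
    window = torch.hann_window(win_length, device=waveform.device)
    return torch.stack([
        torch.stft(mono_wav, n_fft=n_fft, hop_length=hop_length,
                   win_length=win_length, window=window, return_complex=True)
        for mono_wav in waveform
    ])


# Hypothetical input: 4 microphones, 16000 samples each (values are random).
n_ch, n_samples = 4, 16000
n_fft, hop_length = 2048, 1024

wav = torch.randn(n_ch, n_samples)                 # [ch, t]
tf = mulch_stft(wav, n_fft=n_fft, hop_length=hop_length)

# With center=True (the torch.stft default) the one-sided output has
# n_fft // 2 + 1 frequency bins and 1 + t // hop_length frames.
n_freq = n_fft // 2 + 1
n_frames = 1 + n_samples // hop_length
print(tf.shape, (n_ch, n_freq, n_frames))

# Batched input: [ch, B, t] -> [ch, B, F, T]
wav_batched = torch.randn(n_ch, 2, n_samples)
tf_batched = mulch_stft(wav_batched, n_fft=n_fft, hop_length=hop_length)
print(tf_batched.shape)
```

The result is complex-valued; tf.abs() and tf.angle() give magnitude and phase, and torch.view_as_real(tf) gives real and imaginary parts as a trailing dimension of size 2 if a network expects real inputs.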