andrewowens / multisensory

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
http://andrewowens.com/multisensory/
Apache License 2.0
220 stars, 60 forks

Question about the original audio waveform input #43

Closed: luhuijun666 closed this issue 3 years ago

luhuijun666 commented 3 years ago

Hi Owen, thanks for your contributions! In your paper, you say that you apply a series of strided 1D convolutions to the input waveform. So the input waveform referred to here (before fusion) is the original audio signal, without an STFT, right? Why do you process the 1D signal directly, and how? Could you kindly explain this point for me?
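To make the question concrete, here is a minimal sketch of what I understand by a strided 1D convolution applied directly to raw audio samples. It is not the repository's code: the function name `strided_conv1d`, the 16 kHz sample rate, the averaging kernel, and the stride of 4 are all assumptions chosen for illustration, and a real network would learn its kernels and stack several such layers.

```python
import math


def strided_conv1d(signal, kernel, stride):
    """Valid (no padding) 1D cross-correlation with the given stride.

    Hypothetical helper for illustration only; not part of the repository.
    """
    k = len(kernel)
    out = []
    # Slide the kernel over the signal, jumping `stride` samples each step.
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]
        out.append(sum(w * x for w, x in zip(kernel, window)))
    return out


# Assumed input: one second of a 440 Hz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
waveform = [math.sin(2 * math.pi * 440 * t / sample_rate)
            for t in range(sample_rate)]

# Assumed layer parameters: a fixed length-8 averaging kernel with stride 4.
# A trained model would use learned kernels instead.
kernel = [1.0 / 8] * 8
features = strided_conv1d(waveform, kernel, stride=4)

# The output is roughly `stride` times shorter than the input.
print(len(waveform), len(features))
```

In this sketch no STFT or spectrogram is computed. The convolution operates on the time-domain samples, and the stride reduces the temporal resolution at each layer.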