Closed: creativesh closed this issue 5 years ago
Hi creativesh,
In this framework, the melspectrogram is reshaped from (frequency, time) to (frequency, time, filter) so that we can apply 2D convolutions to it as if it were an image. The melspectrogram is treated as a grayscale image in which frequency and time serve as height and width (or width and height).
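A minimal sketch of that reshape, using NumPy only. The sizes (64 mel bands, 96 frames) and the variable names are assumptions chosen for illustration; this is not the code from utils/model.py.

```python
import numpy as np

# Hypothetical sizes: 64 mel bands (frequency) by 96 frames (time).
mel = np.random.rand(64, 96).astype(np.float32)

# Append a trailing axis of size 1: (frequency, time) -> (frequency, time, 1).
mel_image = np.expand_dims(mel, axis=-1)

print(mel.shape)        # (64, 96)
print(mel_image.shape)  # (64, 96, 1)

# The values are unchanged; indexing the last axis gives back the matrix.
assert np.array_equal(mel_image[..., 0], mel)
```

The extra axis holds no new data. It only gives the array the (height, width, channels) layout that 2D convolution layers usually expect, with a single channel as in a grayscale image.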
Thank you. So what does each dimension mean? Why is one of the dimensions 1?
(Sent in reply to finejuly's comment of Sun, Jan 13, 2019, 6:19 PM.)
Regards, Khosravani
The dimensions are (batch, frequency, time, filter). The last dimension is 1 because the input has a single channel, like a grayscale image. If your question is about how to use a melspectrogram with 2D convolutions, you can check our paper or other related studies, including the baseline system for DCASE 2018 Task 2 (http://dcase.community/challenge2018/task-general-purpose-audio-tagging, https://arxiv.org/abs/1807.09902).
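To illustrate why the last axis is called "filter", here is a rough NumPy sketch of a 3x3 "valid" 2D convolution over a batch shaped (batch, frequency, time, 1). The batch size, frame count, filter count, and random kernels are all assumptions made for the example, and the einsum-based convolution stands in for a real convolution layer; it is not the model in this repository.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical sizes for illustration.
batch, n_mels, n_frames = 8, 64, 96
x = np.random.rand(batch, n_mels, n_frames, 1)  # (batch, frequency, time, 1)

n_filters = 4
# Kernel layout: (height, width, input channels, output filters).
kernels = np.random.rand(3, 3, 1, n_filters)

# Every 3x3 patch over the frequency and time axes.
# Resulting shape: (batch, 62, 94, 1, 3, 3)
patches = sliding_window_view(x, (3, 3), axis=(1, 2))

# Multiply each patch by each kernel and sum over patch and input channel.
y = np.einsum("bftcij,ijck->bftk", patches, kernels)

print(x.shape)  # (8, 64, 96, 1)
print(y.shape)  # (8, 62, 94, 4)
```

Before the first convolution the last axis has size 1 (one input channel). After it, the same axis has one entry per filter, which is why the layout is written as (batch, frequency, time, filter).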
Hi, I ran your model code in utils/model.py and saw that the output shape at line 67 (melspectrogram) is (?,64,?,1). This is confusing, because mel spectrograms are generally matrices, for example 96 x 64, but your output looks like an image: it has size 1 in the last dimension. Can you please explain?