YongyuG / rnnoise_16k

implementation of rnnoise_16k
BSD 3-Clause "New" or "Revised" License

Features extraction #11

wuwenshan closed this issue 3 years ago

wuwenshan commented 3 years ago

Hi guys,

Has anyone run into problems with the first step, feature extraction with denoise.c? I'm trying to retrain the model on 16 kHz audio, but after running denoise.c I can't get the expected shape when I run bin2hdf5. I also checked the values in the training.f32 file and found very large values and NaN. I didn't modify the code except for the count, which I set to 500000. I used the Microsoft dataset available here: https://github.com/microsoft/MS-SNSD.

Maybe my dataset doesn't fit the code. @YongyuG, could you tell me which datasets you used to train your model? It would help me a lot.
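For anyone who wants to repeat the check on training.f32 described above, here is a minimal sketch in Python. It assumes the file is a flat stream of little-endian float32 values. The function name `summarize_f32` and the `floats_per_frame` parameter are hypothetical; the number of floats per frame depends on the feature layout in your copy of denoise.c, so take it from the code rather than from this example.

```python
import math
import struct


def summarize_f32(path, floats_per_frame):
    """Report basic sanity statistics for a raw little-endian float32 file.

    floats_per_frame is an assumption supplied by the caller; it must match
    the number of floats denoise.c writes for each frame.
    """
    with open(path, "rb") as fh:
        raw = fh.read()

    # Bytes that do not fill a whole 4-byte float hint at a corrupted stream.
    trailing = len(raw) % 4
    usable = raw[: len(raw) - trailing]
    values = [v for (v,) in struct.iter_unpack("<f", usable)]

    non_finite = sum(1 for v in values if not math.isfinite(v))
    finite_abs = [abs(v) for v in values if math.isfinite(v)]

    return {
        "bytes": len(raw),
        "trailing_bytes": trailing,
        "floats": len(values),
        # True only if the file splits evenly into frames of the given width.
        "frames_align": trailing == 0 and len(values) % floats_per_frame == 0,
        "non_finite": non_finite,
        "max_abs": max(finite_abs, default=0.0),
    }
```

If `frames_align` is false, or `non_finite` and `max_abs` look unreasonable, the file probably contains something other than feature frames. This sketch reads the whole file into memory; for a full-size training file, `numpy.fromfile(path, dtype="float32")` is likely a faster way to do the same inspection.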

YongyuG commented 3 years ago

I used a Chinese corpus called AISHELL. Please make sure your data has a 16 kHz sample rate, is in WAV format, and is single channel.
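The three requirements above can be checked for a file with Python's standard `wave` module. This is only a sketch: the function name `check_wav` is hypothetical, and the 16-bit sample width check is an assumption on my part that is not stated in this reply.

```python
import wave


def check_wav(path, rate=16000, channels=1, sample_width=2):
    """Return a list of mismatches between a WAV file and the expected format.

    sample_width is in bytes; 2 (16-bit PCM) is an assumed default.
    wave.open raises wave.Error if the file is not an uncompressed PCM WAV.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {rate}")
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channels, expected {channels}")
        if wav.getsampwidth() != sample_width:
            problems.append(f"{wav.getsampwidth()} bytes per sample, expected {sample_width}")
    return problems
```

An empty list means the file matches. Running this over every file in the speech and noise folders is a quick way to rule out format problems before looking elsewhere.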

wuwenshan commented 3 years ago

Thanks for your quick answer

ffprobe version 4.2.4-1ubuntu0.1 Copyright (c) 2007-2020 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
  configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, wav, from 'clnsp40.wav':
  Duration: 00:00:11.29, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s

This is one sample from my clean speech folder. I don't know what I'm doing wrong.

wuwenshan commented 3 years ago

Shout out to @a-rose, who found the issue: my training.f32 contained the output of some printf calls in denoise.c. If you're facing this issue, remove all the printf calls from denoise.c.
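To check whether a feature file has the same problem, one option is to scan it for runs of printable text. The sketch below is a heuristic, not part of the repository: `find_text_runs` and its `min_len` threshold are hypothetical, and genuine float data can occasionally contain a few printable bytes in a row, so short matches should be read with caution.

```python
import re


def find_text_runs(path, min_len=6):
    """Return (byte_offset, text) pairs for runs of printable ASCII in a file.

    Long runs in a file that should hold only float32 data suggest that
    printed text was mixed into the binary stream.
    """
    with open(path, "rb") as fh:
        raw = fh.read()

    # Printable ASCII plus tab and newline, repeated at least min_len times.
    pattern = re.compile(rb"[\x20-\x7e\t\n]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(raw)]
```

If this returns readable messages, they shift every float that follows by the length of the text, which would explain both the wrong shape in bin2hdf5 and the very large or NaN values.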