wuwenshan closed this issue 3 years ago
Hi guys,
Has anyone run into problems with the first step, feature extraction with the denoise file? I'm trying to retrain the model with 16 kHz audio, but after running `denoise.c` I can't get the proper shape when I run `bin2hdf5`. I also checked the values in the `training.f32` file and found very large values and NaNs. I didn't modify the code except for the count, which I fixed at 500000. I used the Microsoft dataset available here: https://github.com/microsoft/MS-SNSD. Maybe my datasets don't fit the code. Could you tell me which datasets you used to train your model? It would help me a lot, @YongyuG.
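For anyone who wants to run the same check on their own feature dump, here is a minimal sketch. It assumes the file is a flat stream of little-endian float32 values; `row_width` (the number of floats written per frame) and `limit` are hypothetical parameters that you would set to match your build of `denoise.c`.

```python
import math
import struct
import sys


def summarize_f32(raw, row_width, limit=1e4):
    """Summarize a raw float32 dump held in `raw` (bytes).

    row_width and limit are illustrative assumptions, not values taken
    from the repository: use the per-frame float count of your build.
    """
    count = len(raw) // 4                      # whole float32 values
    values = struct.unpack(f"<{count}f", raw[:count * 4])
    nan_count = sum(1 for v in values if math.isnan(v))
    huge_count = sum(1 for v in values
                     if not math.isnan(v) and abs(v) > limit)
    return {
        "floats": count,
        "leftover_bytes": len(raw) % 4,        # non-zero means a partial float
        "whole_rows": count % row_width == 0,  # False means reshape will fail
        "nan": nan_count,
        "huge": huge_count,
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_f32.py training.f32 <row_width>
    with open(sys.argv[1], "rb") as fh:
        print(summarize_f32(fh.read(), int(sys.argv[2])))
```

Leftover bytes, an incomplete last row, or any NaN or huge values suggest that something other than feature data ended up in the file.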
I used a Chinese corpus called AISHELL. Please make sure your data has a 16 kHz sample rate, is in WAV format, and has a single channel.
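A quick way to verify those three properties across a folder is Python's standard `wave` module. This is only a sketch: the function name `check_wav` is made up for illustration, and the 16-bit sample width is an assumption (it matches the `pcm_s16le` file shown later in this thread, but it is not stated as a requirement here).

```python
import wave


def check_wav(source, rate=16000, channels=1, sample_bytes=2):
    """Return a list of mismatches for one WAV file (empty list if it is OK).

    `source` may be a path or an open binary file object.
    sample_bytes=2 (16-bit PCM) is an assumption, not a documented requirement.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate is {wav.getframerate()}, expected {rate}")
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channels, expected {channels}")
        if wav.getsampwidth() != sample_bytes:
            problems.append(f"{wav.getsampwidth()} bytes per sample, expected {sample_bytes}")
    return problems
```

Calling `check_wav("clnsp40.wav")` on a conforming file should return an empty list. Note that `wave` only reads uncompressed PCM, so a file in another encoding raises `wave.Error` instead.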
Thanks for your quick answer
ffprobe version 4.2.4-1ubuntu0.1 Copyright (c) 2007-2020 the FFmpeg developers
built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Input #0, wav, from 'clnsp40.wav':
Duration: 00:00:11.29, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
This is one sample from my clean speech folder. I don't know what I'm doing wrong.
Shout out to @a-rose, who found the issue: my `training.f32` contained the output of some `printf` calls in `denoise.c`. If you're facing this issue, remove all the `printf` calls in `denoise.c`.
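To illustrate why stray text breaks things, here is a small sketch. It assumes the features and the `printf` output go to the same stream (for example, stdout redirected into the file). The log line `frame 12` and the helper names are hypothetical. Once text bytes land between float32 values, every later value is decoded from the wrong byte offsets.

```python
import re
import struct


def decode_f32(raw):
    """Decode bytes as little-endian float32, ignoring a trailing partial value."""
    count = len(raw) // 4
    return struct.unpack(f"<{count}f", raw[:count * 4])


def find_text_runs(raw, min_len=6):
    """Return runs of printable ASCII at least min_len bytes long.

    Heuristic only: real float data can contain short printable runs by chance.
    """
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group() for m in re.finditer(pattern, raw)]


frame = struct.pack("<4f", 0.25, 0.5, 0.75, 1.0)   # one pretend feature frame
log_line = b"frame 12\n"                           # hypothetical debug output

clean = frame + frame
polluted = frame + log_line + frame                # text mixed into the stream

print(decode_f32(clean))       # the eight original values
print(decode_f32(polluted))    # garbage after the first four values
print(find_text_runs(polluted))
```

Running `find_text_runs` on the bytes of your own `training.f32` is a cheap way to spot leftover debug messages before converting the file.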