gabotechs / MusicGPT

Generate music based on natural language prompts using LLMs running locally
MIT License
672 stars, 54 forks

Generated WAV maps its single channel to the FL speaker instead of mono #21

Open: rubeniskov opened this issue 5 days ago

rubeniskov commented 5 days ago

I have observed that when the generated waveform is played back in certain audio players, the sound is routed only to the left speaker. This seems to happen when the player relies on the speaker channel mapping embedded in the audio file: although the file is mono, its single channel is mapped to the front-left (FL) speaker instead of both speakers, so the right speaker stays silent.

ffprobe .\musicgpt-generated.wav
ffprobe version 7.1-full_build-www.gyan.dev Copyright (c) 2007-2024 the FFmpeg developers
  built with gcc 14.2.0 (Rev1, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libopenjpeg --enable-libquirc --enable-libuavs3d --enable-libxevd --enable-libzvbi --enable-libqrencode --enable-librav1e --enable-libsvtav1 --enable-libvvenc --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxeve --enable-libxvid --enable-libaom --enable-libjxl --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-liblc3 --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.100 / 61. 19.100
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
  libpostproc    58.  3.100 / 58.  3.100
Input #0, wav, from '.\musicgpt-generated.wav':
  Duration: 00:00:09.94, bitrate: 1024 kb/s
  Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 32000 Hz, 1 channels (FL), flt, 1024 kb/s

Diff between the ffprobe output above and the output for output_mono.wav, a file written by FFmpeg (encoder Lavf61.7.100) whose stream reports 1 channel with no FL mapping:

diff --git "a/.\\ffprobe-fl.txt" "b/.\\ffprobe-mono.txt"
index 3bdcb7a..a191f71 100644
--- "a/.\\ffprobe-fl.txt"
+++ "b/.\\ffprobe-mono.txt"
@@ -1,4 +1,4 @@
-ffprobe .\musicgpt-generated.wav
+ffprobe .\output_mono.wav
 ffprobe version 7.1-full_build-www.gyan.dev Copyright (c) 2007-2024 the FFmpeg developers
   built with gcc 14.2.0 (Rev1, Built by MSYS2 project)
   configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libopenjpeg --enable-libquirc --enable-libuavs3d --enable-libxevd --enable-libzvbi --enable-libqrencode --enable-librav1e --enable-libsvtav1 --enable-libvvenc --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxeve --enable-libxvid --enable-libaom --enable-libjxl --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-liblc3 --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
@@ -10,6 +10,8 @@ ffprobe version 7.1-full_build-www.gyan.dev Copyright (c) 2007-2024 the FFmpeg d
   libswscale      8.  3.100 /  8.  3.100
   libswresample   5.  3.100 /  5.  3.100
   libpostproc    58.  3.100 / 58.  3.100
-Input #0, wav, from '.\musicgpt-generated.wav':
+Input #0, wav, from '.\output_mono.wav':
+  Metadata:
+    encoder         : Lavf61.7.100
   Duration: 00:00:09.94, bitrate: 1024 kb/s
-  Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 32000 Hz, 1 channels (FL), flt, 1024 kb/s
\ No newline at end of file
+  Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 32000 Hz, 1 channels, flt, 1024 kb/s
\ No newline at end of file
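
The `(FL)` label in the ffprobe output most likely comes from the header rather than the samples: in a WAVE_FORMAT_EXTENSIBLE file, the fmt chunk carries a 32-bit dwChannelMask with one bit per speaker position (bit 0 = front left, bit 1 = front right, bit 2 = front center, ...). As a minimal sketch, assuming a well-formed RIFF/WAVE file and using only the Rust standard library, the following reads that mask so the mapping can be checked without ffprobe. `read_channel_info` and `speaker_names` are hypothetical helper names for illustration, not part of hound or MusicGPT.

```rust
use std::env;
use std::fs;

// Names for the first speaker-position bits of dwChannelMask
// (bit 0 = FL, bit 1 = FR, bit 2 = FC, ...).
static SPEAKERS: [&str; 6] = ["FL", "FR", "FC", "LFE", "BL", "BR"];

fn le_u16(b: &[u8]) -> u16 {
    u16::from_le_bytes([b[0], b[1]])
}

fn le_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Hypothetical helper: returns the channel count and, for
/// WAVE_FORMAT_EXTENSIBLE files, the dwChannelMask.
/// Returns None if no valid "fmt " chunk is found.
fn read_channel_info(bytes: &[u8]) -> Option<(u16, Option<u32>)> {
    // RIFF header: "RIFF", 4-byte size, "WAVE".
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    // Walk the chunk list until the "fmt " chunk is found.
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = le_u32(&bytes[pos + 4..pos + 8]) as usize;
        let body = bytes.get(pos + 8..pos + 8 + size)?;
        if id == b"fmt " {
            if body.len() < 16 {
                return None;
            }
            let format_tag = le_u16(&body[0..2]);
            let channels = le_u16(&body[2..4]);
            // WAVE_FORMAT_EXTENSIBLE (0xFFFE) keeps dwChannelMask at offset 20.
            let mask = if format_tag == 0xFFFE && body.len() >= 24 {
                Some(le_u32(&body[20..24]))
            } else {
                None
            };
            return Some((channels, mask));
        }
        // Chunk bodies are padded to an even length.
        pos += 8 + size + (size & 1);
    }
    None
}

/// Hypothetical helper: lists the speaker names whose bits are set in `mask`.
fn speaker_names(mask: u32) -> Vec<&'static str> {
    SPEAKERS
        .iter()
        .enumerate()
        .filter(|&(bit, _)| (mask & (1u32 << bit)) != 0)
        .map(|(_, &name)| name)
        .collect()
}

fn main() {
    let path = match env::args().nth(1) {
        Some(p) => p,
        None => {
            eprintln!("usage: wavmask <file.wav>");
            return;
        }
    };
    let bytes = fs::read(&path).expect("failed to read file");
    match read_channel_info(&bytes) {
        Some((channels, Some(mask))) => {
            println!("{channels} channel(s), mask 0x{mask:X} {:?}", speaker_names(mask))
        }
        Some((channels, None)) => println!("{channels} channel(s), no channel mask"),
        None => println!("{path}: no RIFF/WAVE fmt chunk found"),
    }
}
```

If the header of the generated file holds a mask of 0x1, `speaker_names` decodes it to `["FL"]`, which matches the `(FL)` label ffprobe prints.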
rubeniskov commented 5 days ago

It seems the problem comes from hound, which by default maps channels to specific speakers based on the channel count, so a single channel ends up assigned to FL. I will try to open an issue in the hound repository.

https://github.com/ruuda/hound/blob/b5b6fbdd4ca29daa2cc3f8de7c3c57814c0f3207/src/write.rs#L124-L149
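
To make the suspected default concrete: a mask derived purely from the channel count, setting one bit per channel starting at bit 0, gives a mono file the mask 0x1, which players interpret as front left only. The sketch below contrasts that with a mono-aware variant that uses SPEAKER_FRONT_CENTER (0x4), the value Windows' ksmedia.h defines as KSAUDIO_SPEAKER_MONO. Both `count_based_mask` and `mono_aware_mask` are hypothetical functions written for this comparison; they are not copied from hound's source.

```rust
// SPEAKER_FRONT_CENTER bit of WAVE_FORMAT_EXTENSIBLE's dwChannelMask.
const SPEAKER_FRONT_CENTER: u32 = 0x4;

/// Hypothetical count-based default: one bit per channel, starting at
/// front left (bit 0). Only 18 speaker bits are defined, so cap the count.
fn count_based_mask(channels: u16) -> u32 {
    (0..u32::from(channels).min(18)).fold(0, |mask, bit| mask | (1u32 << bit))
}

/// Hypothetical mono-aware variant: a single channel goes to front center
/// (KSAUDIO_SPEAKER_MONO); other counts fall back to the count-based mask,
/// so stereo still gets FL | FR (0x3).
fn mono_aware_mask(channels: u16) -> u32 {
    match channels {
        1 => SPEAKER_FRONT_CENTER,
        n => count_based_mask(n),
    }
}

fn main() {
    for channels in [1u16, 2, 6] {
        println!(
            "{channels} ch: count-based 0x{:X}, mono-aware 0x{:X}",
            count_based_mask(channels),
            mono_aware_mask(channels)
        );
    }
}
```

For one channel the count-based variant yields 0x1 (FL only), while the mono-aware variant yields 0x4 (FC); for two or more channels both agree.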

rubeniskov commented 5 days ago

At least until they fix it and release a new version, I think this issue should be kept open.
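
Until a fixed hound release is available, one possible stopgap (an assumption, not something proposed in this thread) is to rewrite the mask in the file after it has been written. The sketch below uses only the Rust standard library: it locates the fmt chunk and, for a mono WAVE_FORMAT_EXTENSIBLE file, replaces dwChannelMask with 0x4 (front center). `patch_mono_mask` is a hypothetical helper name.

```rust
use std::env;
use std::fs;

// SPEAKER_FRONT_CENTER: the mask Windows uses for mono (KSAUDIO_SPEAKER_MONO).
const SPEAKER_FRONT_CENTER: u32 = 0x4;

/// Hypothetical helper: rewrites dwChannelMask in place for a mono
/// WAVE_FORMAT_EXTENSIBLE file. Returns true if the mask was written,
/// false if the file was left untouched.
fn patch_mono_mask(bytes: &mut [u8]) -> bool {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return false;
    }
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let size = u32::from_le_bytes([
            bytes[pos + 4],
            bytes[pos + 5],
            bytes[pos + 6],
            bytes[pos + 7],
        ]) as usize;
        let body_start = pos + 8;
        if &bytes[pos..pos + 4] == b"fmt " {
            // Extensible fmt body: tag (0..2), channels (2..4), ..., mask (20..24).
            if size < 24 || body_start + 24 > bytes.len() {
                return false;
            }
            let tag = u16::from_le_bytes([bytes[body_start], bytes[body_start + 1]]);
            let channels = u16::from_le_bytes([bytes[body_start + 2], bytes[body_start + 3]]);
            if tag != 0xFFFE || channels != 1 {
                return false;
            }
            bytes[body_start + 20..body_start + 24]
                .copy_from_slice(&SPEAKER_FRONT_CENTER.to_le_bytes());
            return true;
        }
        // Chunk bodies are padded to an even length.
        pos = body_start + size + (size & 1);
    }
    false
}

fn main() {
    let path = match env::args().nth(1) {
        Some(p) => p,
        None => {
            eprintln!("usage: patch-mask <file.wav>");
            return;
        }
    };
    let mut bytes = fs::read(&path).expect("failed to read file");
    if patch_mono_mask(&mut bytes) {
        fs::write(&path, &bytes).expect("failed to write file");
        println!("{path}: channel mask set to front center");
    } else {
        println!("{path}: not a mono WAVE_FORMAT_EXTENSIBLE file, left unchanged");
    }
}
```

Patching bytes after the fact costs an extra pass over the file; correcting the mask where the header is written avoids that.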

gabotechs commented 4 days ago

👍 makes sense, thanks for reporting this!

Nurb4000 commented 4 days ago

@gabotechs glad to see you are still active. Thank you for the project and still hoping for that time extension some day to get it over the finish line, so to speak :)

rubeniskov commented 4 days ago

https://github.com/ruuda/hound/pull/88

rubeniskov commented 1 day ago

> @gabotechs glad to see you are still active. Thank you for the project and still hoping for that time extension some day to get it over the finish line, so to speak :)

Before implementing such functionality, we need a way to handle the ONNX export with Optimum for audio-input models like musicgen-melody; without that, we cannot create long-duration songs. So I opened an issue: https://github.com/huggingface/optimum/issues/2095

rubeniskov commented 1 day ago

Fixed!

https://github.com/gabotechs/MusicGPT/pull/24