davidmartinrius / speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
MIT License
170 stars 17 forks

Multiple issues with container environment and virtual environment. #12

Closed sweetbbak closed 2 months ago

sweetbbak commented 2 months ago

I have been trying to get this to work for a couple of days now and I am not having any luck. I'm having issues both with a venv and with containers.

First off, with a venv I use python3.10 and run pip install, then I try to run the module with the given parameters:

python3.10 -m speech_dataset_generator.main --input_file_path ./output.mp4 --output_directory output --range_times 4-10 --datasets metavoice

and get the error:

Traceback (most recent call last):
  File "/home/sweet/repos/speech-dataset-generator/venv/lib/python3.10/site-packages/chromadb/utils/embedding_functions.py", line 395, in __init__
    self.ort = importlib.import_module("onnxruntime")
  File "/nix/store/k6nhszar815iizin6vmdya84lhd8v8q7-python3-3.10.14/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/sweet/repos/speech-dataset-generator/venv/lib/python3.10/site-packages/onnxruntime/__init__.py", line 57, in <module>
    raise import_capi_exception
  File "/home/sweet/repos/speech-dataset-generator/venv/lib/python3.10/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import ExecutionMode  # noqa: F401
  File "/home/sweet/repos/speech-dataset-generator/venv/lib/python3.10/site-packages/onnxruntime/capi/_pybind_state.py", line 32, in <module>
    from .onnxruntime_pybind11_state import *  # noqa
ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nix/store/k6nhszar815iizin6vmdya84lhd8v8q7-python3-3.10.14/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/nix/store/k6nhszar815iizin6vmdya84lhd8v8q7-python3-3.10.14/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/sweet/repos/speech-dataset-generator/speech_dataset_generator/main.py", line 3, in <module>
    from speech_dataset_generator.audio_processor.audio_processor import process_audio_files, get_local_audio_files, get_youtube_audio_files, get_librivox_audio_files, get_tedtalks_audio_files
  File "/home/sweet/repos/speech-dataset-generator/speech_dataset_generator/audio_processor/audio_processor.py", line 3, in <module>
    import chromadb
  File "/home/sweet/repos/speech-dataset-generator/venv/lib/python3.10/site-packages/chromadb/__init__.py", line 3, in <module>
    from chromadb.api.client import Client as ClientCreator
  File "/home/sweet/repos/speech-dataset-generator/venv/lib/python3.10/site-packages/chromadb/api/__init__.py", line 7, in <module>
    from chromadb.api.models.Collection import Collection
  File "/home/sweet/repos/speech-dataset-generator/venv/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 57, in <module>
    class Collection(BaseModel):
  File "/home/sweet/repos/speech-dataset-generator/venv/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 74, in Collection
    ] = ef.DefaultEmbeddingFunction(),  # type: ignore
  File "/home/sweet/repos/speech-dataset-generator/venv/lib/python3.10/site-packages/chromadb/utils/embedding_functions.py", line 566, in DefaultEmbeddingFunction
    return ONNXMiniLM_L6_V2()
  File "/home/sweet/repos/speech-dataset-generator/venv/lib/python3.10/site-packages/chromadb/utils/embedding_functions.py", line 397, in __init__
    raise ValueError(
ValueError: The onnxruntime python package is not installed. Please install it with `pip install onnxruntime`

but I know for a fact onnxruntime IS installed.

~$ pip install onnxruntime
Requirement already satisfied: onnxruntime in ./venv/lib/python3.10/site-packages (1.17.3)
Requirement already satisfied: coloredlogs in ./venv/lib/python3.10/site-packages (from onnxruntime) (15.0.1)
Requirement already satisfied: flatbuffers in ./venv/lib/python3.10/site-packages (from onnxruntime) (24.3.25)
Requirement already satisfied: numpy>=1.21.6 in /nix/store/k7ln657g7h2rynsss685jc4a6dkpdzlk-python3.10-numpy-1.26.4/lib/python3.10/site-packages (from onnxruntime) (1.26.4)
Requirement already satisfied: packaging in ./venv/lib/python3.10/site-packages (from onnxruntime) (23.2)
Requirement already satisfied: protobuf in ./venv/lib/python3.10/site-packages (from onnxruntime) (4.25.3)
Requirement already satisfied: sympy in ./venv/lib/python3.10/site-packages (from onnxruntime) (1.12)
Requirement already satisfied: humanfriendly>=9.1 in ./venv/lib/python3.10/site-packages (from coloredlogs->onnxruntime) (10.0)
Requirement already satisfied: mpmath>=0.19 in ./venv/lib/python3.10/site-packages (from sympy->onnxruntime) (1.3.0)
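
A note on the traceback above: the first exception is `ImportError: libstdc++.so.6: cannot open shared object file`, and chromadb then re-raises it as the generic "not installed" `ValueError`. So the package being present in the venv and the package being importable are two different things here. A minimal sketch for surfacing the underlying error directly (the helper name `try_import` is hypothetical, not part of this project or of chromadb):

```python
import importlib


def try_import(name):
    """Import a module by name and report the real error, if any.

    Returns (True, "ok") on success, or (False, "<ErrorType>: <message>")
    when the import fails, so a missing shared library is not hidden
    behind a generic "package is not installed" message.
    """
    try:
        importlib.import_module(name)
    except ImportError as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # Run this with the same interpreter that runs the project.
    print(try_import("onnxruntime"))
```

If this reports the `libstdc++.so.6` message, the likely culprit is a shared library that the dynamic loader cannot find on the host (the `/nix/store` paths suggest a Nix-managed Python), not pip.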

In the case that I CAN get it to work, I still run into this error:

Processing: ./output.mp4
removing silences

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
shutil.py 816 move
os.rename(src, real_dst)

FileNotFoundError:
2
No such file or directory
/home/sweet/repos/speech-dataset-generator/.tmp/e6fce5bf-6b4a-4016-bc11-c8cacf71d402/out_final.mp4
/home/sweet/repos/speech-dataset-generator/output/enhanced_audios/output_enhanced.mp4

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
runpy.py 196 _run_module_as_main
return _run_code(code, main_globals, None,

runpy.py 86 _run_code
exec(code, run_globals)

main.py 59 <module>
process_audio_files(audio_files, output_directory, start, end, enhancers, datasets)

audio_processor.py 144 process_audio_files
dataset_generator.process(audio_file, output_directory, start, end, enhancers, collection, datasets)

dataset_generator.py 432 process
enhanced_audio_file_path = self.audio_manager_instance.process(path_to_audio_file, output_directory, enhancers)

audio_manager.py 24 process
self.enhance_audio(input_audio, output_audio_file, enhancers)

audio_manager.py 63 enhance_audio
self.remove_sliences(temp_output, output_audio_file)

audio_manager.py 203 remove_sliences
u.render_media(output_audio_file, audio_only=True)  # Audio only specified

Unsilence.py 105 render_media
renderer.render(self.__input_file, output_file, self.__intervals, **kwargs)

MediaRenderer.py 138 render
shutil.move(final_output, output_file)

shutil.py 836 move
copy_function(src, real_dst)

shutil.py 434 copy2
copyfile(src, dst, follow_symlinks=follow_symlinks)

shutil.py 254 copyfile
with open(src, 'rb') as fsrc:

FileNotFoundError:
2
No such file or directory
/home/sweet/repos/speech-dataset-generator/.tmp/e6fce5bf-6b4a-4016-bc11-c8cacf71d402/out_final.mp4
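
Both tracebacks end in `shutil.move` failing because `out_final.mp4` never appeared in the `.tmp` directory, which suggests the rendering step before it did not produce any output. One possible cause, stated here as an assumption and not confirmed anywhere in this thread, is that a command-line tool the renderer depends on, such as `ffmpeg`, is missing from `PATH`. A small check, using the hypothetical helper `missing_tools`:

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all tools found")
```

An empty result only rules out this one cause; the render step can still fail for other reasons.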

So then I spent several hours trying to get this project to work in a container:

podman build --tag speech -f Dockerfile

podman run -v ./output.mp4:/app/output.mp4 --privileged --memory=33g --shm-size=3g --gpus=all -e NVIDIA_VISIBLE_DEVICES=all localhost/speech:latest --input_file_path /app/output.mp4 --output_directory output --range_times 4-10 --datasets metavoice

and get the following error:

Loaded  speechmetrics.absolute.mosnet
Loaded  speechmetrics.absolute.srmr

-------------------------------------------------------------------------------
runpy.py 196 _run_module_as_main
return _run_code(code, main_globals, None,

runpy.py 86 _run_code
exec(code, run_globals)

main.py 59 <module>
process_audio_files(audio_files, output_directory, start, end, enhancers, datasets)

audio_processor.py 144 process_audio_files
dataset_generator.process(audio_file, output_directory, start, end, enhancers, collection, datasets)

dataset_generator.py 432 process
enhanced_audio_file_path = self.audio_manager_instance.process(path_to_audio_file, output_directory, enhancers)

audio_manager.py 26 process
if not self.has_speech_quality(output_audio_file):

audio_manager.py 216 has_speech_quality
scores = metrics(path_to_audio_file)

__init__.py 117 __call__
result_metric = metric.test(*files, array_rate=rate)

__init__.py 46 test
audio, rate = sf.read(file, always_2d=True)

soundfile.py 285 read
with SoundFile(file, 'r', samplerate, channels,

soundfile.py 658 __init__
self._file = self._open(file, mode_int, closefd)

soundfile.py 1216 _open
raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))

soundfile.LibsndfileError:
1
Error opening 'output/enhanced_audios/output_enhanced.mp4': 
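
The failing call here is `sf.read` on an `.mp4` path. As far as I know, libsndfile (which `soundfile` wraps) does not decode MP4/AAC containers, so one possible workaround is to give the tool a WAV file as input instead. A minimal sketch, assuming `ffmpeg` is installed; the helper name `to_wav_command`, the 16 kHz sample rate, and the mono downmix are illustrative choices, not this project's settings:

```python
from pathlib import Path


def to_wav_command(src, sample_rate=16000):
    """Build an ffmpeg command that extracts mono WAV audio from `src`.

    Returns (command, destination), where destination is `src` with its
    extension replaced by .wav. Nothing is executed here.
    """
    dst = Path(src).with_suffix(".wav")
    command = [
        "ffmpeg",
        "-y",                     # overwrite the destination if it exists
        "-i", str(src),           # input file
        "-vn",                    # drop any video stream
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample
        str(dst),
    ]
    return command, dst


# Example usage (runs ffmpeg, so it is left commented out):
#   import subprocess
#   command, dst = to_wav_command("output.mp4")
#   subprocess.run(command, check=True)
```

The resulting `output.wav` could then be passed as `--input_file_path` in place of the `.mp4`.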

I don't know what else I can do, nothing works.

sweetbbak commented 2 months ago

I'm not sure how, but I finally got things to at least run, and now I am running into another error. However I encode the audio I ripped from YouTube, it complains that the quality is not high enough:

Discarding audio /home/sweet/repos/speech-dataset-generator/output/enhanced_audios/output_enhanced.mp3. Not enough quality. MOS 2.9146027314035514 < 3

I've tried Opus, WAV, MP3, AAC, etc.

davidmartinrius commented 2 months ago

I'm not sure how, but I finally got things to at least run, and now I am running into another error. However I encode the audio I ripped from YouTube, it complains that the quality is not high enough:

Discarding audio /home/sweet/repos/speech-dataset-generator/output/enhanced_audios/output_enhanced.mp3. Not enough quality. MOS 2.9146027314035514 < 3

I've tried Opus, WAV, MP3, AAC, etc.

It's not a bug, it's a feature and it is explained here

You can modify the code yourself, as it is open source.
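
The log line quoted above shows the rule being applied: the clip scored a MOS of about 2.91, the cutoff is 3, and so the file was discarded. A minimal sketch of what an adjustable cutoff could look like; the function name `has_enough_quality` and the `min_mos` parameter are hypothetical and are not taken from the project's code:

```python
def has_enough_quality(mos: float, min_mos: float = 3.0) -> bool:
    """Return True when a clip's MOS reaches the minimum accepted score."""
    return mos >= min_mos


if __name__ == "__main__":
    score = 2.9146027314035514  # value from the log line in this thread
    print(has_enough_quality(score))               # False with the default cutoff of 3
    print(has_enough_quality(score, min_mos=2.5))  # True with a lower cutoff
```

Lowering the cutoff keeps more clips at the cost of noisier data, so it is a trade-off, not a fix for the source audio.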