MahmoudAshraf97 / ctc-forced-aligner

Text to speech alignment using CTC forced alignment

Issue getting started with both CLI and Python #5

Closed: ocyedwin closed this issue 5 months ago

ocyedwin commented 5 months ago

CLI

I ran into the following error when using the CLI:

(tts) ubuntu@sandbox-gpu-worker-instance-1:~/tts-project$ ctc-forced-aligner --audio_path "audio.wav" --text_path "content.txt" --language "eng"
/home/ubuntu/miniconda3/envs/tts/lib/python3.12/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/home/ubuntu/miniconda3/envs/tts/lib/python3.12/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/tts/bin/ctc-forced-aligner", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/ubuntu/miniconda3/envs/tts/lib/python3.12/site-packages/ctc_forced_aligner/align.py", line 167, in cli
    json.dump(
  File "/home/ubuntu/miniconda3/envs/tts/lib/python3.12/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/home/ubuntu/miniconda3/envs/tts/lib/python3.12/json/encoder.py", line 432, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/home/ubuntu/miniconda3/envs/tts/lib/python3.12/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/home/ubuntu/miniconda3/envs/tts/lib/python3.12/json/encoder.py", line 326, in _iterencode_list
    yield from chunks
  File "/home/ubuntu/miniconda3/envs/tts/lib/python3.12/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/home/ubuntu/miniconda3/envs/tts/lib/python3.12/json/encoder.py", line 439, in _iterencode
    o = _default(o)
        ^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/tts/lib/python3.12/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type float32 is not JSON serializable

Writing audio_out.json fails when the score is serialised; the output file stops at the first "score" field:

{
    "text": "Hey, how are you doing today?",
    "segments": [
        {
            "start": 0.08,
            "end": 0.24,
            "text": "Hey,",
            "score": 

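The traceback shows Python's json module rejecting a score of type float32. A minimal sketch of the failure and one common workaround (a default hook that converts NumPy scalars to built-in Python numbers), assuming the score is a NumPy scalar, which the class name in the error suggests. The segment fields mirror the truncated output above, the score value is made up for illustration, and numpy_default is a hypothetical helper, not necessarily how the library itself fixed the issue:

```python
import json

import numpy as np

# Segment shaped like the truncated audio_out.json; the score value is hypothetical.
segment = {"start": 0.08, "end": 0.24, "text": "Hey,", "score": np.float32(-0.25)}

# The standard json encoder only handles built-in types, so a NumPy scalar fails.
error_message = ""
try:
    json.dumps(segment)
except TypeError as err:
    error_message = str(err)  # "Object of type float32 is not JSON serializable"


def numpy_default(obj):
    """Fallback for json: convert NumPy scalars and arrays to built-in Python types."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")


encoded = json.dumps(segment, default=numpy_default)
print(encoded)  # {"start": 0.08, "end": 0.24, "text": "Hey,", "score": -0.25}
```

Wrapping each score in float(...) before building the dictionary works just as well; the default hook simply avoids touching every place a score is produced.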
Python

With the Python API, following the usage example:

from ctc_forced_aligner import load_audio, generate_emissions

audio_path = "./audio_out.wav"
text_path = "./content.txt"
language = "eng"  # ISO-639-3 language code

# defaults
window_size = 30
context_size = 2
batch_size = 4

# `model` is never defined in the usage example, so this line raises NameError
audio_waveform = load_audio(audio_path, model.dtype, model.device)
emissions, stride = generate_emissions(
    model, audio_waveform, window_size, context_size, batch_size
)
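The window_size, context_size and batch_size defaults suggest that emissions are computed over fixed-length windows of audio, each padded with some surrounding context, and fed to the model in batches. The sketch below illustrates that kind of chunking with NumPy. It assumes both sizes are in seconds and that the audio is 16 kHz mono; make_windows and iter_batches are hypothetical helpers, not ctc_forced_aligner's own code:

```python
import math

import numpy as np

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio


def make_windows(waveform, window_size=30, context_size=2, sample_rate=SAMPLE_RATE):
    """Split a 1-D waveform into windows of `window_size` seconds of audio,
    each extended by `context_size` seconds of neighbouring audio on both sides."""
    window = window_size * sample_rate
    context = context_size * sample_rate
    n_windows = math.ceil(len(waveform) / window)
    # Zero-pad the tail up to a multiple of `window`, plus `context` samples at both ends.
    tail = n_windows * window - len(waveform)
    padded = np.pad(waveform, (context, context + tail))
    # Window i covers original samples [i*window - context, (i+1)*window + context).
    span = window + 2 * context
    return np.stack([padded[i * window : i * window + span] for i in range(n_windows)])


def iter_batches(windows, batch_size=4):
    """Yield consecutive groups of at most `batch_size` windows."""
    for start in range(0, len(windows), batch_size):
        yield windows[start : start + batch_size]


waveform = np.zeros(70 * SAMPLE_RATE, dtype=np.float32)  # 70 s of silence
windows = make_windows(waveform)
print(windows.shape)  # (3, 544000): three 30 s windows, each with 2 s of context per side
for batch in iter_batches(windows, batch_size=4):
    print(batch.shape)  # (3, 544000)
```

In a scheme like this, the model output frames that fall inside the context margins would be discarded before the per-window emissions are concatenated, so every stretch of audio contributes exactly once while still being scored with some surrounding context.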

How should I initialise model, which is never defined?

Thanks for the help, and for creating such a useful library.

MahmoudAshraf97 commented 5 months ago

Hi, the serialization issue should now be fixed. As for initializing the model, there's a function called load_alignment_model; I seem to have missed it in the usage documentation.

MahmoudAshraf97 commented 5 months ago

I've updated the Python usage instructions.

ocyedwin commented 5 months ago

Thanks @MahmoudAshraf97 ! 🙏