CoEDL / elpis

🙊 software for creating speech recognition models.
https://elpis.readthedocs.io/en/latest/
Apache License 2.0

HFT audio minimum-maximum filter not working #327

Closed. ronny3 closed this issue 1 year ago.

ronny3 commented 1 year ago

Lines 597-598 on commit 1304d05 in hft//model.py

This filter does nothing in my container. That later causes a KeyError, because the program (line 505) tries to fetch from the speech dict an entry that was never added to it, since the utterance's duration was not between the min and max as decided by the if-clause on line 569.

I fixed this with temporary lists by changing the filter to lambda x: x["start_ms"] in temp_start and x["end_ms"] in temp_end.
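
To illustrate the idea outside of elpis (a minimal sketch with toy values, assuming the Hugging Face datasets package that elpis uses): a row-wise filter like the lambda above keeps only the utterances whose timings were recorded in the temporary lists.

from datasets import Dataset

# Toy stand-in for hft_dataset; the column names mirror the lambda and the values are made up.
ds = Dataset.from_dict({
    "start_ms": [0, 1000, 2000],
    "end_ms": [500, 1500, 2500],
})

# Pretend only the middle utterance passed the min/max duration check,
# so only its timings were recorded in the temporary lists.
temp_start = [1000]
temp_end = [1500]

kept = ds.filter(lambda x: x["start_ms"] in temp_start and x["end_ms"] in temp_end)
print(len(kept))         # 1
print(kept["start_ms"])  # [1000]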

benfoley commented 1 year ago

Hi, can you please submit a PR so we can see your temporary fix in context?

ronny3 commented 1 year ago

> Hi, can you please submit a PR so we can see your temporary fix in context?

Hi. I'm not really a dev, so I don't know how to PR a container...

But here is the relevant code, starting at model.py:542 (my changed lines are marked with # added / # changed comments):

def prepare_speech(self):
    logger.info("==== Preparing Speech ====")
    speech = {}
    temp_start = []  # added
    temp_end = []  # added
    audio_paths = set()
    rejected_count = 0

    for utt in self.hft_dataset["train"]:
        audio_paths.add((utt["path"], utt["text"], utt["start_ms"], utt["stop_ms"]))

    for utt in self.hft_dataset["dev"]:
        audio_paths.add((utt["path"], utt["text"], utt["start_ms"], utt["stop_ms"]))

    for utt in self.hft_dataset["test"]:
        audio_paths.add((utt["path"], utt["text"], utt["start_ms"], utt["stop_ms"]))

    for path, text, start_ms, stop_ms in audio_paths:
        audio_metadata = torchaudio.info(path)

        start_frame = int(start_ms * (audio_metadata.sample_rate / 1000))
        end_frame = int(stop_ms * (audio_metadata.sample_rate / 1000))
        num_frames = end_frame - start_frame

        dur_ms = stop_ms - start_ms
        speech_array, sample_rate = torchaudio.load(
            filepath=path, frame_offset=start_frame, num_frames=num_frames
        )
        # Check that frames exceeds number of characters, wav file is not all zeros, and duration between min, max
        if (
            int(audio_metadata.num_frames) >= len(text)
            and speech_array.count_nonzero()
            and float(self.settings["min_duration_s"])
            < dur_ms / 1000
            < float(self.settings["max_duration_s"])
        ):
            # Resample if required
            if sample_rate != HFTModel.SAMPLING_RATE:
                logger.info(
                    f"Resample from {sample_rate} to {HFTModel.SAMPLING_RATE} | "
                    f"{os.path.basename(path).rjust(20)} | "
                    f"{str(start_ms/1000).rjust(15)} : {str(stop_ms/1000).ljust(15)} | "
                    f"{str(start_frame).rjust(15)} : {str(end_frame).ljust(15)}"
                )
                resampler = torchaudio.transforms.Resample(sample_rate, HFTModel.SAMPLING_RATE)
                speech_array = resampler(speech_array)
            # Use a unique key for the speech key in case there are multiple annotations for audio files
            # i.e. don't use the audio file path as the key
            unique_key = f"{path}{start_ms}{stop_ms}"
            speech[unique_key] = speech_array.squeeze().numpy()
            temp_start.append(start_ms)  # added
            temp_end.append(end_ms)  # added
            # For debugging/ checking dataset, generate an audio file for listening
            # torchaudio.save(self.tmp_audio_path.joinpath(os.path.basename(path)), speech_array, HFTModel.SAMPLING_RATE)
        else:
            rejected_count += 1
            logger.info(f"rejected {os.path.basename(path)} {start_ms} {stop_ms}")

    # Remove rejected utterances by keeping only the entries whose timings passed the checks above
    self.hft_dataset = self.hft_dataset.filter(
        lambda x: x["start_ms"] in temp_start and x["end_ms"] in temp_end  # changed
    )
benfoley commented 1 year ago

Thanks, we'll take a look.

benfoley commented 1 year ago

temp_end.append(end_ms) is failing because end_ms is not defined. Should that be stop_ms?

ronny3 commented 1 year ago

> temp_end.append(end_ms) is failing because end_ms is not defined. Should that be stop_ms?

Yes of course, my bad. I should have been more careful.
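
So the corrected line should be:

temp_end.append(stop_ms)  # stop_ms, not end_ms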

benfoley commented 1 year ago

One more thing: would you be able to provide the data you've been using? Perhaps a link to a Dropbox or Google Drive folder? It would be handy for verifying the problem.

benfoley commented 1 year ago

Keeping this here for the moment... I dummied up a long audio file and tried the current master branch. This is the error during training preparation.

2022-10-17 06:37:50.020 | INFO     | elpis.engines.hft.objects.model:train:721 - len of dataset: 3
Downloading: 4.55kB [00:00, 1.46MB/s]
Traceback (most recent call last):
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 2464, in __call__
    return self.wsgi_app(environ, start_response)
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 2450, in wsgi_app
    response = self.handle_exception(e)
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 1867, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/elpis/elpis/endpoints/model.py", line 122, in train
    return _model_response(setup=setup, build_data=build_data)
  File "/elpis/elpis/endpoints/model.py", line 246, in _model_response
    setup(model)
  File "/elpis/elpis/endpoints/model.py", line 117, in setup
    model.train(on_complete=lambda: logger.info("Trained model!"))
  File "/elpis/elpis/engines/hft/objects/model.py", line 736, in train
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/transformers/trainer.py", line 1085, in train
    train_dataloader = self.get_train_dataloader()
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/transformers/trainer.py", line 626, in get_train_dataloader
    train_sampler = self._get_train_sampler()
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/transformers/trainer.py", line 549, in _get_train_sampler
    return LengthGroupedSampler(
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/transformers/trainer_pt_utils.py", line 520, in __init__
    not (isinstance(dataset[0], dict) or isinstance(dataset[0], BatchEncoding))
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1517, in __getitem__
    return self._getitem(
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1509, in _getitem
    pa_subtable = query_table(self._data, key, indices=self._indices if self._indices is not None else None)
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/formatting/formatting.py", line 368, in query_table
    _check_valid_index_key(key, size)
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/formatting/formatting.py", line 311, in _check_valid_index_key
    raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")
IndexError: Invalid key: 0 is out of bounds for size 0
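
In other words, the train split has ended up with zero rows by the time the sampler tries to index it. A minimal reproduction of just that last error, assuming the Hugging Face datasets package (not the elpis code):

from datasets import Dataset

ds = Dataset.from_dict({"start_ms": [0, 1000], "end_ms": [500, 1500]})
empty = ds.filter(lambda x: False)  # nothing survives the filter
print(len(empty))  # 0
empty[0]           # IndexError: Invalid key: 0 is out of bounds for size 0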
ronny3 commented 1 year ago

> One more thing: would you be able to provide the data you've been using? Perhaps a link to a Dropbox or Google Drive folder? It would be handy for verifying the problem.

Sure. The zip contains 4 .wav and 4 .eaf files. From the ELAN files, choose the "normalized sentence" tier. It should produce the error with any of the files if you choose settings that cut out some utterances, e.g. a minimum of 1 s and a maximum of 10 s. I'm using the GUI, if that was unclear.

https://drive.google.com/file/d/1EUJBRMJ_mLp-r83dESK4ju59fZWNtvoh/view?usp=sharing

benfoley commented 1 year ago

Again, for the record: this is the error from ronny3's data.

  File "/elpis/elpis/engines/hft/objects/model.py", line 713, in train
    self.preprocess_dataset()
  File "/elpis/elpis/engines/hft/objects/model.py", line 512, in preprocess_dataset
    self.hft_dataset = self.hft_dataset.map(
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/dataset_dict.py", line 471, in map
    {
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/dataset_dict.py", line 472, in <dictcomp>
    k: dataset.map(
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1665, in map
    return self._map_single(
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/fingerprint.py", line 397, in wrapper
    out = func(self, *args, **kwargs)
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1997, in _map_single
    example = apply_function_on_filtered_inputs(example, i, offset=offset)
  File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1906, in apply_function_on_filtered_inputs
    function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
  File "/elpis/elpis/engines/hft/objects/model.py", line 505, in speech_file_to_array_fn
    batch["speech"] = speech[unique_key]
KeyError: '/state/of_origin/datasets/40842849d817dd2060b8ce61f35947d2/resampled/SKN01b_Suomussalmi.wav25635532564219'
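
Which matches the report above: a rejected utterance is still in hft_dataset, but no entry was ever added to the speech dict for its path + start + stop key, so the lookup at model.py:505 fails. A minimal sketch of that lookup with made-up values (not the elpis code):

speech = {}  # rejected utterances never get an entry here (see prepare_speech above)

# A row that failed the min/max duration check but was never filtered out of the dataset.
batch = {"path": "example.wav", "start_ms": 2000, "stop_ms": 3500}

unique_key = f"{batch['path']}{batch['start_ms']}{batch['stop_ms']}"
batch["speech"] = speech[unique_key]  # KeyError, as in the traceback above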
benfoley commented 1 year ago

Thanks for the report and the quick response, ronny3. I verified the error with your data and confirmed that your code fixed the bug. We've copied your code into PR #329 and accepted it. 🥳