Closed ronny3 closed 1 year ago
Hi, Can you please submit a PR so we can see your temporary fix in context?
Hi. I'm not really a dev, so I don't know how to PR a container...
But here is the relevant code
def prepare_speech(self):
    logger.info("==== Preparing Speech ====")
    speech = {}
    temp_start = []  # (temporary fix)
    temp_end = []  # (temporary fix)
    audio_paths = set()
    rejected_count = 0
    for utt in self.hft_dataset["train"]:
        audio_paths.add((utt["path"], utt["text"], utt["start_ms"], utt["stop_ms"]))
    for utt in self.hft_dataset["dev"]:
        audio_paths.add((utt["path"], utt["text"], utt["start_ms"], utt["stop_ms"]))
    for utt in self.hft_dataset["test"]:
        audio_paths.add((utt["path"], utt["text"], utt["start_ms"], utt["stop_ms"]))
    for path, text, start_ms, stop_ms in audio_paths:
        audio_metadata = torchaudio.info(path)
        start_frame = int(start_ms * (audio_metadata.sample_rate / 1000))
        end_frame = int(stop_ms * (audio_metadata.sample_rate / 1000))
        num_frames = end_frame - start_frame
        dur_ms = stop_ms - start_ms
        speech_array, sample_rate = torchaudio.load(
            filepath=path, frame_offset=start_frame, num_frames=num_frames
        )
        # Check that the number of frames exceeds the number of characters,
        # the wav file is not all zeros, and the duration is between min and max
        if (
            int(audio_metadata.num_frames) >= len(text)
            and speech_array.count_nonzero()
            and float(self.settings["min_duration_s"])
            < dur_ms / 1000
            < float(self.settings["max_duration_s"])
        ):
            # Resample if required
            if sample_rate != HFTModel.SAMPLING_RATE:
                logger.info(
                    f"Resample from {sample_rate} to {HFTModel.SAMPLING_RATE} | "
                    f"{os.path.basename(path).rjust(20)} | "
                    f"{str(start_ms/1000).rjust(15)} : {str(stop_ms/1000).ljust(15)} | "
                    f"{str(start_frame).rjust(15)} : {str(end_frame).ljust(15)}"
                )
                resampler = torchaudio.transforms.Resample(sample_rate, HFTModel.SAMPLING_RATE)
                speech_array = resampler(speech_array)
            # Use a unique key for the speech key in case there are multiple annotations for audio files
            # i.e. don't use the audio file path as the key
            unique_key = f"{path}{start_ms}{stop_ms}"
            speech[unique_key] = speech_array.squeeze().numpy()
            temp_start.append(start_ms)  # (temporary fix)
            temp_end.append(end_ms)  # (temporary fix)
            # For debugging/checking the dataset, generate an audio file for listening
            # torchaudio.save(self.tmp_audio_path.joinpath(os.path.basename(path)), speech_array, HFTModel.SAMPLING_RATE)
        else:
            rejected_count += 1
            logger.info(f"rejected {os.path.basename(path)} {start_ms} {stop_ms}")
    # Remove rejected speech by filtering on speech matching the required conditions
    self.hft_dataset = self.hft_dataset.filter(
        lambda x: x["start_ms"] in temp_start and x["end_ms"] in temp_end  # (temporary fix)
    )
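For reference, the millisecond-to-frame arithmetic in the snippet above can be checked in isolation. A minimal sketch; the 16 kHz sample rate and the timings are illustrative, not taken from the dataset:

```python
# Convert utterance boundaries in milliseconds to frame offsets,
# mirroring the arithmetic in prepare_speech.
def ms_to_frame(ms: int, sample_rate: int) -> int:
    return int(ms * (sample_rate / 1000))

sample_rate = 16_000  # illustrative: 16 kHz, a common speech-model rate

start_frame = ms_to_frame(2500, sample_rate)  # 2.5 s -> 40000 frames
end_frame = ms_to_frame(4000, sample_rate)    # 4.0 s -> 64000 frames
num_frames = end_frame - start_frame          # 24000 frames = 1.5 s of audio
```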
Thanks, we'll take a look.
temp_end.append(end_ms) is failing because end_ms is not defined. Should that be stop_ms?
Yes of course, my bad. I should have been more careful.
One more thing, would you be able to provide data that you've been using? Perhaps a link to Dropbox or Google drive folder? It would be handy to verify the problem.
Keeping this here for the moment.. I dummied up a long audio file and tried the current master branch. This is the error during training preparation.
2022-10-17 06:37:50.020 | INFO | elpis.engines.hft.objects.model:train:721 - len of dataset: 3
Downloading: 4.55kB [00:00, 1.46MB/s]
Traceback (most recent call last):
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 2464, in __call__
return self.wsgi_app(environ, start_response)
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 2450, in wsgi_app
response = self.handle_exception(e)
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 1867, in handle_exception
reraise(exc_type, exc_value, tb)
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/elpis/elpis/endpoints/model.py", line 122, in train
return _model_response(setup=setup, build_data=build_data)
File "/elpis/elpis/endpoints/model.py", line 246, in _model_response
setup(model)
File "/elpis/elpis/endpoints/model.py", line 117, in setup
model.train(on_complete=lambda: logger.info("Trained model!"))
File "/elpis/elpis/engines/hft/objects/model.py", line 736, in train
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/transformers/trainer.py", line 1085, in train
train_dataloader = self.get_train_dataloader()
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/transformers/trainer.py", line 626, in get_train_dataloader
train_sampler = self._get_train_sampler()
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/transformers/trainer.py", line 549, in _get_train_sampler
return LengthGroupedSampler(
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/transformers/trainer_pt_utils.py", line 520, in __init__
not (isinstance(dataset[0], dict) or isinstance(dataset[0], BatchEncoding))
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1517, in __getitem__
return self._getitem(
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1509, in _getitem
pa_subtable = query_table(self._data, key, indices=self._indices if self._indices is not None else None)
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/formatting/formatting.py", line 368, in query_table
_check_valid_index_key(key, size)
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/formatting/formatting.py", line 311, in _check_valid_index_key
raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")
IndexError: Invalid key: 0 is out of bounds for size 0
Sure. The zip contains 4 .wav and 4 .eaf files. From the ELAN files, choose the "normalized sentence" tier. It should reproduce the error with any file if you choose to cut out any utterances, e.g. using min 1 s and max 10 s. I am obviously using the GUI, if that was unclear.
https://drive.google.com/file/d/1EUJBRMJ_mLp-r83dESK4ju59fZWNtvoh/view?usp=sharing
Again, for the record. This error from ronny3's data.
File "/elpis/elpis/engines/hft/objects/model.py", line 713, in train
self.preprocess_dataset()
File "/elpis/elpis/engines/hft/objects/model.py", line 512, in preprocess_dataset
self.hft_dataset = self.hft_dataset.map(
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/dataset_dict.py", line 471, in map
{
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/dataset_dict.py", line 472, in <dictcomp>
k: dataset.map(
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1665, in map
return self._map_single(
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 185, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/fingerprint.py", line 397, in wrapper
out = func(self, *args, **kwargs)
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1997, in _map_single
example = apply_function_on_filtered_inputs(example, i, offset=offset)
File "/opt/pyenv/versions/3.8.2/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1906, in apply_function_on_filtered_inputs
function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
File "/elpis/elpis/engines/hft/objects/model.py", line 505, in speech_file_to_array_fn
batch["speech"] = speech[unique_key]
KeyError: '/state/of_origin/datasets/40842849d817dd2060b8ce61f35947d2/resampled/SKN01b_Suomussalmi.wav25635532564219'
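That KeyError is the failure mode ronny3 describes below: entries are only added to the speech dict when an utterance passes the checks, but the later map stage looks up every utterance in the dataset unconditionally. A stripped-down sketch in pure Python, with hypothetical paths and timings:

```python
# Build the speech dict only for utterances that pass a duration check,
# mirroring the accept/reject logic in prepare_speech.
utterances = [
    {"path": "a.wav", "start_ms": 0, "stop_ms": 500},      # 0.5 s, rejected
    {"path": "a.wav", "start_ms": 1000, "stop_ms": 4000},  # 3.0 s, accepted
]
MIN_S, MAX_S = 1.0, 10.0

speech = {}
for utt in utterances:
    dur_s = (utt["stop_ms"] - utt["start_ms"]) / 1000
    if MIN_S < dur_s < MAX_S:
        key = f'{utt["path"]}{utt["start_ms"]}{utt["stop_ms"]}'
        speech[key] = "...audio array..."  # placeholder for the real array

# If the rejected utterance is not also filtered out of the dataset,
# the later lookup hits a key that was never added and raises KeyError.
for utt in utterances:
    key = f'{utt["path"]}{utt["start_ms"]}{utt["stop_ms"]}'
    if key not in speech:
        print(f"rejected utterance still in dataset: {key}")
```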
Thanks for the report and the quick response ronny3. I verified the error with your data, and that your code fixed the bug. We've copied your code into PR #329 and accepted it. 🥳
Lines 597-598 on commit 1304d05 in hft//model.py
This filter does nothing in my container. That later causes a KeyError: at line 505 the program tries to read from the speech dict an entry that was never added to it, because its duration was not between min and max as decided by the if-clause at line 569.
I fixed this with temporary lists, by changing the filter to lambda x: x["start_ms"] in temp_start and x["end_ms"] in temp_end
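As a rough sketch of that fix outside the container, with plain Python lists standing in for datasets.Dataset.filter and illustrative rows:

```python
# Times of utterances whose audio passed the checks, as collected
# into the temporary lists during prepare_speech.
temp_start = [1000]
temp_end = [4000]

rows = [
    {"text": "hello", "start_ms": 1000, "end_ms": 4000},  # kept
    {"text": "hi", "start_ms": 0, "end_ms": 500},         # dropped
]

# The fixed filter: keep only rows whose times appear in the accepted lists.
kept = [r for r in rows
        if r["start_ms"] in temp_start and r["end_ms"] in temp_end]
```

One caveat on the design: because membership is tested per field, a row whose start_ms matches one accepted utterance and whose end_ms matches a different one would also survive; filtering on (start_ms, end_ms) pairs would be stricter.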