MiniXC / alignments

Automatically creates/downloads alignments for multiple speech datasets, using pre-existing alignments where possible.

Crashed jupyter server #3

Open KORALLLL opened 6 months ago

KORALLLL commented 6 months ago

The method _create_item() (or possibly _load_files()) in the class Alignment dataset crashes the Jupyter server. The problem occurs only with the train-other-500 subset of the LibriTTS-R dataset; the other subsets work correctly. The dataset itself downloads correctly, but the stage that follows immediately, "collecting textgrid and audio files", never completes and crashes either the kernel or the server. I tried reducing the chunk size and max workers, but it didn't help.

KORALLLL commented 6 months ago

I don't have access to the server logs yet; my sysadmin said the problem is "lack of resources". With this initialisation:

    temp = LibrittsRDataset(
        target_directory="LIBRI_TTS/train-other-500-alignments",
        source_directory="LIBRI_TTS/train-other-500-data",
        source_url="http://www.openslr.org/resources/141/train_other_500.tar.gz",
        verbose=True,
        tmp_directory="LIBRI_TTS/train-other-500-tmp",
        chunk_size=1000
    )

the server crashed at about 70k. After reducing chunk_size to 500, it got past 100k. I think I'll try changing the n_workers argument as well.
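Without the server logs, one way to test the "lack of resources" explanation is to record peak memory after each chunk and see whether it keeps climbing. The following is only a rough sketch, not code from the alignments library: `iter_chunks`, `profile_chunks`, and `process_chunk` are hypothetical names, and `tracemalloc` only sees allocations made by Python in the current process, so memory used by worker processes or native code would not show up.

```python
import tracemalloc
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_chunks(items: Sequence, chunk_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def profile_chunks(
    items: Sequence,
    chunk_size: int,
    process_chunk: Callable[[Sequence], object],
) -> List[Tuple[int, int]]:
    """Process items chunk by chunk.

    Returns a list of (items_done, peak_bytes) pairs, one per chunk.
    peak_bytes is the peak traced Python memory since tracing started,
    so it never decreases from one chunk to the next.
    """
    report: List[Tuple[int, int]] = []
    done = 0
    tracemalloc.start()
    try:
        for chunk in iter_chunks(items, chunk_size):
            # Hypothetical stand-in for the real per-chunk work.
            process_chunk(chunk)
            done += len(chunk)
            _, peak = tracemalloc.get_traced_memory()
            report.append((done, peak))
    finally:
        tracemalloc.stop()
    return report


if __name__ == "__main__":
    # Dummy workload: the real one would collect textgrid and audio files.
    fake_files = [f"file_{i}.wav" for i in range(2500)]
    for done, peak in profile_chunks(fake_files, 1000, lambda c: [s.upper() for s in c]):
        print(f"{done} items processed, peak traced memory {peak / 1e6:.2f} MB")
```

If the peak grows roughly in step with the number of items processed rather than levelling off, results are probably being accumulated across chunks, which would fit the observation that a smaller chunk_size only delays the crash.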