Closed roedoejet closed 1 month ago
Review changes with SemanticDiff.
Analyzed 2 of 4 files.
Overall, the semantic diff is 13% smaller than the GitHub diff.
| | Filename | Status |
|---|---|---|
| :heavy_check_mark: | everyvoice/cli.py | 13.44% smaller |
| :heavy_check_mark: | everyvoice/model/aligner/wav2vec2aligner | Analyzed |
| :grey_question: | docs/guides/custom.md | Unsupported file format |
| :grey_question: | docs/guides/finetune.md | Unsupported file format |
All modified and coverable lines are covered by tests :white_check_mark:
Please upload report for BASE (main@3b20c2e). Learn more about missing BASE report.
CLI load time: 0:00.30
Pull Request HEAD: 230b92f2b765e93a2fffec2a488ed241e297a20f
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
I am running some tests with it right now :-)
One thing I noticed: the process is multithreaded and will use all available CPUs. Be careful when running on a cluster "head node" (it can use up all the shared resources).
Below, I ran it in a 40-CPU container...
[U20-GPSC/etc/slurm-llnl/slurm]:$ top
top - 11:52:46 up 38 days, 23:52, 0 users, load average: 37.38, 22.87, 17.36
Tasks: 15 total, 2 running, 13 sleeping, 0 stopped, 0 zombie
%Cpu(s): 97.8 us, 2.2 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 192032.2 total, 95793.2 free, 28507.4 used, 67731.7 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 155469.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
395 tes001 20 0 10.6g 3.9g 112852 R 3991 2.1 124:45.39 everyvoice
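If the CPU usage above comes from OpenMP/BLAS worker threads (an assumption; the thread shows only the load, not where the threads are created), one common way to keep a job from taking every core on a shared node is to cap the usual thread-count environment variables before launching the command. The helper name `capped_thread_env` below is hypothetical, and whether the aligner honours these variables has not been verified here:

```python
import os


def capped_thread_env(max_threads, base_env=None):
    """Return a copy of the environment with common thread-count caps set.

    Assumption: the heavy CPU use comes from OpenMP/MKL/OpenBLAS worker
    threads, which conventionally respect these variables.
    """
    env = dict(os.environ if base_env is None else base_env)
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        env[var] = str(max_threads)
    return env


# Example use (not executed here; file names are placeholders):
# import subprocess
# subprocess.run(
#     ["everyvoice", "segment", "align", "path_to_text.txt", "path_to_audio.wav"],
#     env=capped_thread_env(4),
# )
```

On a Slurm cluster, requesting a fixed number of CPUs for the job is the more robust option; the sketch only helps when the command must run on a shared machine.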
I also ran into an issue when trying to run the alignment on a test dataset that I have.
I received the message below after it ran for about 8 minutes and died. It's a pretty big test file (~23 minutes of Inuktitut audio). I will try the same with something shorter and see if I get the same error.
============ Starting job 5002533 on Wed 09 Oct 2024 11:48:37 AM EDT on node ib12be-094.science.gc.ca OS "Ubuntu 20.04.6 LTS"
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/EveryVoice_extract/ever │
│ yvoice/model/aligner/wav2vec2aligner/aligner/cli.py:126 in align_single │
│ │
│ 123 │ print("performing alignment") │
│ 124 │ from .heavy import align_speech_file │
│ 125 │ │
│ ❱ 126 │ characters, words, sentences, num_frames = align_speech_file( │
│ 127 │ │ wav, text_hash, model, labels, word_padding, sentence_padding │
│ 128 │ ) │
│ 129 │ print("creating textgrid") │
│ │
│ /gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/EveryVoice_extract/ever │
│ yvoice/model/aligner/wav2vec2aligner/aligner/heavy.py:32 in │
│ align_speech_file │
│ │
│ 29 │ audio, text_hash, model, labels_dictionary, word_padding, sentence │
│ 30 ): │
│ 31 │ emission = get_emission(model, audio.to(DEVICE)) │
│ ❱ 32 │ segments, words, sentences = compute_alignments( │
│ 33 │ │ text_hash, │
│ 34 │ │ labels_dictionary, │
│ 35 │ │ emission, │
│ │
│ /gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/EveryVoice_extract/ever │
│ yvoice/model/aligner/wav2vec2aligner/aligner/heavy.py:144 in │
│ compute_alignments │
│ │
│ 141 │ │ end = None │
│ 142 │ │ for w_k, w_v in word_hash.items(): │
│ 143 │ │ │ if sentence == re.match(key_pattern, w_k).group(1): │
│ ❱ 144 │ │ │ │ scores.append(w_v.score) │
│ 145 │ │ │ │ if start is None: │
│ 146 │ │ │ │ │ start = w_v.start │
│ 147 │ │ │ │ end = w_v.end │
╰──────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'dict' object has no attribute 'score'
============ Finished job 5002533 on Wed 09 Oct 2024 11:54:52 AM EDT with rc=1
OK, I think I found a minor bug. The command everyvoice segment extract should, I think, create the "OUTDIR" folder if it does not exist. In the example below, my first try failed with the message "Invalid value for 'OUTDIR': Directory 'OUTPUT' does not exist." After I created the folder "OUTPUT", it ran successfully. (Very cool!) Nice work Aidan, I will run more tests using other datasets and combinations and verify the results more closely!
(EveryVoice_extract) [U20-GPSC5]:$ everyvoice segment extract 1.Welcome-16000-16000-mono.TextGrid 1.Welcome-16000-16000-mono.mp3 OUTPUT
Usage: everyvoice segment extract [OPTIONS] TEXT_GRID_PATH AUDIO_PATH OUTDIR
Try 'everyvoice segment extract -h' for help.
╭─ Error ───────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Invalid value for 'OUTDIR': Directory 'OUTPUT' does not exist. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(EveryVoice_extract) [U20-GPSC5]:$ mkdir OUTPUT
(EveryVoice_extract) [U20-GPSC5]:$ everyvoice segment extract 1.Welcome-16000-16000-mono.TextGrid 1.Welcome-16000-16000-mono.mp3 OUTPUT
Writing audio to files: 100%|█████████████████████████████████████████████████| 6/6 [00:00<00:00, 974.93it/s]
Success! Your audio is available in /gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/Extract_Alignment/Welcome/OUTPUT/wavs and your corresponding metadata file is available in /gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/Extract_Alignment/Welcome/OUTPUT/metadata.psv
(EveryVoice_extract) [U20-GPSC5]:$ pwd
/home/tes001/u/TxT2SPEECH/Extract_Alignment/Welcome/OUTPUT
(EveryVoice_extract) [U20-GPSC5]:$ find .
.
./wavs
./wavs/segment0.wav
./wavs/segment4.wav
./wavs/segment2.wav
./wavs/segment5.wav
./wavs/segment3.wav
./wavs/segment1.wav
./metadata.psv
(EveryVoice_extract) [U20-GPSC5]:$ cat metadata.psv
basename|text
segment0|ᑐᙵᓱᒋᑦ.
segment1|ᑐᙵᓱᑉᐳᖓ.
segment2|ᐃᓄᒃᑎᑑᓲᖑᕕᑦ?
segment3|ᐄ, ᒥᑭᔪᒥᒃ.
segment4|ᕇᑕᐅᔪᖓ. ᑭᓇᐅᕕᑦ?
segment5|ᑕᐃᕕᑎᐅᔪᖓ.
> I received the message below after it ran for about 8 minutes and died. It's a pretty big test file (~23 minutes of Inuktitut audio). I will try the same with something shorter and see if I get the same error.
Hm, yes, we need to spend a bit more time making the alignment more efficient and more robust. I think this is the same as the error @joanise described here: https://github.com/EveryVoiceTTS/EveryVoice/issues/327
> OK, I think I found a minor bug. The command everyvoice segment extract should, I think, create the "OUTDIR" folder if it does not exist. In the example below, my first try failed with the message "Invalid value for 'OUTDIR': Directory 'OUTPUT' does not exist." After I created the folder "OUTPUT", it ran successfully. (Very cool!) Nice work Aidan, I will run more tests using other datasets and combinations and verify the results more closely!
Nice catch - thanks! I fixed this.
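For reference, the usual way to get "create the output folder if it is missing" behaviour in Python is `Path.mkdir` with `parents=True, exist_ok=True`. The sketch below is only an illustration of that pattern; the helper name `ensure_outdir` is hypothetical, and this is not necessarily how the fix was actually implemented in the PR:

```python
from pathlib import Path


def ensure_outdir(outdir):
    """Create outdir (and any missing parents) if needed and return it as a Path.

    Hypothetical helper: illustrates the pattern only, not EveryVoice's code.
    """
    path = Path(outdir)
    # exist_ok=True makes this a no-op when the directory is already there.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling it twice on the same path is safe, so a command can run it unconditionally before writing its output.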
I made sure to remove all numbers from my test file. I have now also removed characters like " ? ( ) ! - " and am trying again to see if I get the same failure. I am managing to get it to work on small chunks of that same file, so I am trying to hunt down exactly which block of my test file is causing the issue. I will keep you posted if I can pinpoint it and reproduce it consistently.
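Hunting for the offending chunk can be partly automated with a binary search over the transcript lines. The sketch below assumes a single line triggers the failure on its own, and that `fails` is a caller-supplied function (hypothetical, for example one that writes the chunk to a file, runs the aligner on it and reports whether it crashed). If the crash depends on how several lines interact, or on the audio, this will not find it:

```python
def find_failing_line(lines, fails):
    """Return the index of the line that makes fails(chunk) true, or None.

    Assumes exactly one line is responsible and that any chunk containing
    it fails. `fails` is supplied by the caller (hypothetical here).
    """
    lo, hi = 0, len(lines)
    if not fails(lines[lo:hi]):
        return None  # the whole input passes, nothing to find
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(lines[lo:mid]):
            hi = mid  # culprit is in the first half
        else:
            lo = mid  # first half is clean, so it is in the second half
    return lo
```

Each round halves the search range, so even a long transcript needs only a handful of alignment runs.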
PR Goal?
In writing the documentation for everyvoice segment, I realized that we didn't actually have an easy way of extracting the text/audio intervals from the TextGrid into the format needed by EveryVoice. This PR adds that feature and also updates the documentation.
Fixes?
https://github.com/EveryVoiceTTS/EveryVoice/issues/543 https://github.com/EveryVoiceTTS/EveryVoice/issues/544
Feedback sought?
Sanity. Suggest any changes to the CLI method names or documentation.
Priority?
medium
Tests added?
How to test?
For this to work you need a plain text transcript and some corresponding audio. You can then run the segmenter: everyvoice segment align path_to_text.txt path_to_audio.wav. You can then install Praat and use it to inspect the .TextGrid file that was generated, and adjust any alignments as necessary. Once you are happy with your alignments, you can use everyvoice segment extract path_to_alignment.TextGrid path_to_audio.wav outdir which will then create a folder called outdir with your audio, and a metadata file containing references to each of your audio files and the corresponding text.
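As a quick sanity check after the extract step, the metadata file can be read back and paired with the audio files. This is a minimal sketch that assumes the layout seen in the test run earlier in this thread: a pipe-separated metadata.psv with a `basename|text` header, and audio under wavs/<basename>.wav. The function name `load_segments` is hypothetical:

```python
import csv
from pathlib import Path


def load_segments(outdir):
    """Return (wav_path, text) pairs read from outdir/metadata.psv.

    Assumes a 'basename|text' header and audio stored as
    outdir/wavs/<basename>.wav, as in the run shown in this thread.
    """
    outdir = Path(outdir)
    with open(outdir / "metadata.psv", newline="", encoding="utf-8") as f:
        # QUOTE_NONE so quotation marks in transcripts are kept as plain text.
        reader = csv.DictReader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        return [
            (outdir / "wavs" / f"{row['basename']}.wav", row["text"])
            for row in reader
        ]
```

Checking that every returned path exists is an easy way to confirm that the metadata and the audio folder agree.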
Confidence?
medium
Version change?
new alpha release
Related PRs?
https://github.com/EveryVoiceTTS/wav2vec2aligner/pull/12