GamingDaveUk opened this issue 9 months ago · Status: Open
That is a byproduct of the sentence splitter. It will just drop things here and there.
The painful alternative is to do it sentence by sentence. An automated alternative would be to split the text beforehand (for example, by actual sentence), but when it is phonemized it might miss parts of the audio at the end.
Example:
The old sentence splitter packs text into chunks until it hits the phoneme limit:
```python
import textwrap

# split_sentence from the XTTS tokenizer (the def line, import and
# text_splits initialisation are added here so the snippet runs)
def split_sentence(text, lang, text_split_length=250):
    text_splits = []
    if text_split_length is not None and len(text) >= text_split_length:
        text_splits.append("")
        nlp = get_spacy_lang(lang)  # defined elsewhere in the tokenizer module
        nlp.add_pipe("sentencizer")
        doc = nlp(text)
        for sentence in doc.sents:
            if len(text_splits[-1]) + len(str(sentence)) <= text_split_length:
                # if the last chunk plus the current sentence fits within
                # text_split_length, append the sentence to the last chunk
                text_splits[-1] += " " + str(sentence)
                text_splits[-1] = text_splits[-1].lstrip()
            elif len(str(sentence)) > text_split_length:
                # if the current sentence alone exceeds text_split_length,
                # hard-wrap it at the character limit
                for line in textwrap.wrap(
                    str(sentence),
                    width=text_split_length,
                    drop_whitespace=True,
                    break_on_hyphens=False,
                    tabsize=1,
                ):
                    text_splits.append(str(line))
            else:
                text_splits.append(str(sentence))

        if len(text_splits) > 1:
            if text_splits[0] == "":
                del text_splits[0]
    else:
        text_splits = [text.lstrip()]

    return text_splits
```
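For a rough feel of how that behaves, a minimal sketch, assuming `get_spacy_lang` and a spaCy install are available as in the library (the sample text and limit are invented for illustration):

```python
text = (
    "Sammy, your destiny lies with the power of the Squawkstone. "
    "She was bewildered but excited by this revelation. "
    "King Cluckington declared Sammy an honorary citizen of Fowlmore."
)
# Sentences get packed into chunks of up to text_split_length characters;
# any single sentence longer than the limit is hard-wrapped mid-text by
# textwrap, which gives the model unnatural places to stop.
for chunk in split_sentence(text, lang="en", text_split_length=120):
    print(repr(chunk))
```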
I edited mine to fit my needs, and it seems to work, but the text has to be in a very particular format (every line has to end in ". "):
```python
# same function with the edit applied (text_splits init again added);
# sentences are split on every ". " boundary instead of being packed
# up to text_split_length
def split_sentence(text, lang, text_split_length=250):
    text_splits = []
    if text_split_length is not None and len(text) >= text_split_length:
        # text_splits.append("")
        nlp = get_spacy_lang(lang)
        nlp.add_pipe("sentencizer")
        doc = nlp(text)
        for sentence in doc.sents:
            # mark every ". " boundary, then split on the marker
            sentence = str(sentence).replace(". ", ". <>")
            frags = sentence.split("<>")
            text_splits += frags
            # text_splits.append(str(sentence))
            print(sentence)
    else:
        text_splits = [text.lstrip()]
    return text_splits
```
So it might introduce some abnormalities or behave unusually.
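The core of that edit is the marker trick. A standalone sketch (sample string invented) of why every sentence has to end in ". ":

```python
text = "Sammy ran. The stone glowed. It was warm."
marked = text.replace(". ", ". <>")
# -> "Sammy ran. <>The stone glowed. <>It was warm."
frags = marked.split("<>")
print(frags)  # ['Sammy ran. ', 'The stone glowed. ', 'It was warm.']
```

A sentence ending in "!" or "?" never gets a marker, so it stays glued to its neighbour; that is why the input format matters so much.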
Interesting. I may give that a go. I don't think the developer is too bothered with this issue, or is not able to replicate it, so I've been looking for a reliable alternative... not having any luck, so if this fixes it I will be very happy.
This happens when you use a finetuned model with some bad training data. With the base 2.0.2 model everything works as expected. After manually curating all the wav files and the whisper transcripts, my finetuned models did not have that issue any more. Give it a try :)
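For anyone who wants a first automated pass before the manual curation, a minimal sketch, assuming an LJSpeech-style layout with a pipe-delimited metadata.csv (the paths and thresholds are hypothetical, adjust to your dataset):

```python
import csv
import wave
from pathlib import Path

DATASET = Path("dataset")  # hypothetical layout: wavs/ + metadata.csv

with open(DATASET / "metadata.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="|"):
        wav_name, transcript = row[0], row[1]
        wav_path = DATASET / "wavs" / f"{wav_name}.wav"
        if not wav_path.exists():
            print(f"missing audio: {wav_path}")
            continue
        with wave.open(str(wav_path)) as w:
            seconds = w.getnframes() / w.getframerate()
        # crude sanity check: empty text, or a speaking rate that
        # suggests the whisper transcript does not match the clip
        chars_per_sec = len(transcript) / max(seconds, 0.1)
        if not transcript.strip() or chars_per_sec < 3 or chars_per_sec > 30:
            print(f"check manually: {wav_name} ({seconds:.1f}s, {len(transcript)} chars)")
```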
I'm having the same issue with the Colab version. Not only is it losing entire blocks of text, but it is also mixing up and repeating text, all while hallucinating and giving demon voices, or the voice morphing into another voice/gender.
We are losing whole sentences, or the ends of sentences, when we generate the audio. At first I thought it was a training issue, but if you regenerate, you can get the sentence it missed last time back fine, only for it to lose another one.
Here is an example of the text we are putting into it:
In the first gen it missed:
"Sammy, your destiny lies with the power of the Squawkstone." She was bewildered but excited by this revelation.
In the second gen it missed:
King Cluckington declared Sammy an honorary citizen of Fowlmore and bestowed upon her the title of "Chicken Whisperer."