Open Bardo-Konrad opened 9 months ago
Does anyone have a workaround for this?
I tried ending all my text with a period ("."), but that does not make the synthesizer stop where the text ends. The output often contains artifacts in addition to the input text.
Probably the only way around it is to generate the speech, run speech-to-text on it, compare the transcript to the input to get the timestamps of the gibberish, remove those segments, and resave.
Kinda dumb, but what the heck.
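A minimal sketch of the comparison step, assuming some speech-to-text tool has already produced word-level timestamps as `(word, start_sec, end_sec)` tuples. That tuple format and the name `find_extra_spans` are hypothetical, not part of any library. The alignment uses Python's standard `difflib.SequenceMatcher`, and only words that appear in the transcript with no counterpart in the input are reported.

```python
import re
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase a word and strip punctuation so 'World.' matches 'world'."""
    return re.sub(r"[^\w']+", "", word.lower())


def find_extra_spans(input_text, transcript_words):
    """Return (start_sec, end_sec) ranges of transcript words absent from the input.

    transcript_words is an assumed format: a list of (word, start_sec, end_sec)
    tuples, as produced by a speech-to-text tool with word timestamps.
    """
    expected = [w for w in (normalize(t) for t in input_text.split()) if w]
    heard = [normalize(w) for w, _, _ in transcript_words]
    matcher = SequenceMatcher(a=expected, b=heard, autojunk=False)
    spans = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" means words were heard that the input text does not contain.
        # "replace" is skipped: it is more likely a misrecognized real word.
        if tag == "insert":
            spans.append((transcript_words[j1][1], transcript_words[j2 - 1][2]))
    return spans


words = [("hello", 0.0, 0.4), ("world", 0.5, 0.9), ("blarg", 1.6, 2.0), ("nuh", 2.1, 2.4)]
extra = find_extra_spans("Hello world.", words)
```

One caveat: a recognizer may transcribe nothing at all for non-language audio, in which case no span is reported for it.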
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
I want to draw attention to this.
I am thinking of implementing this. However, instead of gathering timestamps for the gibberish (which we don't know in advance, so it is complex to detect), I would prefer to gather timestamps for the input text (which we do know), then crop and save only that time range.
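The crop-to-known-text idea could look roughly like this. It is a sketch under assumptions: the `(word, start_sec, end_sec)` transcript format, the function names, and the `padding` parameter are all hypothetical, and the audio is treated as a plain sequence of samples at a known sample rate. It finds the first and last transcript words that align with the input text and keeps only the audio between them.

```python
import re
from difflib import SequenceMatcher


def _norm(word):
    """Lowercase a word and strip punctuation before comparing."""
    return re.sub(r"[^\w']+", "", word.lower())


def crop_range_for_text(input_text, transcript_words, padding=0.1):
    """Return (start_sec, end_sec) covering the words that match the input text.

    transcript_words is an assumed format: (word, start_sec, end_sec) tuples.
    Returns None when nothing in the transcript matches the input.
    """
    expected = [w for w in (_norm(t) for t in input_text.split()) if w]
    heard = [_norm(w) for w, _, _ in transcript_words]
    blocks = SequenceMatcher(a=expected, b=heard, autojunk=False).get_matching_blocks()
    matched = [b for b in blocks if b.size > 0]  # drop the zero-size sentinel
    if not matched:
        return None
    first = matched[0].b
    last = matched[-1].b + matched[-1].size - 1
    start = max(0.0, transcript_words[first][1] - padding)
    end = transcript_words[last][2] + padding
    return start, end


def crop_samples(samples, sample_rate, start_sec, end_sec):
    """Slice a sequence of audio samples to the given time range."""
    return samples[int(start_sec * sample_rate):int(end_sec * sample_rate)]


words = [("only", 0.2, 0.5), ("speak", 0.55, 0.9), ("what", 0.95, 1.1),
         ("is", 1.15, 1.25), ("written", 1.3, 1.8), ("gah", 2.5, 2.9)]
crop = crop_range_for_text("Only speak what is written.", words)
```

This only trims leading and trailing extras. If a trailing artifact happens to be transcribed as a word that also occurs in the input, the alignment may stretch to include it, so the result is worth checking by ear.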
Describe the bug
Sometimes the speech pauses and then the speaker continues, but what follows is neither in the written text nor in any language, although it is clearly the same speaker. Unless you want to create a horror movie with a disturbingly familiar voice, this behaviour is undesired. I think Bark has the same issue.
To Reproduce
Expected behavior
Only speak what is written.