collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.
https://collabora.github.io/WhisperSpeech/
MIT License

gui/user text/file/tts #98

Closed. BBC-Esq closed this 7 months ago.

jpc commented 7 months ago

I love the detailed installation instructions. Maybe it would be worth putting them in a single place so it's easier to keep them consistent and up to date? Something like an INSTALL.md that we could also link to from the README?

Also I believe if we make the first step about manually installing (CUDA) PyTorch then we won't need the additional step to uninstall and reinstall it.

jpc commented 7 months ago

I've also tested the demo and it is really nice, thanks! :)

I uploaded a bigger PDF, so it would be nice to be able to cancel the reading while it is in progress.

But I'll merge this as it is now and we can improve it with a new PR.

BBC-Esq commented 7 months ago

> I love the detailed installation instructions. Maybe it would be worth putting them in a single place so it's easier to keep them consistent and up to date? Something like an INSTALL.md that we could also link to from the README?

I like the idea. I can write some basic Windows instructions, and others can do Linux/macOS. As you can see, I've tried to write Linux/macOS instructions in my repository, but since I don't have access to those platforms it has been extremely painstaking, and I'd feel more comfortable if someone else wrote those.

For purposes of the pull request, would INSTALL.md go in the root folder of the repository?

> Also I believe if we make the first step about manually installing (CUDA) PyTorch then we won't need the additional step to uninstall and reinstall it.

I can verify whether this works on Windows. My concern was that pip would uninstall whatever CUDA/torch version was already installed when it installed speechbrain, since speechbrain is installed later. I know that, generally, if the same library is installed twice during an installation process, pip overwrites the earlier one with the later one, but I'll verify whether that happens with the CUDA build specifically.
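A minimal sketch of how that check could be done after the full install has finished, assuming the only question is whether the torch that ended up in the environment is still a CUDA build. The function name `describe_torch_install` is made up for illustration; `torch.version.cuda` and `torch.cuda.is_available()` are the standard PyTorch attributes for this.

```python
import importlib.util


def describe_torch_install() -> str:
    """Report whether the PyTorch in this environment is a CUDA build.

    Hypothetical helper, not part of WhisperSpeech or speechbrain.
    """
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch  # imported lazily so the script also runs without torch

    cuda_version = torch.version.cuda  # None when torch was not built with CUDA
    if cuda_version is None:
        return f"torch {torch.__version__} is not a CUDA build"
    if not torch.cuda.is_available():
        return (
            f"torch {torch.__version__} was built for CUDA {cuda_version}, "
            "but no usable GPU was found"
        )
    return f"torch {torch.__version__} with CUDA {cuda_version} is ready"


if __name__ == "__main__":
    print(describe_torch_install())
```

Running this once right after installing CUDA PyTorch and once more after installing the remaining packages would show whether a later step replaced the CUDA build.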

BBC-Esq commented 7 months ago

> I've also tested the demo and it is really nice, thanks! :)
>
> I uploaded a bigger PDF, so it would be nice to be able to cancel the reading while it is in progress.
>
> But I'll merge this as it is now and we can improve it with a new PR.

Yep, there's also a minor threading issue: the GUI hangs if the .docx/PDF is very large. I tried a 2,000-page legal treatise and, even though I added a new thread for the ingestion process, it hung. It's a decent example that people can build off of and that can spur ideas, but it's not production ready. I'll improve it slightly over time and keep it updated with new pull requests (e.g. for the possible new generate_to_playback method). Thanks!
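One common pattern for both the cancel request and the hang is a worker thread that processes the text in chunks, checks a `threading.Event` between chunks, and reports progress through a `queue.Queue`. The sketch below uses only the standard library. The names `read_in_chunks` and `start_reading` are hypothetical, the `upper()` call is a stand-in for the real per-chunk work, and none of this is taken from the GUI code in this pull request.

```python
import queue
import threading


def read_in_chunks(chunks, cancel_event, results):
    """Process chunks one at a time, stopping early once cancellation is requested."""
    for index, chunk in enumerate(chunks):
        if cancel_event.is_set():
            # Report how many chunks were completed before the cancel.
            results.put(("cancelled", index))
            return
        # Placeholder for the real work, e.g. synthesizing and playing one chunk.
        results.put(("done", chunk.upper()))
    results.put(("finished", len(chunks)))


def start_reading(chunks):
    """Start the worker and return the handles a GUI would need."""
    cancel_event = threading.Event()
    results = queue.Queue()
    worker = threading.Thread(
        target=read_in_chunks,
        args=(chunks, cancel_event, results),
        daemon=True,  # do not keep the process alive if the window is closed
    )
    worker.start()
    return worker, cancel_event, results


if __name__ == "__main__":
    worker, cancel_event, results = start_reading(["first page", "second page"])
    worker.join()
    while not results.empty():
        print(results.get())
```

In a GUI, a Cancel button would call `cancel_event.set()`, and the main thread would poll `results` from a timer callback using `get_nowait()` instead of calling `worker.join()`, so that the event loop is never blocked. Whether this addresses the hang seen with the 2,000-page file depends on where the time is actually spent, which the thread does not say.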