flathub / net.mkiol.SpeechNote

https://flathub.org/apps/details/net.mkiol.SpeechNote

Audition voices before installation #53

Open gbodley opened 1 month ago

gbodley commented 1 month ago

I just learned of Speech Note today, installed it on Kubuntu 24.04, and contributed a small amount to the project. It's wonderful compared to everything I have tried on Linux in the past. There are just two things that I would like to see improved at the moment. First, I would like to be able to adjust the text size in the window. Perhaps I could adjust Plasma in some way to display larger fonts; I haven't done that yet. The other issue is that there is a massive number of voices, but the descriptions are very cryptic. Perhaps a better secondary descriptive name could be added to each voice, along with the ability to hear it, even on a website for the project if not in the app itself. I really hope you are able to keep going, because this is a high-quality project that has been needed for a very long time. If there was a way to integrate it into LibreOffice, it would be fantastic. I hope other programmers will help to move it on to higher levels. Also, I don't know of a guide for an optimum configuration. I have a pretty fast computer with significant memory, so it works fine on my machine, but I don't know what is actually best to choose. Is there a best-options default with, say, just 5 or 6 good voices for each language? Maybe just 4? Google Translate only has 1 female voice, and it lacks personality. The British male voice I'm using with Speech Note has much more personality.

gbodley commented 1 month ago

Having used the app a bit more and gained some understanding of it, I think voices should not be mixed in with languages. It seems to me that languages should be one tab, speech systems a separate tab, and voices a third. I feel that, at least at the outset, there are just too many voices to go through. Some of the individual voices seem better than others. What I mean by this is that they are more human-sounding. Some have more personality in how they read, while others sound more robotic. I suggested above that it might be best to default to what is considered the most natural language system, with just 5 or 6 of the most natural voices that have the most personality in their presentation. Another tab or sub-area could be "additional voices". Male and female could be categories, and perhaps even "robotic" or "cartoonish" could be other categories. Perhaps there are voices with accents, say an understandable English speaker with a French accent, and so on. Then, of course, celebrity voices like Obama or Marilyn Monroe. The point is that right now there is just a huge list of nondescript voices that takes a significant amount of time to go through. Some don't have much personality, or have a rather drab personality.

This is really a wonderful project with tremendous potential that has long been needed for the Linux desktop. I tried to attach an .mp3 file of what I consider to be a clever use of a voice, but .mp3 is not an accepted format. That's completely ridiculous for a program that specifically generates .mp3 files.

gbodley commented 1 month ago

Just removing a voice that you don't want is very time-consuming, and the voice is not easy to find. There probably needs to be a tab for installed voices only. It would also be nice to be able to add your own name to a voice. For example, RH Ryan, down at the bottom, is an American English voice with a lively personality. "Alan", below him, was pretty much a dud that I don't care to use. It took a long time to find Alan to delete him.

gbodley commented 3 weeks ago

I found out it is possible to put larger text into the window, but with larger text it stalls briefly when moving to the next line. Therefore you have to try to keep your sentences to just one line. The feature I would like most is to have it read just the highlighted text. Sometimes it mispronounces words. The pronunciation can be corrected by deliberately misspelling the word, since it does try to read phonetically. However, as far as I know, you cannot limit it to reading just a selected area of text. Since working on a pronunciation takes experimentation, it would be very helpful to be able to select just the portion you are working on and have it read that. Also, once you develop a pronunciation, it would be nice to be able to add it to a dictionary of words that are pronounced incorrectly when spelled correctly. This is more a problem with the English language than with the program itself.

mkiol commented 3 weeks ago

Hi. Thank you for reporting all your ideas for improvements. I really need and greatly appreciate any feedback from Speech Note users!

> The other issue is that there is a massive number of voices, but the descriptions are very cryptic. Perhaps a better secondary descriptive name could be added to each voice, along with the ability to hear it, even on a website for the project if not in the app itself.

> I think voices should not be mixed in with languages. It seems to me that languages should be one tab, speech systems a separate tab, and voices a third. I feel that, at least at the outset, there are just too many voices to go through.

I completely agree that this is a problem. I plan to improve the model browser in future releases. I have taken note of all your ideas and will try to implement some of them.

Part of the work has already been done with the "Filtering" option. I don't know if you have found it yet, but you can filter models using certain criteria.

> If there was a way to integrate it into LibreOffice, it would be fantastic.

Integration capabilities are very limited right now. You can invoke "actions" using global keyboard shortcuts (X11 only) when Speech Note is hidden or in the background. For example, you can have it read text that is in the clipboard. To enable global keyboard shortcuts, use the option in Settings->Accessibility.

> This is really a wonderful project with tremendous potential that has long been needed for the Linux desktop. I tried to attach an .mp3 file of what I consider to be a clever use of a voice, but .mp3 is not an accepted format.

Could you elaborate on this? I don't think I understood what the problem is. MP3 files are supported for both import and export.

> The feature I would like most is to have it read just highlighted text.

This is supported right now. Select the text and open the context menu (right mouse click).

> Also, once you develop a pronunciation, it would be nice to be able to add it to a dictionary of words that are pronounced incorrectly when spelled correctly. This is more a problem with the English language than with the program itself.

Indeed an interesting idea 👍🏿
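
To make the idea concrete: such a dictionary could work as a pre-processing step that swaps each listed word for its phonetic respelling before the text is sent to the speech engine. Below is a minimal sketch in Python. The names `PRONUNCIATIONS` and `apply_pronunciations` and the example respellings are hypothetical, chosen only for illustration. This is not how Speech Note works today.

```python
import re

# Hypothetical user dictionary: correctly spelled word -> phonetic respelling.
PRONUNCIATIONS = {
    "cache": "cash",
    "Worcester": "Wooster",
}


def apply_pronunciations(text, table=PRONUNCIATIONS):
    """Replace whole words listed in `table` (case-insensitive) with their respelling."""
    if not table:
        return text
    lookup = {word.lower(): respelling for word, respelling in table.items()}
    # Longest entries first so that a longer word wins over a shorter prefix.
    words = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


# The rewritten text is what would be handed to the TTS engine.
print(apply_pronunciations("Clear the cache in Worcester."))
```

Matching on whole words only means that "cached" is left alone when only "cache" is in the dictionary, so a correction does not leak into unrelated words.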

BTW, the main place to report problems is https://github.com/mkiol/dsnote/issues. This repository is just for building the Flathub package. I'm just letting you know in case you want to report any new problems in the future.