daisy / word-save-as-daisy

"Save as DAISY" add-in for Microsoft Word
BSD 3-Clause "New" or "Revised" License

add Lao and Khmer TTS and speech synthesis functions. #34

Open EfaJapan opened 1 year ago

EfaJapan commented 1 year ago

In the non-profit organization I belong to, we are progressing with the development of "accessible digital educational materials" for children with disabilities in Cambodia and Laos.

To create multimedia books, we are utilizing "Save as Daisy" (although it has not officially been announced to be compatible with MS365 Word, it has been confirmed to work) to convert them into "Daisy 3.0". However, we are facing difficulties in generating synthetic voices for Lao (Laotian) and Khmer (Cambodian).

For languages such as Japanese, English, and Portuguese, we have confirmed that MP3 files (for example, speechgen0002.mp3) are generated correctly.

One possible reason is that the language packs for Lao and Khmer added to the "Region & Language" settings in Windows 11 Home (64-bit) do not include "text-to-speech" capabilities.

In "Immersive Reader," reading aloud in Lao and Khmer is possible, but it only works in the online version of Microsoft 365 and in Microsoft Edge. Furthermore, as I understand it, the TTS and voice synthesis used when "Save as Daisy" converts a Word document to multimedia Daisy files does not come from Immersive Reader but from the voices installed directly in Windows.

Since reading aloud in these languages is already possible in Immersive Reader, we’d like to know if it's possible to integrate this functionality as a Windows language pack, or if there are third-party, paid solutions to add these languages.

Based on the above, please advise on how to incorporate Lao and Khmer TTS and how to obtain it. The ability to add "voice synthesis" for these languages is a critical issue in our project and, if resolved, could significantly improve accessibility in developing countries.

I have also contacted MS365 Ambassadors, but did not receive a solution, so I'd greatly appreciate it if this community could lend me its wisdom.

NPavie commented 1 year ago

Thank you for your feedback.

If I understand your problem correctly, Microsoft does not currently offer text-to-speech voices for Lao and Khmer in its desktop voice packs, but the Immersive Reader Azure service does.

As a matter of fact, the consortium plans to extend the capabilities of the add-in to allow the use of the Azure text-to-speech services (issue #32) through DAISY Pipeline 2 and its connectors to text-to-speech engines (including Azure services).

We don't yet have clear visibility on when the feature will be made available in the add-in, but I can contact you if you want to test the feature once it is implemented.

EfaJapan commented 1 year ago

Thank you for your prompt response. May I ask you a few more questions for clarification? It would be helpful if you could tell me as much as you can.

(1) Is there any way to add text-to-speech functionality for Lao and Khmer languages to Microsoft Desktop (Windows) by adding language packs?

(2) Is my understanding correct that Azure TTS is the same as the Text-To-Speech used in the Immersive Readers?

(3) "extend the capabilities of the addin" Does this mean that Save as Daisy will be able to utilize Azure's Text-to-Speech engine as mentioned in (2), which is the same as Immersive Reader's TTS? Will these synthesized voices be applied when converting to Daisy 3.0 format? In other words, will it be possible to create Daisy 3.0 files in languages supported for reading in Immersive Reader(ex: Lao, Khmer), without relying on Windows Desktop language packs?

(4) You mentioned that the release date for (3) is still undecided, but even a beta version would be fine. Could you please tell me roughly when it might be available?

Sorry for the detailed questions, but please let me know.

Furthermore, I’d be interested in participating in testing this feature when it becomes available.

NPavie commented 1 year ago

(1) Is there any way to add text-to-speech functionality for Lao and Khmer languages to Microsoft Desktop (Windows) by adding language packs?

Sorry, but this is beyond my knowledge; I don't know how to create and integrate a new language pack. From what I have read in the documentation and what you reported, the language packs provided by Microsoft for Lao and Khmer do not include text-to-speech capabilities. I think some third parties provide voices that are compatible with the Microsoft desktop text-to-speech engine, but I don't think we have the technical knowledge and resources in the consortium to create those packs.

(2) Is my understanding correct that Azure TTS is the same as the Text-To-Speech used in the Immersive Readers?

From what I read in the Microsoft documentation, the same voices are used.

(3) "extend the capabilities of the addin" Does this mean that Save as Daisy will be able to utilize Azure's Text-to-Speech engine as mentioned in (2), which is the same as Immersive Reader's TTS? Will these synthesized voices be applied when converting to Daisy 3.0 format? In other words, will it be possible to create Daisy 3.0 files in languages supported for reading in Immersive Reader(ex: Lao, Khmer), without relying on Windows Desktop language packs?

It is indeed the goal.

(4) You mentioned that the release date for (3) is still undecided, but even a beta version would be fine. Could you please tell me roughly when it might be available?

We are still working out the development plan, but development should start next week or the week after. I cannot guarantee a delivery date, but we hope to provide a new test version that includes this additional feature around the end of October or early November 2023, provided we manage to fix a blocking issue in our connector to the Microsoft SAPI / OneCore text-to-speech engine in Windows 11, which we are currently investigating and which is delaying the 2.8 release.
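
For anyone who wants to check Lao or Khmer output from Azure directly while the add-in feature is pending, the following is a minimal sketch of how a request to the Azure Speech REST endpoint might be assembled in Python. It is not how the add-in or DAISY Pipeline 2 does it. The function names `build_ssml` and `build_tts_request`, the region, and the key are placeholders made up for illustration, and the voice name `lo-LA-KeomanyNeural` is an assumption that should be checked against Azure's current voice list. The sketch only builds the request and does not send it.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, locale: str, voice: str) -> str:
    """Wrap plain text in a minimal SSML document for one voice."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(locale)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


def build_tts_request(ssml: str, region: str, key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request asking for MP3 audio."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # placeholder: your own Speech resource key
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "audio-16khz-128kbitrate-mono-mp3",
        "User-Agent": "daisy-tts-sketch",  # hypothetical client name
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


# Assumed voice name and region; verify both before use.
ssml = build_ssml("ສະບາຍດີ", "lo-LA", "lo-LA-KeomanyNeural")
request = build_tts_request(ssml, "southeastasia", "YOUR_KEY_HERE")
# Sending would be: urllib.request.urlopen(request).read()  -> MP3 bytes
```

With a valid key, the response body of the sent request would be the MP3 audio; for Khmer, the locale would be `km-KH` with a matching Khmer voice name.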

EfaJapan commented 1 year ago

Thank you again for your reply. So with the development described in (3), Azure TTS (the same as Immersive Reader) can be used within the Save as Daisy process to generate multimedia Daisy files. That's great. If this becomes a reality, I believe that almost all of the most important issues we currently face will be solved, and that it will greatly contribute to the spread of accessible books among speakers of minority languages. We really hope it can be developed as soon as possible. Please let me follow your progress; I'd be happy to have the opportunity to participate in the test phase.

EfaJapan commented 1 year ago

Thank you very much for your advice the other day. Has there been any progress since then? If a test version is released, we'd like to participate. We look forward to hearing good news from you.