espeak-ng / espeak-ng

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
GNU General Public License v3.0

Browser extension #972

Open guest271314 opened 3 years ago

guest271314 commented 3 years ago

Is there any interest in an extension which provides a means to run espeak-ng --stdout from the browser and get the raw output in the browser?

jaacoppi commented 3 years ago

What would the use case be? I think having a screen reader is enough.

guest271314 commented 3 years ago

The use case is to use espeak-ng in the browser. Specifically, the capability to

guest271314 commented 3 years ago

I have already written the code. I am just asking whether there is interest in being able to do the things listed in my previous post in the browser.

guest271314 commented 3 years ago

@jaacoppi Use case references:

valdisvi commented 3 years ago

@guest271314, you may submit a pull request with changes to eSpeak NG for review. However, from the use cases you mentioned, I suspect this should be part of a different project that implements the Web Speech API as an adapter for eSpeak NG, not part of eSpeak NG itself. A possible starting point is to review, e.g., the eSpeak.js project. The rationale is that eSpeak NG already provides an API (which may be extended and/or improved if necessary), but adding more APIs for different programming languages, frameworks and platforms shouldn't be part of the core project, to avoid feature creep and unneeded complexity.

guest271314 commented 3 years ago

I have been filing specification and implementation issues about speech synthesis for several years now, too many to list here; a brief summary is at https://github.com/guest271314/captureSystemAudio#references. Most have been closed, and none has been officially fixed.

The approach I use in the PR that I will file within the next few days does not change eSpeak NG at all. I use Native Messaging to start a local server. The espeak-ng command, with the --stdout option set, is sent with fetch() and executed with PHP passthru(). The browser then parses the streamed WAV file and writes it to the WritableStream side of a MediaStreamTrackGenerator, which provides a MediaStreamTrack representation of the live stream. I am currently making minor adjustments to the pattern described at https://github.com/guest271314/NativeTransferableStreams/blob/web_accessible_resources/Explainer.md. When HTTP/3 over WebTransport is implemented (https://github.com/aiortc/aioquic/issues/163) I will also include a version using that API, e.g., https://github.com/guest271314/webtransport/blob/main/webTransportEspeakNg.js.
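The "parse streamed WAV file" step can be sketched roughly as follows. This is an illustration, not the code from the PR: the function names parseWavHeader and pcm16ToFloat32 are hypothetical, and the sketch assumes the stream is RIFF/WAVE with 16-bit little-endian PCM samples. It walks the chunk list instead of assuming a fixed 44-byte header, and it never relies on the size field of the "data" chunk, which may not be meaningful for output that is streamed to stdout.

```javascript
// Illustrative sketch; names are hypothetical and not taken from the PR.

// Locate the "fmt " and "data" chunks in the first bytes of a WAV stream.
// `bytes` is a Uint8Array holding at least the complete header.
function parseWavHeader(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (o) =>
    String.fromCharCode(bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]);

  if (bytes.byteLength < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE stream');
  }

  let offset = 12; // first chunk after "RIFF" <size> "WAVE"
  let format = null;
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;
    if (id === 'fmt ') {
      if (body + 16 > bytes.byteLength) break; // header not fully received
      format = {
        audioFormat: view.getUint16(body, true), // 1 means PCM
        channels: view.getUint16(body + 2, true),
        sampleRate: view.getUint32(body + 4, true),
        bitsPerSample: view.getUint16(body + 14, true),
      };
    } else if (id === 'data') {
      if (!format) throw new Error('"data" chunk found before "fmt " chunk');
      // The chunk size is deliberately ignored: samples start at `body`.
      return { ...format, dataOffset: body };
    }
    offset = body + size + (size % 2); // chunks are padded to even length
  }
  throw new Error('Incomplete header: "data" chunk not found yet');
}

// Convert 16-bit little-endian PCM bytes to floats in [-1, 1).
// A trailing odd byte is ignored; the caller should carry it into the next chunk.
function pcm16ToFloat32(pcm) {
  const view = new DataView(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  const out = new Float32Array(pcm.byteLength >> 1);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```

With a fetch() response, one would typically buffer chunks from response.body.getReader() until parseWavHeader stops throwing, then pass everything from dataOffset onward through pcm16ToFloat32.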

guest271314 commented 3 years ago

Prior art: https://github.com/guest271314/native-messaging-espeak-ng, which downloads and builds eSpeak NG from this repository (see https://github.com/guest271314/native-messaging-espeak-ng/blob/ae6bbd087733d805e6baba4a35cfb695f03042f3/host/install_host.sh#L31). Chrome Apps are now deprecated.

valdisvi commented 3 years ago

There may be dozens of projects that use eSpeak or eSpeak NG. The projects you mentioned are probably a much better place to start implementing the functionality you need. But I still doubt that this functionality is needed in the eSpeak NG project itself.

guest271314 commented 3 years ago

Again, the browser extension does not change eSpeak NG itself. The extension simply provides a means to use eSpeak NG in the browser - specifically a means to get the raw output as bytes, and as a MediaStreamTrack which can be used with WebRTC. The Emscripten port in this repository does not parse SSML (https://github.com/espeak-ng/espeak-ng/issues/736), and still uses the deprecated script processor (https://github.com/espeak-ng/espeak-ng/blob/master/emscripten/js/demo.js#L26). I will also create a version for Firefox that uses AudioWorklet instead of MediaStreamTrackGenerator, which is currently only supported in Chromium/Chrome.
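To illustrate the MediaStreamTrack part, here is a rough sketch of feeding decoded samples to a MediaStreamTrackGenerator. It is not the extension's actual code: toFrames and writeToTrack are hypothetical names, mono float samples are assumed, and the frame size of 128 is an arbitrary choice. The framing helper is plain JavaScript and could equally feed an AudioWorklet; writeToTrack only works where MediaStreamTrackGenerator and WebCodecs AudioData exist.

```javascript
// Illustrative sketch; names are hypothetical and not taken from the extension.

// Split mono float samples into fixed-size frames, each tagged with a
// timestamp in microseconds (the unit AudioData uses).
function* toFrames(samples, sampleRate, frameSize = 128, startFrame = 0) {
  for (let i = 0; i < samples.length; i += frameSize) {
    const data = samples.subarray(i, i + frameSize);
    yield {
      data,
      numberOfFrames: data.length,
      timestamp: Math.round(((startFrame + i) / sampleRate) * 1e6),
    };
  }
}

// Browser-only: write the frames to the writable side of a
// MediaStreamTrackGenerator created with { kind: 'audio' }.
async function writeToTrack(generator, samples, sampleRate) {
  if (typeof AudioData === 'undefined') {
    throw new Error('AudioData is not available in this environment');
  }
  const writer = generator.writable.getWriter();
  try {
    for (const frame of toFrames(samples, sampleRate)) {
      await writer.ready; // respect backpressure from the track consumer
      await writer.write(
        new AudioData({
          format: 'f32', // for one channel, interleaved and planar layouts coincide
          sampleRate,
          numberOfFrames: frame.numberOfFrames,
          numberOfChannels: 1,
          timestamp: frame.timestamp,
          data: frame.data,
        })
      );
    }
  } finally {
    writer.releaseLock();
  }
}
```

Because the generator is itself a MediaStreamTrack, it can be wrapped in new MediaStream([generator]) and assigned to an audio element's srcObject, or added to an RTCPeerConnection, before writeToTrack is called.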

I already implemented the functionality.

AFAIK no other projects implement the functionality described, perhaps save for meSpeak.js https://www.masswerk.at/mespeak/ which uses speak.js and does implement SSML parsing.

valdisvi commented 3 years ago

The eSpeak NG project itself is not only the *.c files that are built into executables and libraries. It also contains data, scripts, tests, documentation and other files. If you add new functionality to the project, question is, is it needed and who will maintain it. But, anyway, you can create a pull request. Otherwise this conversation is too theoretical.

guest271314 commented 3 years ago

@valdisvi

If you add new functionality to the project, question is, is it needed

From my perspective, yes. See use cases at https://github.com/espeak-ng/espeak-ng/issues/972#issuecomment-877820223.

and who will maintain it.

I will.

I filed the PR for this.