diffusionstudio / vits-web

Web api for using VITS based models in the browser!
https://huggingface.co/spaces/diffusionstudio/vits-web

What's the entry point for bundling? #2

Closed guest271314 closed 2 months ago

guest271314 commented 2 months ago

What's the entry point? So we can bundle to a single script with deno and bun?

How are you running this in the browser?

guest271314 commented 2 months ago

It looks like a request is being made to https://cdn.jsdelivr.net/npm/@diffusionstudio/piper-wasm@1.0.0/build/piper_phonemize.data twice. The second request is erroring.

git clone https://github.com/diffusion-studio/vits-web
bun build src/index.js --outfile=bundle.js
await download('en_US-hfc_female-medium', (progress) => {
  console.log(`Downloading ${progress.url} - ${Math.round(progress.loaded * 100 / progress.total)}%`);
});

var wav = await predict({
  text: "Text to speech in the browser is amazing!",
  voiceId: 'en_US-hfc_female-medium',
});

console.log(wav);
Blob {size: 5033, type: 'text/plain'}
vits-web.js:37670 Downloading https://huggingface.co/diffusionstudio/piper-voices/resolve/main/en/en_US/hfc_female/medium/en_US-hfc_female-medium.onnx - NaN%
vits-web.js:37514 

       GET https://cdn-lfs-us-1.huggingface.co/repos/65/0b/650b753432aedcc190080795f6713cadd0aa9463dc40d59aa78e6c28ef7fdf01/914c473788fc1fa8b63ace1cdcdb44588f4ae523d3ab37df1536616835a140b7?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27en_US-hfc_female-medium.onnx%3B+filename%3D%22en_US-hfc_female-medium.onnx%22%3B&Expires=... net::ERR_FAILED 200 (OK)
(anonymous) @ vits-web.js:37514
fetchBlob @ vits-web.js:37489
(anonymous) @ vits-web.js:37615
download @ vits-web.js:37614
(anonymous) @ vits-web.js:37669
vits-web.js:37453 null
vits-web.js:37457 TypeError: Failed to execute 'write' on 'FileSystemWritableFileStream': The provided value is not of type 'WriteParams'.
    at writeBlob (vits-web.js:37454:20)
writeBlob @ vits-web.js:37457
await in writeBlob
(anonymous) @ vits-web.js:37615
await in (anonymous)
download @ vits-web.js:37614
(anonymous) @ vits-web.js:37669
vits-web.js:37604 Blob {size: 635212, type: 'application/wasm'}
vits-web.js:37604 null
vits-web.js:37514 

       GET https://cdn.jsdelivr.net/npm/@diffusionstudio/piper-wasm@1.0.0/build/piper_phonemize.data net::ERR_FAILED 200 (OK)
(anonymous) @ vits-web.js:37514
fetchBlob @ vits-web.js:37489
createBlobUrl @ vits-web.js:37601
await in createBlobUrl
predict @ vits-web.js:37557
await in predict
(anonymous) @ vits-web.js:37673
vits-web.js:37607 Uncaught TypeError: Failed to execute 'createObjectURL' on 'URL': Overload resolution failed.
    at createBlobUrl (vits-web.js:37606:14)
    at async predict (vits-web.js:37557:31)
    at async vits-web.js:37673:11
createBlobUrl @ vits-web.js:37606
XMLHttpRequest.send
(anonymous) @ vits-web.js:37514
fetchBlob @ vits-web.js:37489
createBlobUrl @ vits-web.js:37601
await in createBlobUrl
predict @ vits-web.js:37557
await in predict
(anonymous) @ vits-web.js:37673
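
A side note on the `NaN%` in the first log line: `Math.round(progress.loaded * 100 / progress.total)` evaluates to `NaN` whenever `total` is `0` with nothing loaded, or is missing entirely, which is plausibly the case here because the request failed before a length was known. A minimal sketch of a guard follows; `percentOrNull` and `describeProgress` are hypothetical helper names, not part of vits-web.

```javascript
// Hypothetical helpers, not part of the vits-web API.
// Returns a rounded percentage, or null when the total size is unknown.
function percentOrNull(loaded, total) {
  if (!Number.isFinite(total) || total <= 0) return null;
  return Math.round((loaded * 100) / total);
}

// Builds the same log message as the snippet above, without printing NaN.
function describeProgress(progress) {
  const pct = percentOrNull(progress.loaded, progress.total);
  return pct === null
    ? `Downloading ${progress.url} - size unknown`
    : `Downloading ${progress.url} - ${pct}%`;
}
```

Passing `describeProgress` output to `console.log` inside the `download` callback would keep the log readable even when the server reports no length.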

k9p5 commented 2 months ago

I haven't tested deno or bun yet, just chromium with vite/webpack. If you clone the repo and install all dependencies, you can enable e2e tests for other browsers in the playwright config.

guest271314 commented 2 months ago

I thought this was supposed to be working in the browser standalone?

I'm testing on Chromium Version 128.0.6569.0 (Developer Build) (64-bit).

Use git to clone this repository, cd into it, run bun install -p, bundle to a single script with bun build src/index.js --outfile=bundle.js, then run bundle.js in the browser in DevTools => Sources => Snippets with export {...} commented out:

/*
export {
  voices,
  stored,
  remove,
  predict,
  flush,
  download,
  WASM_BASE,
  PATH_MAP,
  ONNX_BASE,
  HF_BASE
};
*/
await download('en_US-hfc_female-medium', (progress) => {
  console.log(`Downloading ${progress.url} - ${Math.round(progress.loaded * 100 / progress.total)}%`);
});

var wav = await predict({
  text: "Text to speech in the browser is amazing!",
  voiceId: 'en_US-hfc_female-medium',
});

console.log(wav);

errors are thrown

vits-web.js:37513 
 GET https://cdn-lfs-us-1.huggingface.co/repos/65/0b/650b753...&Key-Pair-Id=... net::ERR_FAILED 200 (OK)
vits-web.js:37456 TypeError: Failed to execute 'write' on 'FileSystemWritableFileStream': The provided value is not of type 'WriteParams'.
    at writeBlob (vits-web.js:37453:20)
vits-web.js:37513 
 GET https://cdn.jsdelivr.net/npm/@diffusionstudio/piper-wasm@1.0.0/build/piper_phonemize.data net::ERR_FAILED 200 (OK)
vits-web.js:37605 Uncaught TypeError: Failed to execute 'createObjectURL' on 'URL': Overload resolution failed.
    at createBlobUrl (vits-web.js:37604:14)
    at async predict (vits-web.js:37556:31)
    at async vits-web.js:37670:11
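
Both TypeErrors look consistent with a failed download resolving to `null` (the log prints `null` just before each one) and that `null` then reaching `write()` or `URL.createObjectURL()`. A minimal sketch of failing early instead, assuming a `fetch`-based download; `fetchBlobStrict` is a hypothetical name and this is not the library's own `fetchBlob`.

```javascript
// Hypothetical helper, not the fetchBlob from vits-web.
// Resolves to a non-empty Blob or throws, so callers never receive null.
async function fetchBlobStrict(url, fetchImpl = globalThis.fetch) {
  // Network-level failures (for example CORS errors) reject here and propagate.
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request for ${url} failed with status ${response.status}`);
  }
  const blob = await response.blob();
  if (blob.size === 0) {
    throw new Error(`Request for ${url} returned an empty body`);
  }
  return blob;
}
```

With a helper like this, the error surfaces at the failing request rather than later at the write or object-URL call.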

k9p5 commented 2 months ago

Not sure what you're trying to do; the library is already bundled. You just need to install it on your frontend. That said, I have now removed XMLHttpRequest and URL so we could get this to node as well.

guest271314 commented 2 months ago

There's no installation needed, and node is not needed either. I bundled the code into a single script using deno and bun, with base64 to serialize onnxruntime-web and piper.js, and ran the code standalone in the browser.

guest271314 commented 2 months ago

> so we could get this to node as well.

You can't move the code as-is to node because it uses FileSystemFileHandle, which node does not support. Your current code doesn't write onnxruntime-web or piper.js to the Origin Private File System. I bundled all assets into a single script. I only included one (1) voice in the demonstration, because onnxruntime-web alone looks to be around 60 MB, and requesting multiple voice files takes too long for an example.
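
To illustrate the portability point, here is a minimal feature-detection sketch that checks for the Origin Private File System entry point before choosing a storage path. `pickStorageBackend` and the `'opfs'` / `'fallback'` labels are hypothetical and not part of vits-web.

```javascript
// Hypothetical helper, not part of vits-web.
// Reports whether the given global scope exposes navigator.storage.getDirectory(),
// the entry point to the Origin Private File System.
function pickStorageBackend(scope = globalThis) {
  const storage = scope.navigator && scope.navigator.storage;
  const hasOpfs = Boolean(storage) && typeof storage.getDirectory === 'function';
  return hasOpfs ? 'opfs' : 'fallback';
}
```

Taking the scope as a parameter keeps the check testable outside a browser; calling it with no arguments inspects the current global.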

guest271314 commented 2 months ago

@k9p5 Included an MVCE https://github.com/diffusion-studio/vits-web/commit/d840dadd9290f388b8a9fc3201b0f6e9734dc824 published at GH Pages of my fork https://guest271314.github.io/vits-web/.

I'll probably add encoding to WebM https://github.com/davedoesdev/webm-muxer.js, MP3 https://github.com/guest271314/MP3Recorder, raw Opus packets with WebCodecs https://github.com/guest271314/WebCodecsOpusRecorder, and a live MediaStreamTrack https://github.com/guest271314/native-messaging-espeak-ng.

k9p5 commented 2 months ago

I already wrote an OPFS mock for unit testing in node. With some modification I could have it write to a tempdir instead, so that in node it accesses the usual file system.
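
For illustration only, a temp-directory store in node might look roughly like the sketch below. This is not the mock mentioned above; `createTempDirStore` and its `write` / `read` / `remove` methods are hypothetical names, and it only uses `node:fs/promises`, `node:os`, and `node:path`.

```javascript
// Hypothetical sketch, not the OPFS mock from the vits-web repository.
// Stores files under a fresh temporary directory using node's file system.
async function createTempDirStore(prefix = 'vits-web-') {
  const fs = await import('node:fs/promises');
  const os = await import('node:os');
  const path = await import('node:path');

  const root = await fs.mkdtemp(path.join(os.tmpdir(), prefix));
  // Keep only the final path segment so names cannot escape the root.
  const resolve = (name) => path.join(root, path.basename(name));

  return {
    root,
    async write(name, data) {
      await fs.writeFile(resolve(name), data);
    },
    // Resolves to a Buffer, or null when the file does not exist.
    async read(name) {
      try {
        return await fs.readFile(resolve(name));
      } catch (err) {
        if (err.code === 'ENOENT') return null;
        throw err;
      }
    },
    async remove(name) {
      await fs.rm(resolve(name), { force: true });
    },
  };
}
```

A store like this could sit behind the same interface as the browser storage code, so that only the backend changes between environments.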

k9p5 commented 2 months ago

> @k9p5 Included an MVCE d840dad published at GH Pages of my fork https://guest271314.github.io/vits-web/.
>
> I'll probably add encoding to WebM https://github.com/davedoesdev/webm-muxer.js, MP3 https://github.com/guest271314/MP3Recorder, raw Opus packets with WebCodecs https://github.com/guest271314/WebCodecsOpusRecorder, and a live MediaStreamTrack https://github.com/guest271314/native-messaging-espeak-ng.

I wish I had seen this earlier; I've now put a lot of time into writing my own demo app: https://huggingface.co/spaces/diffusionstudio/vits-web

guest271314 commented 2 months ago

Works, after a wait, with this error

Uncaught (in promise) SecurityError: Storage directory access is denied.

I think using a Web extension for this will get rid of waiting for voices, piper.js, and onnxruntime-web to download. Download them once, store them in the extension, and communicate with the extension from arbitrary Web pages.

If running this with node, deno, or bun is important to you, you can just get rid of the WHATWG File System parts of the code.
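
As a rough sketch of handling the `SecurityError` above, the storage lookup can be wrapped so that a denied or missing Origin Private File System yields `null` instead of an uncaught rejection. `getRootOrNull` is a hypothetical name, and treating `null` as "skip caching" is an assumption about how a caller would use it.

```javascript
// Hypothetical helper, not part of vits-web.
// Resolves to the OPFS root directory handle, or null when storage is
// unavailable or access is denied with a SecurityError.
async function getRootOrNull(storage) {
  if (!storage || typeof storage.getDirectory !== 'function') return null;
  try {
    return await storage.getDirectory();
  } catch (err) {
    if (err && err.name === 'SecurityError') return null;
    throw err; // Anything else is unexpected, so let it surface.
  }
}
```

In a browser this would be called as `getRootOrNull(navigator.storage)`; a `null` result would mean downloading without writing to storage.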

guest271314 commented 2 months ago

I think I figured out how to bundle your code, and you and I both published working examples, so I'll close this. Thanks.