solyarisoftware opened 3 years ago
I believe you can dump model address as a number, pass it to the worker and then reinitialize model object there from existing memory address. You'll need to add another model constructor to the Model class then. Not sure about details in javascript though.
I believe it can't work.
The problem is that Node.js worker threads share workerData only by value (see the links referenced in my initial post). Functions (passed by reference) are not allowed. Anyway, I'll dig deeper and try it in practice.
Functions (passing by-reference) are not allowed.
You must be passing a value (the long int address of the model in memory), not a reference.
I'm afraid that's not possible in NodeJs; you can't convert a Function (a reference address) to a Buffer (the long int address you are referring to) :(
It is easy to get the address from the model:
console.log(model.handle.address())
It is not straightforward to init the model back from that address. We might add a dummy C method that returns the model object from an int:
VoskModel *vosk_model_new_address(int address)
{
    return ((VoskModel *) address)->Ref();
}
An alternative to workers would be libuv async calls:
https://github.com/node-ffi/node-ffi/wiki/Node-FFI-Tutorial#async-library-calls
model object from int
I see your point; nevertheless I'm perplexed because, as far as I know, each Node.js worker thread has a completely separate/isolated address space, so I'm afraid the passed address (as int or Buffer) will no longer point to the original model. But maybe you are right, if the new address/model is supplied by a new Vosk API method.
libuv async calls
I have to look deeper into this option. My ignorance is huge :)
I have pushed version 0.3.25 with async demo:
https://github.com/alphacep/vosk-api/blob/master/nodejs/demo/demo_async.js
Thanks for sharing:
https://github.com/alphacep/vosk-api/blob/master/nodejs/demo/demo_async.js
In your demo you run multiple (4) async Recognizer tasks.
My notes:
In my simple module VoskJs I already did that, "encapsulating" the Recognizer task in the async function transcript(). To run a pseudo-parallel stress test, see: https://stackoverflow.com/a/67279279/1786393
I'll do a stress test with some timing and CPU usage measurements and report back to you.
In the Node.js single-threaded environment, "spawning" async functions "in parallel" (you used the async.filter module) is probably not the solution we need (to manage multiple incoming user transcript requests), because the functions behind them may be SYNCHRONOUS (to be verified). I mean that a call into a new vosk.Recognizer implies running the CPU-bound transcript task outside Node.js (in C++, right?).
My question to you: is the Recognizer a SYNCHRONOUS function (from the nodejs perspective)?
My question to you: is the Recognizer a SYNCHRONOUS function (from the nodejs perspective)?
No, see acceptWaveformAsync, it runs in a thread.
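The concurrency pattern this enables can be sketched with a stand-in for the recognizer (fakeTranscribe below is hypothetical, not the vosk API): an acceptWaveformAsync-style call returns a Promise right away, so N of them can be started back-to-back and awaited together while the decoding happens off the event loop.

```javascript
// Hypothetical stand-in for an acceptWaveformAsync-style call: the caller
// gets a Promise immediately, the CPU-bound work happens elsewhere.
function fakeTranscribe(id) {
  return new Promise(resolve => setTimeout(() => resolve(`result ${id}`), 50));
}

async function runParallel(n) {
  const promises = [];
  for (let i = 0; i < n; i++) {
    promises.push(fakeTranscribe(i)); // start the call, do NOT await yet
  }
  return Promise.all(promises); // all n requests are now in flight together
}
```

With real threads behind the native call, the total wall-clock time approaches that of the slowest single request rather than the sum of all of them.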
Ah! I had not noticed. This changes/solves the point ! Let me do some tests :)
Hi Nicolay
Doing some tests, at first glance it seems to me that acceptWaveformAsync works great for setting up a multithread server architecture in nodejs!
As part of my VoskJs project (I'll publish a new release soon) I made a brainless stress-test program that "spawns" N transcript requests in "parallel". That's a "worst case" / theoretical test, just to stress my 8-core laptop and see what happens.
The results seem encouraging to me! If I run a single request I get an elapsed time of 439 ms (that's good!), and if I run 10 requests in parallel I get more than 2000 ms for each transcript, with 4 cores at 100%.
Apart from the increased elapsed time (which could also depend on my laptop's CPU power-saving behavior), I believe the overall behavior of Vosk multithreading works as expected!
The test is of course not a realistic server scenario. Soon I'll set up (and publish in the next VoskJs release) a simple HTTP server architecture to run some more realistic stress tests.
More tests to be done (on a server host / virtual machine).
I'd rename the title of the issue
Thanks for now!
stressTest.js
const os = require('os')
const { initModel, transcript, freeModel } = require('../voskjs')

const DEBUG_REQUESTS = false
const DEBUG_RESULTS = false

let activeRequests = 0

/**
 * concurrentTranscriptRequests
 * runs a number of transcript requests in parallel (for a given model and audio file)
 *
 * @async
 * @param {Number} numRequests
 * @param {String} audioFile
 * @param {VoskModelObject} model
 * @return {Promise[]}
 *
 */
function concurrentTranscriptRequests(numRequests, audioFile, model) {
  const promises = []

  for (let i = 0; i < numRequests; i++) {
    if (DEBUG_REQUESTS) {
      // new request started: increment the global counter of active requests
      activeRequests++
      console.log(`DEBUG. active requests : ${activeRequests}`)
    }

    // speech recognition from an audio file
    try {
      // start the async function (returning a Promise) without awaiting the transcript
      const result = transcript(audioFile, model)

      // add the Promise to the array
      promises.push(result)
    }
    catch (error) {
      console.error(error)
    }
  }

  // return the array of promises
  return promises
}

/**
 * stressTest
 * unit test
 */
async function main() {
  const numRequests = +process.argv[2]

  if (!numRequests || numRequests < 1) {
    console.error(`usage: ${process.argv[1]} number_parallel_requests`)
    process.exit()
  }

  // take the number of virtual cores (vCPU)
  const cpuCount = os.cpus().length

  console.log()
  console.log(`CPU cores in this host : ${cpuCount}`)

  if (numRequests > cpuCount)
    console.log(`warning: number of requested tasks (${numRequests}) is higher than number of available cores (${cpuCount})`)

  console.log(`requests to be spawned : ${numRequests}`)
  console.log()

  const modelDirectory = '../models/vosk-model-en-us-aspire-0.2'
  const audioFile = '../audio/2830-3980-0043.wav'

  console.log(`model directory : ${modelDirectory}`)
  console.log(`speech file name : ${audioFile}`)
  console.log()

  // create a runtime model
  const model = await initModel(modelDirectory)

  // run numRequests transcript requests in parallel
  const promises = concurrentTranscriptRequests(numRequests, audioFile, model)
  // await singleTranscriptRequests(numRequests, audioFile, model)

  // wait for all promises to settle
  for (let i = 0; i < promises.length; i++) {
    const result = await promises[i]

    if (DEBUG_REQUESTS) {
      // request finished: decrement the global counter of active requests
      activeRequests--
      console.log(`DEBUG. active requests : ${activeRequests}`)
    }

    if (DEBUG_RESULTS)
      console.log(result)
  }

  // free the runtime model
  freeModel(model)
  //console.log('done.')
}

main()
RESULTS
The host:
inxi -C -M
Machine: Type: Laptop System: HP product: HP Laptop 17-by1xxx v: Type1ProductConfigId serial: <superuser/root required>
Mobo: HP model: 8531 v: 17.16 serial: <superuser/root required> UEFI: Insyde v: F.32 date: 12/14/2018
CPU: Topology: Quad Core model: Intel Core i7-8565U bits: 64 type: MT MCP L2 cache: 8192 KiB
Speed: 600 MHz min/max: 400/4600 MHz Core speeds (MHz): 1: 600 2: 600 3: 600 4: 600 5: 600 6: 600 7: 600 8: 600
Single request (1 thread)
$ /usr/bin/time -f "%e" pidstat 1 -u -e node stressTest 1
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 28/04/2021 _x86_64_ (8 CPU)
CPU cores in this host : 8
requests to be spawned : 1
model directory : ../models/vosk-model-en-us-aspire-0.2
speech file name : ../audio/2830-3980-0043.wav
log level : 0
LOG (VoskAPI:ReadDataFiles():model.cc:194) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:197) Silence phones 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 2 orphan components.
LOG (VoskAPI:Collapse():nnet-utils.cc:1488) Added 1 components, removed 2
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.00668192 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():model.cc:221) Loading i-vector extractor from ../models/vosk-model-en-us-aspire-0.2/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:246) Loading HCLG from ../models/vosk-model-en-us-aspire-0.2/graph/HCLG.fst
08:54:25 UID PID %usr %system %guest %wait %CPU CPU Command
08:54:26 1000 253795 51,00 82,00 0,00 0,00 133,00 2 node
LOG (VoskAPI:ReadDataFiles():model.cc:265) Loading words from ../models/vosk-model-en-us-aspire-0.2/graph/words.txt
LOG (VoskAPI:ReadDataFiles():model.cc:273) Loading winfo ../models/vosk-model-en-us-aspire-0.2/graph/phones/word_boundary.int
LOG (VoskAPI:ReadDataFiles():model.cc:281) Loading CARPA model from ../models/vosk-model-en-us-aspire-0.2/rescore/G.carpa
08:54:27 1000 253795 79,00 21,00 0,00 0,00 100,00 2 node
init model elapsed : 2195ms
transcript elapsed : 439ms
Average: 1000 253795 65,00 51,50 0,00 0,00 116,50 - node
2.95
10 requests in parallel
$ /usr/bin/time -f "%e" pidstat 1 -u -e node stressTest 10
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 28/04/2021 _x86_64_ (8 CPU)
CPU cores in this host : 8
warning: number of requested tasks (10) is higher than number of available cores (8)
requests to be spawned : 10
model directory : ../models/vosk-model-en-us-aspire-0.2
speech file name : ../audio/2830-3980-0043.wav
log level : 0
LOG (VoskAPI:ReadDataFiles():model.cc:194) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:197) Silence phones 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 2 orphan components.
LOG (VoskAPI:Collapse():nnet-utils.cc:1488) Added 1 components, removed 2
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.00680518 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():model.cc:221) Loading i-vector extractor from ../models/vosk-model-en-us-aspire-0.2/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:246) Loading HCLG from ../models/vosk-model-en-us-aspire-0.2/graph/HCLG.fst
08:45:20 UID PID %usr %system %guest %wait %CPU CPU Command
08:45:21 1000 252999 56,44 75,25 0,00 0,00 131,68 0 node
LOG (VoskAPI:ReadDataFiles():model.cc:265) Loading words from ../models/vosk-model-en-us-aspire-0.2/graph/words.txt
LOG (VoskAPI:ReadDataFiles():model.cc:273) Loading winfo ../models/vosk-model-en-us-aspire-0.2/graph/phones/word_boundary.int
LOG (VoskAPI:ReadDataFiles():model.cc:281) Loading CARPA model from ../models/vosk-model-en-us-aspire-0.2/rescore/G.carpa
08:45:22 1000 252999 79,00 21,00 0,00 0,00 100,00 0 node
init model elapsed : 2218ms
08:45:23 1000 252999 233,00 32,00 0,00 0,00 265,00 0 node
08:45:24 1000 252999 383,00 3,00 0,00 0,00 386,00 3 node
transcript elapsed : 1623ms
transcript elapsed : 1734ms
transcript elapsed : 1837ms
transcript elapsed : 1934ms
transcript elapsed : 2040ms
transcript elapsed : 2128ms
transcript elapsed : 2227ms
transcript elapsed : 2369ms
transcript elapsed : 2471ms
08:45:25 1000 252999 99,00 2,00 0,00 0,00 101,00 0 node
transcript elapsed : 2566ms
Average: 1000 252999 169,86 26,75 0,00 0,00 196,61 - node
5.15
Related to https://github.com/alphacep/vosk-api/issues/516 maybe also depending on https://github.com/node-ffi-napi/ref-napi/issues/54
I stumbled upon a similar issue while working with a nodejs Worker, and I suppose it's coming from ffi-napi again.
Both of these minimal cases fail in their own way:
// case 1: load ffi-napi inside the worker only
const { Worker } = require('worker_threads');
new Worker(`require("ffi-napi")`, { eval: true });
// case 2: load ffi-napi in the main thread first, then in the worker
require('ffi-napi');
const { Worker } = require('worker_threads');
new Worker(`require("ffi-napi")`, { eval: true });
See https://github.com/node-ffi-napi/node-ffi-napi/issues/125 . I added some feedback on which versions it occurs with.
Haven't tried with @nshmyrev's custom node-ffi-napi yet. But from what I can see the npm vosk package isn't using the node-ffi-napi fork. Is that right?
Haven't tried with @nshmyrev custom node-ffi-napi yet. But from what I can see the npm vosk package isn't using the node-ffi-napi fork. Is that right?
Yeah, not yet. I need to find the time, publish the fork and update the dependency. Hopefully this year ;)
Alright, I'll see what I can do to help today... I'm creating a Github CI pipeline for https://github.com/alphacep/ref-napi see: https://github.com/larriereguichet/alphacep-ref-napi/actions/runs/1529926857
I'll propose a PR soon.
Alright, I'll see what I can do to help today...
I added the Worker test cases in https://github.com/alphacep/node-ffi-napi/pull/1 and it seems to run smoothly with the ref-napi fork ...at least in my use case.
One odd thing though: I was still getting a segmentation fault (core dumped) error when running:
node -e 'const {Worker}=require("worker_threads"); new Worker(`require(".")`, {eval:true})'
whereas in the test I couldn't tell whether it was failing or not. It doesn't throw, for sure, but maybe I'm just failing to detect this kind of error.
Anyway, while waiting for the PRs to be reviewed, you might want to give this fork a try: larriereguichet/vosk, which is just a fork including @nshmyrev's forked versions of the *-napi libs.
Thank you Johan. I'll probably try to look over the weekend then; it is great that things are working!
Alternative is bun:FFI https://twitter.com/jarredsumner/status/1521527222514774017
Hi Nicolay,
This is not a real issue, just two questions / a brainstorming and suggestion request about a server architecture in nodejs.
I'm trying to extend my project voskJs implementing a nodejs server side architecture to manage multiple concurrent Vosk transcript requests.
Here https://github.com/alphacep/vosk-api/issues/498 you told me that the transcript function runs on a single core, and you rightly suggested implementing a multithread server. So I'm trying to understand how I can use nodejs worker threads.
For a server that by example has to manage a single language (consequently say a single model), my idea was
But I have a problem: in nodejs, worker threads in theory can NOT share an object containing functions. See:
whereas the Vosk Model Object contains functions:
So I fear I can't pass the Model to each thread. I'll verify in practice ASAP.
Now I have a serious problem because the Vosk Model requires a huge amount of RAM.
For example, using the large English-language model vosk-model-en-us-aspire-0.2, it seems to me that Vosk occupies something like ~3 GB of RAM (see the Maximum resident set size (kbytes): 3253024 line when running /usr/bin/time --verbose node voskjs --audio=audio/2830-3980-0043.wav --model=models/vosk-model-en-us-aspire-0.2). See the stdout when running a Vosk transcript in a single process/request (using the voskJs wrapper):
Questions:
Can you confirm that the Vosk model's RAM usage is ~3 GB (for the mentioned language model)?
Using processes instead of threads: if I can't use worker threads, reusing shared memory for the huge Model object, the alternative could be a multi-process architecture of workers. But in that case every worker process must load the model separately (i.e. ~3 GB each). So if I have an 8-core host and foresee, say, 7 child/worker processes, the total amount of RAM on the host must be something like ~3 GB * 7 = more than ~21 GB! That's insane. Any suggestion for an alternative solution (in nodejs)?
Using vosk-server: I guess at the end of the day a nodejs server could just do some IPC with the vosk-server you implemented. How much RAM and how many CPU cores does vosk-server require?
Thanks for your patience!
Giorgio