solyarisoftware opened 3 years ago
I believe you can dump model address as a number, pass it to the worker and then reinitialize model object there from existing memory address. You'll need to add another model constructor to the Model class then. Not sure about details in javascript though.
I believe it can't work.
The problem is that Node.js worker threads share workerData only by value (see the links referenced in my initial post). Functions (passed by reference) are not allowed. Anyway, I'll dig deeper and try it in practice.
Functions (passing by-reference) are not allowed.
You must be passing a value (the long int address of the model in memory), not a reference.
I'm afraid that's not possible in NodeJs; you can't convert a Function (a reference address) to a Buffer (the long int address you are referring to) :(
It is easy to get the address from the model:
console.log(model.handle.address())
It is not straightforward to init the model back from that address. We might add a dummy C method that returns the model object from an int:
VoskModel *vosk_model_new_address(int address)
{
    return ((VoskModel *) address)->Ref();
}
An alternative to workers would be libuv async calls:
https://github.com/node-ffi/node-ffi/wiki/Node-FFI-Tutorial#async-library-calls
model object from int
I see your point; nevertheless I'm perplexed because, as far as I know, each Node.js worker thread has a completely separate/isolated address space, so I'm afraid the passed address (as int or Buffer) will no longer point to the original model. But maybe you are right, if the new address/model is supplied by a new Vosk API method.
libuv async calls
I have to look deeper into this option. My ignorance is huge :)
I have pushed version 0.3.25 with async demo:
https://github.com/alphacep/vosk-api/blob/master/nodejs/demo/demo_async.js
Thanks for sharing:
https://github.com/alphacep/vosk-api/blob/master/nodejs/demo/demo_async.js
In your demo you run multiple (4) async Recognizer tasks.
My notes:
In my simple module VoskJs I already did that, "encapsulating" the Recognizer task in the async function transcript(). To run a pseudo-parallel stress test, see: https://stackoverflow.com/a/67279279/1786393
I'll do a stress test with some timing and CPU usage measurements and report back to you.
In the Node.js single-threaded environment, "spawning" async functions "in parallel" (you used the async.filter module) is probably not the solution we need (to manage multiple incoming user transcript requests), because the functions behind them may be SYNCHRONOUS (to be verified). I mean that a call into a new vosk.Recognizer implies running the CPU-bound transcript task outside Node.js (in C++, right?).
My question to you: is the Recognizer a SYNCHRONOUS function (from the nodejs perspective)?
My question to you: is the Recognizer a SYNCHRONOUS function (from the nodejs perspective)?
No, see acceptWaveformAsync, it runs in a thread.
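The concurrency pattern this enables can be sketched with a stand-in for the recognizer (fakeTranscribe below is hypothetical, not the vosk API): an acceptWaveformAsync-style call returns a Promise right away, so N of them can be started back-to-back and awaited together while the decoding happens off the event loop.

```javascript
// Hypothetical stand-in for an acceptWaveformAsync-style call: the caller
// gets a Promise immediately, the CPU-bound work happens elsewhere.
function fakeTranscribe(id) {
  return new Promise(resolve => setTimeout(() => resolve(`result ${id}`), 50));
}

async function runParallel(n) {
  const promises = [];
  for (let i = 0; i < n; i++) {
    promises.push(fakeTranscribe(i)); // start the call, do NOT await yet
  }
  return Promise.all(promises); // all n requests are now in flight together
}
```

With real threads behind the native call, the total wall-clock time approaches that of the slowest single request rather than the sum of all of them.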
Ah! I had not noticed. This changes/solves the point ! Let me do some tests :)
Hi Nicolay
Doing some tests, at first glance it seems to me that acceptWaveformAsync works great for setting up a multithread server architecture in nodejs!
As part of my VoskJs project (I'll publish a new release soon) I made a brainless stress-test program that "spawns" N transcript requests in "parallel". That's a "worst case" / theoretical test, just to stress my 8-core laptop and see what happens.
The results seem encouraging to me! If I run a single request I get an elapsed time of 439 ms (that's good!), and if I run 10 requests in parallel I get more than 2000 ms for each transcript, with 4 cores at 100%.
Apart from the increased elapsed time (which could also depend on my laptop's CPU power-saving behavior), I believe the overall behavior of Vosk multithreading works as expected!
The test is of course not a realistic server scenario. Soon I'll set up (and publish in the next VoskJs release) a simple HTTP server architecture to run some more realistic stress tests.
More tests to be done (on a server host / virtual machine).
I'd rename the title of the issue
Thanks for now!
stressTest.js
const os = require('os')
const { initModel, transcript, freeModel } = require('../voskjs')

const DEBUG_REQUESTS = false
const DEBUG_RESULTS = false

let activeRequests = 0

/**
 * concurrentTranscriptRequests
 * runs a number of transcript requests in parallel (for a given model and audio file)
 *
 * @async
 * @param {Number} numRequests
 * @param {String} audioFile
 * @param {VoskModelObject} model
 * @return {Promise[]}
 *
 */
function concurrentTranscriptRequests(numRequests, audioFile, model) {
  const promises = []

  for (let i = 0; i < numRequests; i++) {
    if (DEBUG_REQUESTS) {
      // new request started: increment the global counter of active requests
      activeRequests++
      console.log(`DEBUG. active requests : ${activeRequests}`)
    }

    // speech recognition from an audio file
    try {
      // start the async function (returning a Promise) without awaiting the transcript
      const result = transcript(audioFile, model)

      // add the Promise to the array
      promises.push(result)
    }
    catch (error) {
      console.error(error)
    }
  }

  // return the array of promises
  return promises
}

/**
 * stressTest
 * unit test
 */
async function main() {
  const numRequests = +process.argv[2]

  if (!numRequests || numRequests < 1) {
    console.error(`usage: ${process.argv[1]} number_parallel_requests`)
    process.exit()
  }

  // take the number of virtual cores (vCPU)
  const cpuCount = os.cpus().length

  console.log()
  console.log(`CPU cores in this host : ${cpuCount}`)

  if (numRequests > cpuCount)
    console.log(`warning: number of requested tasks (${numRequests}) is higher than number of available cores (${cpuCount})`)

  console.log(`requests to be spawned : ${numRequests}`)
  console.log()

  const modelDirectory = '../models/vosk-model-en-us-aspire-0.2'
  const audioFile = '../audio/2830-3980-0043.wav'

  console.log(`model directory : ${modelDirectory}`)
  console.log(`speech file name : ${audioFile}`)
  console.log()

  // create a runtime model
  const model = await initModel(modelDirectory)

  // run numRequests transcript requests in parallel
  const promises = concurrentTranscriptRequests(numRequests, audioFile, model)
  // await singleTranscriptRequests(numRequests, audioFile, model)

  // wait for all promises to settle
  for (let i = 0; i < promises.length; i++) {
    const result = await promises[i]

    if (DEBUG_REQUESTS) {
      // request finished: decrement the global counter of active requests
      activeRequests--
      console.log(`DEBUG. active requests : ${activeRequests}`)
    }

    if (DEBUG_RESULTS)
      console.log(result)
  }

  // free the runtime model
  freeModel(model)
  //console.log('done.')
}

main()
RESULTS
The host:
inxi -C -M
Machine: Type: Laptop System: HP product: HP Laptop 17-by1xxx v: Type1ProductConfigId serial: <superuser/root required>
Mobo: HP model: 8531 v: 17.16 serial: <superuser/root required> UEFI: Insyde v: F.32 date: 12/14/2018
CPU: Topology: Quad Core model: Intel Core i7-8565U bits: 64 type: MT MCP L2 cache: 8192 KiB
Speed: 600 MHz min/max: 400/4600 MHz Core speeds (MHz): 1: 600 2: 600 3: 600 4: 600 5: 600 6: 600 7: 600 8: 600
Single request (1 thread)
$ /usr/bin/time -f "%e" pidstat 1 -u -e node stressTest 1
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 28/04/2021 _x86_64_ (8 CPU)
CPU cores in this host : 8
requests to be spawned : 1
model directory : ../models/vosk-model-en-us-aspire-0.2
speech file name : ../audio/2830-3980-0043.wav
log level : 0
LOG (VoskAPI:ReadDataFiles():model.cc:194) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:197) Silence phones 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 2 orphan components.
LOG (VoskAPI:Collapse():nnet-utils.cc:1488) Added 1 components, removed 2
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.00668192 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():model.cc:221) Loading i-vector extractor from ../models/vosk-model-en-us-aspire-0.2/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:246) Loading HCLG from ../models/vosk-model-en-us-aspire-0.2/graph/HCLG.fst
08:54:25 UID PID %usr %system %guest %wait %CPU CPU Command
08:54:26 1000 253795 51,00 82,00 0,00 0,00 133,00 2 node
LOG (VoskAPI:ReadDataFiles():model.cc:265) Loading words from ../models/vosk-model-en-us-aspire-0.2/graph/words.txt
LOG (VoskAPI:ReadDataFiles():model.cc:273) Loading winfo ../models/vosk-model-en-us-aspire-0.2/graph/phones/word_boundary.int
LOG (VoskAPI:ReadDataFiles():model.cc:281) Loading CARPA model from ../models/vosk-model-en-us-aspire-0.2/rescore/G.carpa
08:54:27 1000 253795 79,00 21,00 0,00 0,00 100,00 2 node
init model elapsed : 2195ms
transcript elapsed : 439ms
Average: 1000 253795 65,00 51,50 0,00 0,00 116,50 - node
2.95
10 requests in parallel
$ /usr/bin/time -f "%e" pidstat 1 -u -e node stressTest 10
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 28/04/2021 _x86_64_ (8 CPU)
CPU cores in this host : 8
warning: number of requested tasks (10) is higher than number of available cores (8)
requests to be spawned : 10
model directory : ../models/vosk-model-en-us-aspire-0.2
speech file name : ../audio/2830-3980-0043.wav
log level : 0
LOG (VoskAPI:ReadDataFiles():model.cc:194) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:197) Silence phones 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 2 orphan components.
LOG (VoskAPI:Collapse():nnet-utils.cc:1488) Added 1 components, removed 2
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.00680518 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():model.cc:221) Loading i-vector extractor from ../models/vosk-model-en-us-aspire-0.2/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:246) Loading HCLG from ../models/vosk-model-en-us-aspire-0.2/graph/HCLG.fst
08:45:20 UID PID %usr %system %guest %wait %CPU CPU Command
08:45:21 1000 252999 56,44 75,25 0,00 0,00 131,68 0 node
LOG (VoskAPI:ReadDataFiles():model.cc:265) Loading words from ../models/vosk-model-en-us-aspire-0.2/graph/words.txt
LOG (VoskAPI:ReadDataFiles():model.cc:273) Loading winfo ../models/vosk-model-en-us-aspire-0.2/graph/phones/word_boundary.int
LOG (VoskAPI:ReadDataFiles():model.cc:281) Loading CARPA model from ../models/vosk-model-en-us-aspire-0.2/rescore/G.carpa
08:45:22 1000 252999 79,00 21,00 0,00 0,00 100,00 0 node
init model elapsed : 2218ms
08:45:23 1000 252999 233,00 32,00 0,00 0,00 265,00 0 node
08:45:24 1000 252999 383,00 3,00 0,00 0,00 386,00 3 node
transcript elapsed : 1623ms
transcript elapsed : 1734ms
transcript elapsed : 1837ms
transcript elapsed : 1934ms
transcript elapsed : 2040ms
transcript elapsed : 2128ms
transcript elapsed : 2227ms
transcript elapsed : 2369ms
transcript elapsed : 2471ms
08:45:25 1000 252999 99,00 2,00 0,00 0,00 101,00 0 node
transcript elapsed : 2566ms
Average: 1000 252999 169,86 26,75 0,00 0,00 196,61 - node
5.15
Related to https://github.com/alphacep/vosk-api/issues/516 maybe also depending on https://github.com/node-ffi-napi/ref-napi/issues/54
I stumbled upon a similar issue while working with a nodejs Worker, and I suppose it's coming from ffi-napi again.
Both of these minimal cases fail in their own way:
// case 1: load ffi-napi inside the worker only
const { Worker } = require('worker_threads');
new Worker(`require("ffi-napi")`, { eval: true });
// case 2: load ffi-napi in the main thread first, then in the worker
require('ffi-napi');
const { Worker } = require('worker_threads');
new Worker(`require("ffi-napi")`, { eval: true });
See https://github.com/node-ffi-napi/node-ffi-napi/issues/125 . I added some feedback on which versions it occurs with.
Haven't tried with @nshmyrev's custom node-ffi-napi yet. But from what I can see the npm vosk package isn't using the node-ffi-napi fork. Is that right?
Haven't tried with @nshmyrev custom node-ffi-napi yet. But from what I can see the npm vosk package isn't using the node-ffi-napi fork. Is that right?
Yeah, not yet. I need to find the time, publish the fork and update the dependency. Hopefully this year ;)
Alright, I'll see what I can do to help today... I'm creating a Github CI pipeline for https://github.com/alphacep/ref-napi see: https://github.com/larriereguichet/alphacep-ref-napi/actions/runs/1529926857
I'll propose a PR soon.
Alright, I'll see what I can do to help today...
I added the Worker test cases in https://github.com/alphacep/node-ffi-napi/pull/1 and it seems to run smoothly with the ref-napi fork ...at least in my use case.
One odd thing though: I was still getting a segmentation fault (core dumped) error when running:
node -e 'const {Worker}=require("worker_threads"); new Worker(`require(".")`, {eval:true})'
whereas in the test I couldn't tell whether it was failing or not. It doesn't throw, for sure, but maybe I'm just failing to detect this kind of error.
Anyway, while waiting for the PRs to be reviewed, you might want to give this fork a try: larriereguichet/vosk, which is just a fork including @nshmyrev's forked versions of the *-napi libs.
Thank you Johan. I'll probably try to look over the weekend then; it is great that things are working!
Alternative is bun:FFI https://twitter.com/jarredsumner/status/1521527222514774017
Hi Nicolay,
This is not a real issue, just two questions / a brainstorming and suggestion request about a server architecture in nodejs.
I'm trying to extend my project voskJs implementing a nodejs server side architecture to manage multiple concurrent Vosk transcript requests.
Here https://github.com/alphacep/vosk-api/issues/498 you told me that the transcript function runs on a single core, and you rightly suggested implementing a multithread server. So I'm trying to understand how I can use nodejs worker threads.
For a server that by example has to manage a single language (consequently say a single model), my idea was
But I have a problem: in nodejs, worker threads in theory can NOT share an object containing functions. See:
whereas the Vosk Model Object contains functions:
So I fear I can't pass the Model to each thread. I'll verify in practice ASAP.
Now I have a serious problem because the Vosk Model requires a huge amount of RAM.
For example, using the large English-language model vosk-model-en-us-aspire-0.2, it seems to me that Vosk occupies something like ~3 GB of RAM (see the Maximum resident set size (kbytes): 3253024 line when running /usr/bin/time --verbose node voskjs --audio=audio/2830-3980-0043.wav --model=models/vosk-model-en-us-aspire-0.2). See the stdout when running a Vosk transcript in a single process/request (using the voskJs wrapper):
Questions:
Can you confirm that the Vosk model's RAM usage is ~3 GB (for the mentioned language model)?
Using processes instead of threads: if I can't use worker threads, reusing shared memory for the huge Model object, the alternative could be a multi-process architecture of workers. But in that case every worker process must load the model separately (i.e. ~3 GB each). So if I have an 8-core host and foresee, say, 7 child/worker processes, the total amount of RAM on the host must be something like ~3 GB * 7 = more than ~21 GB! That's insane. Any suggestion for an alternative solution (in nodejs)?
Using vosk-server: I guess at the end of the day a nodejs server could just do some IPC with the vosk-server you implemented. How much RAM and how many CPU cores does vosk-server require?
Thanks for your patience!
Giorgio