Open arbdevml opened 2 years ago
Hi Alexandro!
The constructor for SpkModel should already be available, but the KaldiRecognizer setter method Recognizer::SetSpkModel
still needs to be exposed in src/bindings.cc. Once it is, the speaker x-vector should be returned together with the result.
The constructor Recognizer::Recognizer(Model *model, float sample_frequency, SpkModel *spk_model)
should also be exposed for completeness.
Go ahead if you want to give it a try. Otherwise, I will make some time for it next week.
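A minimal TypeScript sketch of how the exposed pieces might be used from the browser side, assuming the bindings end up offering a SpkModel constructor and a SetSpkModel method as described above. The VoskLike types and the withSpeakerRecognizer helper are hypothetical names introduced for illustration; they are not part of vosk-browser.

```typescript
// Hypothetical structural types: they assume the module exposes Model,
// SpkModel and KaldiRecognizer classes with the members listed here.
interface Deletable {
  delete(): void;
}

interface RecognizerLike extends Deletable {
  SetSpkModel(spkModel: Deletable): void;
}

interface VoskLike {
  Model: new (path: string) => Deletable;
  SpkModel: new (path: string) => Deletable;
  KaldiRecognizer: new (model: Deletable, sampleRate: number) => RecognizerLike;
}

// Creates a recognizer with a speaker model attached, runs `use`, and
// releases the embind handles afterwards, even if `use` throws.
function withSpeakerRecognizer<T>(
  vosk: VoskLike,
  modelPath: string,
  spkModelPath: string,
  sampleRate: number,
  use: (recognizer: RecognizerLike) => T
): T {
  const model = new vosk.Model(modelPath);
  let spkModel: Deletable | undefined;
  let recognizer: RecognizerLike | undefined;
  try {
    spkModel = new vosk.SpkModel(spkModelPath);
    recognizer = new vosk.KaldiRecognizer(model, sampleRate);
    recognizer.SetSpkModel(spkModel);
    return use(recognizer);
  } finally {
    // Delete the recognizer first, since it refers to both models.
    if (recognizer) recognizer.delete();
    if (spkModel) spkModel.delete();
    model.delete();
  }
}
```

The try/finally shape is a design choice rather than a requirement: embind objects are not garbage-collected from JavaScript, so releasing them in one place keeps the handles from leaking when recognition fails midway.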
Preparing the builder environment:
apt update && apt -y upgrade
apt install -y build-essential git sudo screen curl
curl -sSL https://get.docker.com | sh
sudo usermod -aG docker $(whoami)
docker run hello-world
cd $HOME
git clone --recursive https://github.com/ccoreilly/vosk-browser
cd vosk-browser
screen
time make builder
time make binary
Updated files:
src/vosk.d.ts
export declare class Model {
  constructor(path: string);
  public delete(): void;
}
export declare class SpkModel {
  constructor(path: string);
  public delete(): void;
}
export declare class KaldiRecognizer {
  constructor(model: Model, sampleRate: number);
  constructor(model: Model, sampleRate: number, grammar: string);
  constructor(model: Model, sampleRate: number, spkModel: SpkModel);
  public SetSpkModel(spkModel: SpkModel): void;
  public SetWords(words: boolean): void;
  public AcceptWaveform(address: number, length: number): boolean;
  public Result(): string;
  public PartialResult(): string;
  public FinalResult(): string;
  public delete(): void;
}
export declare interface Vosk {
  FS: {
    mkdir: (dirName: string) => void;
    mount: (fs: any, opts: any, path: string) => void;
  };
  MEMFS: Record<string, any>;
  IDBFS: Record<string, any>;
  WORKERFS: Record<string, any>;
  HEAPF32: any;
  downloadAndExtract: (url: string, localPath: string) => void;
  syncFilesystem: (fromPersistent: boolean) => void;
  Model;
  SpkModel;
  KaldiRecognizer;
  SetLogLevel(level: number): void;
  GetLogLevel(): number;
  _malloc: (size: number) => number;
  _free: (buffer: number) => void;
}
export default function LoadVosk(): Promise<Vosk>;
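Since AcceptWaveform takes a heap address and a length rather than an array, the caller has to copy the audio into the WebAssembly heap first. The following is a rough sketch of that step using only the _malloc, _free and HEAPF32 members declared above; the HeapModule and WaveformSink types and the feedSamples helper are hypothetical names for illustration, and the sketch makes no assumption about how the samples should be scaled.

```typescript
// Minimal structural types mirroring the declarations above; only the
// members this sketch touches are listed.
interface HeapModule {
  HEAPF32: Float32Array;
  _malloc(size: number): number;
  _free(buffer: number): void;
}

interface WaveformSink {
  AcceptWaveform(address: number, length: number): boolean;
}

// Copies `samples` into the WebAssembly heap, passes the address to the
// recognizer, and always releases the buffer afterwards.
function feedSamples(
  vosk: HeapModule,
  recognizer: WaveformSink,
  samples: Float32Array
): boolean {
  if (samples.length === 0) {
    return false; // nothing to feed
  }
  const byteLength = samples.length * Float32Array.BYTES_PER_ELEMENT;
  const address = vosk._malloc(byteLength);
  try {
    // HEAPF32 is indexed in 4-byte floats, so convert the byte address.
    vosk.HEAPF32.set(samples, address / Float32Array.BYTES_PER_ELEMENT);
    return recognizer.AcceptWaveform(address, samples.length);
  } finally {
    vosk._free(address);
  }
}
```

The return value is passed through unchanged from AcceptWaveform, so the caller can decide whether to read Result() or PartialResult() next.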
src/bindings.cc
// Copyright 2020 Denis Treskunov
// Copyright 2021 Ciaran O'Reilly
#include <emscripten/bind.h>
#include "utils.h"
#include "../vosk/src/kaldi_recognizer.h"
#include "../vosk/src/model.h"
#include "../vosk/src/spk_model.h"
using namespace emscripten;
namespace emscripten {
    namespace internal {
        template<> void raw_destructor<Model>(Model* ptr) { /* do nothing */ }
        template<> void raw_destructor<SpkModel>(SpkModel* ptr) { /* do nothing */ }
    }
}
struct ArchiveHelperWrapper : public wrapper<ArchiveHelper> {
    EMSCRIPTEN_WRAPPER(ArchiveHelperWrapper);
    void onsuccess() {
        return call<void>("onsuccess");
    }
    void onerror(const std::string &what) {
        return call<void>("onerror", what);
    }
};
static Model *makeModel(const std::string &model_path) {
    try {
        return new Model(model_path.c_str());
    } catch (std::exception &e) {
        KALDI_ERR << "Exception in Model ctor: " << e.what();
        throw;
    }
}
static SpkModel *makeSpkModel(const std::string &model_path) {
    try {
        return new SpkModel(model_path.c_str());
    } catch (std::exception &e) {
        KALDI_ERR << "Exception in SpkModel ctor: " << e.what();
        throw;
    }
}
static KaldiRecognizer* makeRecognizerWithGrammar(Model *model, float sample_frequency, const std::string &grammar) {
    try {
        KALDI_VLOG(2) << "Creating model with grammar";
        return new KaldiRecognizer(model, sample_frequency, grammar.c_str());
    } catch (std::exception &e) {
        KALDI_ERR << "Exception in KaldiRecognizer ctor: " << e.what();
        throw;
    }
}
static KaldiRecognizer* makeRecognizerWithSpk(Model *model, float sample_frequency, SpkModel *spk_model) {
    try {
        KALDI_VLOG(2) << "Creating model with spk";
        return new KaldiRecognizer(model, sample_frequency, spk_model);
    } catch (std::exception &e) {
        KALDI_ERR << "Exception in KaldiRecognizer ctor: " << e.what();
        throw;
    }
}
static void KaldiRecognizer_SetSpkModel(KaldiRecognizer &self, SpkModel *spk_model) {
    KALDI_VLOG(2) << "Setting SpkModel";
    self.SetSpkModel(spk_model);
}
static void KaldiRecognizer_SetWords(KaldiRecognizer &self, int words) {
    KALDI_VLOG(2) << "Setting words to " << words;
    self.SetWords(words);
}
static bool KaldiRecognizer_AcceptWaveform(KaldiRecognizer &self, long jsHeapAddr, int len) {
    const float *fdata = (const float*) jsHeapAddr;
    KALDI_VLOG(3) << "AcceptWaveform received len=" << len << " 0=" << fdata[0] << " " << len-1 << "=" << fdata[len-1];
    return self.KaldiRecognizer::AcceptWaveform(fdata, len);
}
static string KaldiRecognizer_Result(KaldiRecognizer &self) {
    std::string s;
    s += self.KaldiRecognizer::Result();
    return s;
}
static string KaldiRecognizer_FinalResult(KaldiRecognizer &self) {
    std::string s;
    s += self.KaldiRecognizer::FinalResult();
    return s;
}
static string KaldiRecognizer_PartialResult(KaldiRecognizer &self) {
    std::string s;
    s += self.KaldiRecognizer::PartialResult();
    return s;
}
EMSCRIPTEN_BINDINGS(vosk) {
    class_<ArchiveHelper>("ArchiveHelper")
        .function("Extract", &ArchiveHelper::Extract)
        .allow_subclass<ArchiveHelperWrapper>("ArchiveHelperWrapper")
        .function("onsuccess", optional_override([](ArchiveHelper& self) {
            return self.ArchiveHelper::onsuccess();
        }))
        .function("onerror", optional_override([](ArchiveHelper& self, const std::string &what) {
            return self.ArchiveHelper::onerror(what);
        }))
        ;
    class_<Model>("Model")
        .constructor(&makeModel, allow_raw_pointers())
        ;
    class_<SpkModel>("SpkModel")
        .constructor(&makeSpkModel, allow_raw_pointers())
        ;
    class_<KaldiRecognizer>("KaldiRecognizer")
        .constructor(&makeRecognizerWithGrammar, allow_raw_pointers())
        .constructor<Model *, float>(allow_raw_pointers())
        .constructor(&makeRecognizerWithSpk, allow_raw_pointers())
        .constructor<SpkModel *, float>(allow_raw_pointers())
        .function("SetWords", &KaldiRecognizer_SetWords)
        .function("SetSpkModel", &KaldiRecognizer_SetSpkModel)
        .function("AcceptWaveform", &KaldiRecognizer_AcceptWaveform)
        .function("Result", &KaldiRecognizer_Result)
        .function("FinalResult", &KaldiRecognizer_FinalResult)
        .function("PartialResult", &KaldiRecognizer_PartialResult)
        ;
    emscripten::function("SetLogLevel", &SetVerboseLevel);
    emscripten::function("GetLogLevel", &GetVerboseLevel);
}
faced with these errors:
Thank you very much, Ciaran O'Reilly, for your answer.
Hi @arbdevml, sorry for my late reply. I'll check your changes. In the future, it'd be easier if you forked the repository and shared your changes in a branch of your fork. That way, it is pretty straightforward to check it out and test.
Hello. First of all, thank you very much for this project.
I am trying to create an example with a speaker model to get the X-vector of the speaker (voice fingerprint).
I am using this example: https://github.com/ccoreilly/vosk-browser/blob/master/examples/words-vanilla/index.js
Speaker identification model: https://alphacephei.com/vosk/models/vosk-model-spk-0.4.zip
Node.js example: https://github.com/alphacep/vosk-api/blob/master/nodejs/demo/test_speaker.js
Could you offer some advice, please?
1) How to load vosk-model-spk-0.4.zip
2) How to implement the methods createSpeakerModel and setSpkModel
3) How to fetch the x-vector of the speaker (voice fingerprint)
Thank you for your answer.
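For the third question, a possible sketch of the JavaScript side once a speaker model is attached to the recognizer. It assumes the result JSON carries the x-vector in a spk array next to a spk_frames count, which is how Vosk reports it in its other language bindings; whether a given vosk-browser build does the same is an assumption here. The SpeakerResult type and both helper functions are hypothetical names for illustration.

```typescript
// Assumed shape of a result that carries speaker data.
interface SpeakerResult {
  text?: string;
  spk?: number[];
  spk_frames?: number;
}

// Returns the x-vector from a Result()/FinalResult() JSON string, or null
// when the result carries no speaker data.
function extractXVector(resultJson: string): number[] | null {
  const parsed = JSON.parse(resultJson) as SpeakerResult;
  return Array.isArray(parsed.spk) ? parsed.spk : null;
}

// Cosine distance between two x-vectors: 0 when they point the same way,
// 1 when orthogonal, 2 when opposite.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("x-vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("x-vectors must not be all zeros");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Comparing a stored reference vector against the vector from a new utterance with cosineDistance gives a rough similarity score; any threshold for deciding that two utterances come from the same speaker would have to be chosen empirically.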